
Design and Modeling for Computer Experiments

© 2006 by Taylor & Francis Group, LLC

Chapman & Hall/CRC

**Computer Science and Data Analysis Series**

The interface between the computer and statistical sciences is increasing, as each discipline seeks to harness the power and resources of the other. This series aims to foster the integration between the computer sciences and statistical, numerical, and probabilistic methods by publishing a broad range of reference works, textbooks, and handbooks.

SERIES EDITORS
John Lafferty, Carnegie Mellon University
David Madigan, Rutgers University
Fionn Murtagh, Royal Holloway, University of London
Padhraic Smyth, University of California, Irvine

Proposals for the series should be sent directly to one of the series editors above, or submitted to:
Chapman & Hall/CRC
23-25 Blades Court
London SW15 2NU
UK

Published Titles

Bayesian Artificial Intelligence, Kevin B. Korb and Ann E. Nicholson
Pattern Recognition Algorithms for Data Mining, Sankar K. Pal and Pabitra Mitra
Exploratory Data Analysis with MATLAB®, Wendy L. Martinez and Angel R. Martinez
Clustering for Data Mining: A Data Recovery Approach, Boris Mirkin
Correspondence Analysis and Data Coding with Java and R, Fionn Murtagh
R Graphics, Paul Murrell
Design and Modeling for Computer Experiments, Kai-Tai Fang, Runze Li, and Agus Sudjianto


Computer Science and Data Analysis Series

**Design and Modeling for Computer Experiments**

Kai-Tai Fang

Hong Kong Baptist University Hong Kong, P.R. China

Runze Li

The Pennsylvania State University University Park, PA, U.S.A.

Agus Sudjianto

Bank of America Charlotte, NC, U.S.A.

Boca Raton London New York


Published in 2006 by
Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2006 by Taylor & Francis Group, LLC
Chapman & Hall/CRC is an imprint of Taylor & Francis Group

No claim to original U.S. Government works
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-10: 1-58488-546-7 (Hardcover)
International Standard Book Number-13: 978-1-58488-546-7 (Hardcover)
Library of Congress Card Number 2005051751

This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

**Library of Congress Cataloging-in-Publication Data**

Fang, Kaitai.
  Design and modeling for computer experiments / Kai-Tai Fang, Run-ze Li, and Agus Sudjianto.
    p. cm. -- (Computer science and data analysis series ; 6)
  Includes bibliographical references and index.
  ISBN 1-58488-546-7 (alk. paper)
  1. Computer simulation. 2. Experimental design. I. Li, Run-ze. II. Sudjianto, Agus. III. Title. IV. Series.
  QA76.9.C65F36 2005
  003'.3--dc22

2005051751

**Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com**

Taylor & Francis Group is the Academic Division of T&F Informa plc.

and the CRC Press Web site at http://www.crcpress.com


Preface

With the advent of computing technology and numerical methods, engineers and scientists frequently use computer simulations to study actual or theoretical physical systems. To simulate a physical system, one needs to create mathematical models to represent physical behaviors. Models can take many forms and different levels of granularity in representing the physical system. The models are often very complicated and constructed with different levels of fidelity, ranging from detailed physics-based models to more abstract, higher-level models with less detailed representation. A physics-based model may be represented by a set of equations including linear, nonlinear, ordinary, and partial differential equations. A less detailed and more abstract model may employ stochastic representations such as those commonly used to simulate processes based on empirical observations (e.g., discrete event simulations for queuing networks). Because of the complexity of real physical systems, there is usually no simple analytic formula that sufficiently describes the phenomena, and systematic experiments to interrogate the models are required. In particular, systematic experiments are useful in the following situations: 1) the model is very complex, with underlying relationships that are nonlinear and many variables that interact with one another; 2) the model exhibits stochastic behavior; or 3) the model produces multiple and complicated outputs, such as a functional response or even 3D animation. In such situations, it is often difficult or even impossible to study the behavior of computer models using traditional methods of experiments. One approach to studying the complex input-output relationships exhibited by a simulation model is to construct an approximation model (also known as a metamodel) based on a limited set of observations acquired by running the simulation model at carefully selected design points.

This approximation model has an analytic form and is simpler to use and apply. The availability of a metamodel as a surrogate for the original complex simulation model is particularly useful when applying the original model is a resource-intensive process (e.g., in terms of model preparation, computation, and output analysis), as is typically the case for complex simulation models. The need for a "cheap-to-compute" metamodel is even greater when one would like to use the model for more computationally demanding tasks such as sensitivity analysis, optimization, or probabilistic analysis. To perform these tasks successfully, a high-quality metamodel developed from a limited sample size is a very important prerequisite. To find a high-quality metamodel, choosing a good set of "training" data becomes an important issue for computer simulation.
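As a toy illustration of this idea (not an example from the book): an expensive simulation is stood in for by a cheap deterministic function, the "simulator" is run at a handful of design points, and a simple polynomial metamodel is fitted to those runs by least squares. The function `simulator` and the quadratic basis below are illustrative assumptions, not a prescribed method.

```python
import numpy as np

# Stand-in for an expensive deterministic simulation (illustrative only).
def simulator(x):
    return np.exp(-x) + 0.5 * x**2

# "Training" data: runs of the simulator at a few chosen design points in [0, 2].
x = np.linspace(0.0, 2.0, 7)
y = simulator(x)

# Fit a quadratic metamodel y ~ b0 + b1*x + b2*x^2 by least squares.
X = np.vander(x, 3, increasing=True)        # columns: 1, x, x^2
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# The metamodel is now cheap to evaluate at any new input.
x_new = 1.23
y_hat = beta @ [1.0, x_new, x_new**2]
```

Here the surrogate can be queried thousands of times (for optimization or sensitivity analysis, say) at negligible cost, which is the whole point of a metamodel.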



Therefore, design and modeling are two key issues in computer experiments. The so-called space-filling design provides ways to generate efficient training data. By efficient, we mean able to generate a set of sample points that capture maximum information between the input-output relationships. There are many space-filling designs such as Latin hypercube sampling and its modifications, optimal Latin hypercube samplings, and uniform designs. How to find the best suited metamodel is another key issue in computer experiments. Various modeling techniques (e.g., (orthogonal) polynomial regression models, local polynomial regression, Kriging models, Bayesian methods, multivariate spline and wavelets, and neural networks) have been proposed and applied by many practitioners and researchers. Part II of this book introduces various space-filling designs and their constructions, while Part III presents the most useful modeling techniques. In the last chapter, we introduce functional response modeling, which is becoming increasingly more common in modern computer experiments. This approach addresses cases where response data are collected over an interval of some index with an intensive sampling rate.

As design and modeling of computer experiments have become popular topics in recent years, hundreds of papers have been published in both the statistics and engineering communities. Several comprehensive review papers have emphasized the importance and challenge of this relatively new area. In recent years, Ford has intensively applied computer experiments (including uniform design) to support new product development, in particular, automotive engine design, with tremendous practical success. The pervasive use of computer experiments at Ford to support early stages of product design before the availability of physical prototypes is a cornerstone in its "Design for Six Sigma (DFSS)" practice. In summer 2002, the first two authors of the book were invited by the third author, who was a manager at Ford Motor Company at that time, to visit Ford for a technical exchange and a series of lectures to explore the synergy between statistical theory and the practice of computer experiments. After careful discussion, we decided to collaborate on a book on design and modeling for computer experiments. Our motivation to write the book was strengthened by the lack of a single comprehensive resource on the subject as well as our desire to share our experience of blending modern and sound statistical approaches with extensive practical engineering applications. The collaboration has been challenging because of the geographical barriers as well as busy schedules. In summer 2003, we learned about the new book "The Design and Analysis of Computer Experiments" by Santner, Williams and Notz (2003), which provides a good reference and complementary material to our book. Because of our motivation to provide useful techniques to practicing engineers and scientists who are interested in applying the methodology, this book has been structured to emphasize practical methodology with applications, and it purposefully omits the corresponding theoretical details. This has been a challenge for us because, at the same time, we would like to provide readers with sufficient sound statistical justifications.

The book presents the most useful design and modeling methods for computer experiments, including some original contributions from the authors. Various examples are provided in the book to clarify the methods and their practical implementations. The book can be used as a textbook for undergraduate and postgraduate students and as a reference for engineers.

We thank Professors C.R. Rao, Y. Wang, R. Mukerjee, F.J. Hickernell, and D.K.J. Lin for their encouragement and/or valuable comments and for significant contributions to this area. Kai-Tai Fang, the first author of the book, is very grateful to Professor C.F. Ng, the President of Hong Kong Baptist University, for his support and encouragement, and to his collaborators and former postgraduate students, among them L.Y. Chan, G. Ge, E. Liski, M.Q. Liu, X. Lu, C.X. Ma, J.X. Pan, H. Qin, Y. Tang, G.L. Tian, P. Winker, M.Y. Xie, J. Yin, and A. Zhang, for their successful collaboration and their contributions. He acknowledges financial support from Hong Kong RGC grant RGC/HKBU 200804, Hong Kong Baptist University grant FRG/0304/II-711, and the Statistics Research and Consultancy Centre, Hong Kong Baptist University. Runze Li, the second author, is grateful for financial support from the National Science Foundation and the National Institute on Drug Abuse. He would like to thank John Dziak, Yiyun Zhang, and Zhe Zhang for enthusiastic support of his work, and his collaborators Wei Chen, Xiaoping Du, and Ruichen Jin for their generous collaboration and support. Agus Sudjianto would like to thank many of his former colleagues at Ford Motor Company, the company where he spent his 14-year engineering career, in particular, Richard Chen, Liem Ferryanto, Georg Festag, Frank Fsadni, Radwan Hazime, John Koszewnik, Thomas McCarthy, Feng Shen, Nathan Soderborg, Joseph Stout, Mahesh Vora, Steve Wang, RenJye Yang, Jovan Zagajac, and many others, for their generous collaboration and support. All three authors thank Miss P. Yeung and others for their excellent technical assistance.

Kai-Tai Fang, Hong Kong Baptist University and Chinese Academy of Sciences
Runze Li, The Pennsylvania State University
Agus Sudjianto, Bank of America

Contents

Part I  An Overview

1 Introduction
  1.1 Experiments and Their Statistical Designs
  1.2 Some Concepts in Experimental Design
  1.3 Computer Experiments
    1.3.1 Motivations
    1.3.2 Metamodels
    1.3.3 Computer Experiments in Engineering
  1.4 Examples of Computer Experiments
  1.5 Space-Filling Designs
  1.6 Modeling Techniques
  1.7 Sensitivity Analysis
  1.8 Strategies for Computer Experiments and an Illustration Case Study
  1.9 Remarks on Computer Experiments
  1.10 Guidance for Reading This Book

Part II  Designs for Computer Experiments

2 Latin Hypercube Sampling and Its Modifications
  2.1 Latin Hypercube Sampling
  2.2 Randomized Orthogonal Array
  2.3 Symmetric and Orthogonal Column Latin Hypercubes
  2.4 Optimal Latin Hypercube Designs
    2.4.1 IMSE Criterion
    2.4.2 Entropy Criterion
    2.4.3 Minimax and Maximin Distance Criteria and Their Extension
    2.4.4 Uniformity Criterion

3 Uniform Experimental Design
  3.1 Introduction
  3.2 Measures of Uniformity
    3.2.1 The Star Lp-Discrepancy
    3.2.2 Modified L2-Discrepancy
    3.2.3 The Centered Discrepancy
    3.2.4 The Wrap-Around Discrepancy
    3.2.5 A Unified Definition of Discrepancy
    3.2.6 Discrepancy for Categorical Factors
    3.2.7 Applications of Uniformity in Experimental Designs
  3.3 Construction of Uniform Designs
    3.3.1 One-Factor Uniform Designs
    3.3.2 Symmetrical Uniform Designs
    3.3.3 Good Lattice Point Method
    3.3.4 Latin Square Method
    3.3.5 Expanding Orthogonal Array Method
    3.3.6 The Cutting Method
    3.3.7 Construction of Uniform Designs by Optimization
  3.4 Characteristics of the Uniform Design: Admissibility, Minimaxity, and Robustness
  3.5 Construction of Asymmetrical Uniform Designs
    3.5.1 Pseudo-Level Technique
    3.5.2 Collapsing Method
    3.5.3 Combinatorial Method
    3.5.4 Miscellanea
  3.6 Construction of Uniform Designs via Resolvable Balanced Incomplete Block Designs
    3.6.1 Resolvable Balanced Incomplete Block Designs
    3.6.2 RBIBD Construction Method
    3.6.3 New Uniform Designs

4 Optimization in Construction of Designs for Computer Experiments
  4.1 Optimization Problem in Construction of Designs
    4.1.1 Algorithms
    4.1.2 Neighborhood
    4.1.3 Replacement Rule
    4.1.4 Iteration Formulae
  4.2 Optimization Algorithms
    4.2.1 Algorithmic Construction
    4.2.2 Local Search Algorithm
    4.2.3 Simulated Annealing Algorithm
    4.2.4 Threshold Accepting Algorithm
    4.2.5 Stochastic Evolutionary Algorithm
  4.3 Lower Bounds of the Discrepancy and Related Algorithm
    4.3.1 Lower Bounds of the Categorical Discrepancy
    4.3.2 Lower Bounds of the Wrap-Around L2-Discrepancy
    4.3.3 Lower Bounds of the Centered L2-Discrepancy
    4.3.4 Balance-Pursuit Heuristic Algorithm

Part III  Modeling for Computer Experiments

5 Metamodeling
  5.1 Basic Concepts
    5.1.1 Mean Square Error and Prediction Error
    5.1.2 Regularization
  5.2 Polynomial Models
  5.3 Spline Method
    5.3.1 Construction of Spline Basis
    5.3.2 An Illustration
    5.3.3 Other Bases of Global Approximation
  5.4 Gaussian Kriging Models
    5.4.1 Prediction via Kriging
    5.4.2 Estimation of Parameters
    5.4.3 A Case Study
  5.5 Bayesian Approach
    5.5.1 Gaussian Processes
    5.5.2 Bayesian Prediction of Deterministic Functions
    5.5.3 Use of Derivatives in Surface Prediction
    5.5.4 An Example: Borehole Model
  5.6 Neural Network
    5.6.1 Multi-Layer Perceptron Networks
    5.6.2 A Case Study
    5.6.3 Radial Basis Functions
  5.7 Local Polynomial Regression
    5.7.1 Motivation of Local Polynomial Regression
    5.7.2 Metamodeling via Local Polynomial Regression
  5.8 Some Recommendations
    5.8.1 Connections
    5.8.2 Recommendations

6 Model Interpretation
  6.1 Introduction
  6.2 Sensitivity Analysis Based on Regression Analysis
    6.2.1 Criteria
    6.2.2 An Example
  6.3 Sensitivity Analysis Based on Variation Decomposition
    6.3.1 Functional ANOVA Representation
    6.3.2 Computational Issues
    6.3.3 Example of Sobol' Global Sensitivity
    6.3.4 Correlation Ratios and Extension of Sobol' Indices
    6.3.5 Fourier Amplitude Sensitivity Test
    6.3.6 Example of FAST Application

7 Functional Response
  7.1 Computer Experiments with Functional Response
  7.2 Spatial Temporal Models
    7.2.1 Functional Response with Sparse Sampling Rate
    7.2.2 Functional Response with Intensive Sampling Rate
  7.3 Penalized Regression Splines
  7.4 Functional Linear Models
    7.4.1 A Graphical Tool
    7.4.2 Efficient Estimation Procedure
    7.4.3 An Illustration
  7.5 Semiparametric Regression Models
    7.5.1 Partially Linear Model
    7.5.2 Partially Functional Linear Models
    7.5.3 An Illustration

Appendix
  A.1 Some Basic Concepts in Matrix Algebra
  A.2 Some Concepts in Probability and Statistics
    A.2.1 Random Variables and Random Vectors
    A.2.2 Some Statistical Distributions and Gaussian Process
  A.3 Linear Regression Analysis
    A.3.1 Linear Models
    A.3.2 Method of Least Squares
    A.3.3 Analysis of Variance
  A.4 Variable Selection for Linear Regression Models
    A.4.1 Nonconvex Penalized Least Squares
    A.4.2 Iteratively Ridge Regression Algorithm
    A.4.3 An Illustration

Acronyms
References

Part I

An Overview

This part provides a brief introduction to the design and modeling of computer experiments, including such experimental design concepts as orthogonal arrays and design optimality. It also presents some interesting and heuristic examples of computer experiments that will serve as illustrations for methods presented later in the book. An illustrative case study is also provided, and guidelines for reading the rest of the book are suggested.

1 Introduction

1.1 Experiments and Their Statistical Designs

The study of experimental design is extremely important in modern industry, science, and engineering. Nowadays experiments are performed almost everywhere as a tool for studying and optimizing processes and systems. The purpose of an experiment in industrial engineering is
• to improve process yields;
• to improve the quality of products, such as to reduce variability and increase reliability;
• to reduce the development time; and
• to reduce the overall costs.

Experimental design, a branch of statistics, has enjoyed a long history of theoretical development as well as applications. Comprehensive reviews for various kinds of designs can be found in Handbook of Statistics, Vol. 13, edited by S. Ghosh and C.R. Rao, as well as in Handbook of Statistics, Vol. 22: Statistics in Industry, edited by R. Khattree and C.R. Rao. Experiments can be classified as
• physical experiments, or
• computer (or simulation) experiments.

Traditionally, an experiment is implemented in a laboratory, a factory, or an agricultural field. This is called a physical experiment or actual experiment, where the experimenter physically carries out the experiment. There always exist random errors in physical experiments, so that we might obtain different outputs under the identical experimental setting. Existence of random errors creates complexity in data analysis and modeling. Therefore, the experimenter may choose one or a few factors in the experiment so that it is easy to explore the relationship between the output and input; alternatively, the experimenter may use some powerful statistical experimental designs. The statistical approach to the design of experiments is usually based on a statistical model. A good design is one which is optimal with respect to a statistical model under our consideration. A good experimental design should minimize the number of runs needed to acquire as much information as possible.

There are many statistical models for physical experiments, among which the fractional factorial design (based on an ANOVA model) and the optimum design (based on a regression model) are the most widely used in practice. These models involve unknown parameters such as main effects, interactions, regression coefficients, and variance of random error. Good designs (e.g., orthogonal arrays and various optimal designs) may provide unbiased estimators of the parameters with smaller or even the smallest variance-covariance matrix under a certain sense. Useful concepts for design of (physical) experiments and basic knowledge of the orthogonal arrays and the optimal designs will be given in the next section. For details the reader can refer to Box et al. (1978) and Dey and Mukerjee (1999) for factorial designs, to Atkinson and Donev (1992) for optimal designs, and to Santner, Williams and Notz (2003) for computer experiments.

Many physical experiments can be expensive and time consuming because "physical processes are often difficult or even impossible to study by conventional experimental methods. As computing power rapidly increasing and accessible, it has become possible to model some of these processes by sophisticated computer code" (Santner, Williams and Notz (2003)). In the past decades computer experiments or computer-based simulation have become topics in statistics and engineering that have received a lot of attention from both practitioners and the academic community. The underlying model in a computer experiment is deterministic and given, but it is often too complicated to manage and to analyze. One of the goals of computer experiments is to find an approximate model ("metamodel" for simplicity) that is much simpler than the true one. Simulation experiments study the underlying process by simulating the behavior of the process on a computer. The true model is deterministic and given, as in a computer experiment, but errors on the inputs are considered and assumed; the simulation of the random process is conducted by incorporating random inputs into the deterministic model. For detailed introductions for the above three kinds of experiments, we need related concepts, which will be given in the next section.

1.2 Some Concepts in Experimental Design

This section introduces some basic concepts in factorial design, optimal design, and computer experiments.

Factor: A factor is a controllable variable that is of interest in the experiment. A factor may be quantitative or qualitative. A quantitative factor is one whose values can be measured on a numerical scale and that fall in an interval, e.g., temperature, pressure, ratio of two raw materials, reaction time length, etc.

A qualitative factor is one whose values are categories, such as different equipment of the same type, different operators, several kinds of a material, etc. The qualitative factor is also called a categorical factor or an indicator factor. In computer experiments, factors are usually called input variables. The variables that are not studied in the experiment are not considered as factors, and they are set to fixed values. The variables that cannot be controlled at all are treated as the random error. In robust design study, some uncontrollable factors during regular operation are purposefully controlled and included during the experiment so that the product can be designed to perform well in varying operating environments. These are known as noise factors.

Experimental domain, level, and level-combination: Experimental domain is the space where the factors (input variables) take values. In computer experiments the experimental domain is also called input variable space. A factor may be chosen to have a few specific values in the experimental domain, at which the factor is tested. These selected values are called levels of the factor. Levels are used in the ANOVA model, where the experimenter wants to test whether the response y has a significant difference among the levels. In computer experiments the concept of level loses the original statistical meaning defined in factorial plans, but we retain this concept to ease the construction of designs (see Chapter 2). A level-combination is one of the possible combinations of levels of the factors. A level-combination is also called a treatment combination. A level-combination can be considered a point in input variable space and called an experimental point.

Run, trial: A run or trial is the implementation of a level-combination in the experimental environment. A run might be a physical or computer experiment. We might obtain different results in two runs of the identical environment due to the random error. Because a computer experiment is deterministic, multiple trials will produce identical outputs; thus, a trial (in the sense of one of several runs at the same experimental point) is only meaningful in physical experiments.

Response: Response is the result of a run based on the purpose of the experiment. The response can be a numerical value, qualitative or categorical, or a function, which is called a functional response. In computer experiments responses are also called outputs. Chapter 7 in this book will discuss computer experiments with functional response.

Random error: In any industrial or laboratory experiment there exist random errors. The random error can often be assumed to be distributed as a normal distribution N(0, σ²) in most experiments. The variance σ² measures magnitude of the random error. Because a computer experiment is deterministic, there is no random error. Therefore, the statistical theory and methods that have been constructed to address random errors cannot be directly applied to analyze data from computer experiments. Nevertheless, the ideas and principles of many statistical methods can be extended to those cases without random errors. The sensitivity analysis in Chapter 6 is such an example.

Factorial design: A factorial design is a set of level-combinations whose main purpose is estimating the main effects and some interactions of the factors. A factorial design is called symmetric if all the factors have the same number of levels; otherwise, it is called asymmetric.

Full factorial design: A design where all the level-combinations of the factors appear equally often is called a full factorial design, or a full design. Clearly, the number of runs in a full factorial design should be n = k·q1···qs, where qj is the number of levels of factor j and k is the number of replications of each level-combination. When all the factors have the same number of levels q, n = k·q^s. Therefore, the number of runs of a full factorial design increases exponentially with the number of factors, and we consider implementing a subset of all the level-combinations that has a good representation of the complete set of combinations.

Fractional factorial design: A fraction of a full factorial design (FFD) is a subset of all the level-combinations of the factors (Dey and Mukerjee (1999)). A FFD can be expressed as an n × s matrix, where n is the number of runs and s is the number of factors, and the jth column of the matrix has qj levels (entries). For example, the design in Example 1 below can be expressed as

             80   90  5  a
             80  120  7  b
             80  150  9  c
             90   90  7  c
       D =   90  120  9  a                      (1.1)
             90  150  5  b
            100   90  9  b
            100  120  5  c
            100  150  7  a

where there are four factors, each having three levels. In total there are 81 = 3^4 possible level-combinations, but the design D chooses only nine of them. How to choose a good subset is the most important issue in FFD. A carefully selected subset known as an orthogonal array is recommended in the literature and has been widely used in practice.

Orthogonal array: An orthogonal array (OA) of strength r with n runs and s factors, each having q levels, denoted by OA(n, s, q, r), is a FFD in which any subdesign of n runs and m (m ≤ r) factors is a full design. Strength-two orthogonal arrays are extensively used for planning experiments in various fields and are often expressed as orthogonal design tables. The reader can refer to Hedayat, Sloane and Stufken (1999) for the details.

Orthogonal design tables: An orthogonal design table, denoted by Ln(q1 × ··· × qs), is an n × s matrix with entries 1, 2, ..., qj in the jth column such that: 1) each entry in each column appears equally often; and 2) each entry-combination in any two columns appears equally often, which makes it an orthogonal array OA(n, s, q, 2).
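The two defining properties of an orthogonal design table can be checked mechanically. The following sketch (illustrative, not from the book) verifies that the L9(3^4) array of Table 1.1 has strength two: every pair of columns contains each of the nine level-pairs equally often.

```python
from itertools import combinations, product

# L9(3^4) orthogonal design table (columns 1-4 of Table 1.1)
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]

def is_orthogonal_array(design, q=3, strength=2):
    """Check that every level-combination in any `strength` columns
    appears, and appears equally often."""
    s = len(design[0])
    for cols in combinations(range(s), strength):
        counts = {}
        for row in design:
            key = tuple(row[c] for c in cols)
            counts[key] = counts.get(key, 0) + 1
        # all q^strength combinations must occur ...
        if set(counts) != set(product(range(1, q + 1), repeat=strength)):
            return False
        # ... and each the same number of times
        if len(set(counts.values())) != 1:
            return False
    return True

print(is_orthogonal_array(L9))  # True: any two columns contain all 9 pairs equally often
```

Dropping even one run destroys the balance, which is why subsets of a full factorial must be chosen carefully rather than arbitrarily.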

When all the factors have the same number of levels, an orthogonal design table is denoted by Ln(q^s). The first five columns of Table 1.1 give the orthogonal design table L9(3^4), which serves for an experiment of 9 runs and 4 factors, each having three levels: '1,' '2,' and '3.' In fact, we can use any other three symbols, such as 'α,' 'β,' and 'γ,' to replace '1,' '2,' and '3' in L9(3^4).

TABLE 1.1 Orthogonal Design L9(3^4) and Related Design

  No   1  2  3  4        A        B      C   D
   1   1  1  1  1     80°C    90 min    5%   a
   2   1  2  2  2     80°C   120 min    7%   b
   3   1  3  3  3     80°C   150 min    9%   c
   4   2  1  2  3     90°C    90 min    7%   c
   5   2  2  3  1     90°C   120 min    9%   a
   6   2  3  1  2     90°C   150 min    5%   b
   7   3  1  3  2    100°C    90 min    9%   b
   8   3  2  1  3    100°C   120 min    5%   c
   9   3  3  2  1    100°C   150 min    7%   a

Example 1 Suppose that there are four factors in a chemical engineering experiment, and each chooses three levels as follows:

A (the temperature of the reaction): 80°C, 90°C, 100°C;
B (the time length of the reaction): 90 min, 120 min, 150 min;
C (the alkali percentage): 5%, 7%, 9%;
D (operator): a, b, c.

If we use L9(3^4) to arrange a fractional factorial design, the columns of L9(3^4) can be filled by the factors A, B, C, and D in any of 4! = 24 ways. Suppose that the four factors are put in the columns in the natural order. Then, replacing the 'abstract' three levels of the four columns by the factors' levels, we obtain the fractional factorial design on the right-hand side of Table 1.1. There are nine level-combinations chosen out of the 3^4 = 81 possible level-combinations; the first one is (80°C, 90 min, 5%, a), and the last one is (100°C, 150 min, 7%, a). Note that the factor D (operator) is not of interest, but we have to consider its effect as the operator may influence the output of the experiment. Such a factor is called a noise factor.

Isomorphism: Exchanging columns and rows of an orthogonal design table still gives an orthogonal design table. Two orthogonal designs are called isomorphic if one can be obtained from the other by exchanging rows and columns and permuting the levels of one or more columns of the design matrix.

In the traditional factorial design, two isomorphic factorial designs are considered to be equivalent.

ANOVA models: A factorial plan is based on a statistical model. For one-factor experiments with levels x1, ..., xq, the model can be expressed as

    yij = μ + αj + εij = μj + εij,   i = 1, ..., nj, j = 1, ..., q,   (1.2)

where μ is the overall mean of y, μj is the true value of the response y at xj, and εij is the random error in the ith replication at the jth level xj. All the εij's are assumed to be independently identically distributed according to N(0, σ²). The mean μj can be decomposed into μj = μ + αj, where αj is called the main effect of y at xj, and the main effects satisfy α1 + ··· + αq = 0. The number of runs in this experiment is n = n1 + ··· + nq. The model (1.2) is called an ANOVA model. An ANOVA model for a two-factor experiment, factor A and factor B, can be expressed as

    yijk = μ + αi + βj + (αβ)ij + εijk,   i = 1, ..., p, j = 1, ..., q, k = 1, ..., K,   (1.3)

where
    μ = overall mean;
    αi = main effect of factor A at level αi;
    βj = main effect of factor B at level βj;
    (αβ)ij = interaction between A and B at level-combination αiβj;
    εijk = random error in the kth trial at level-combination αiβj;

under the restrictions

    Σ_{i=1}^{p} αi = 0,   Σ_{j=1}^{q} βj = 0,   Σ_{i=1}^{p} (αβ)ij = Σ_{j=1}^{q} (αβ)ij = 0.

There are p − 1 independent parameters αi, q − 1 parameters βj, and (p − 1)(q − 1) parameters (αβ)ij; the total number of these parameters is pq − 1. For an s-factor factorial experiment with q1, ..., qs levels, we may consider interactions among three factors, four factors, etc., and the total number of main effects and interactions becomes (Π_{j=1}^{s} qj) − 1, which increases exponentially as s increases. Therefore, the sparsity principle and the hierarchical ordering principle have been proposed:

Sparsity principle: The number of relatively important effects and interactions in a factorial design is small.

Hierarchical ordering principle: Lower-order effects are more likely to be important than higher-order effects; main effects are more likely to be important than interactions; and effects of the same order are equally likely to be important.
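For a small full factorial layout, the effect decomposition behind model (1.3) can be computed directly. This sketch uses a hypothetical 3 × 3 response table with one observation per cell (all numbers invented for illustration); the sum-to-zero restrictions then hold by construction.

```python
# Effect decomposition for a two-factor layout (one observation per cell):
# mu = grand mean, alpha_i, beta_j = main effects, ab[i][j] = interactions.
y = [[10.0, 12.0, 14.0],
     [11.0, 15.0, 16.0],
     [ 9.0, 13.0, 20.0]]   # hypothetical responses

p, q = len(y), len(y[0])
mu = sum(sum(row) for row in y) / (p * q)
alpha = [sum(y[i]) / q - mu for i in range(p)]                       # row (A) effects
beta = [sum(y[i][j] for i in range(p)) / p - mu for j in range(q)]   # column (B) effects
ab = [[y[i][j] - mu - alpha[i] - beta[j] for j in range(q)]
      for i in range(p)]                                             # interactions

# The ANOVA restrictions hold by construction:
assert abs(sum(alpha)) < 1e-9 and abs(sum(beta)) < 1e-9
assert all(abs(sum(row)) < 1e-9 for row in ab)                       # each row sums to zero
assert all(abs(sum(ab[i][j] for i in range(p))) < 1e-9 for j in range(q))

print(round(mu, 3), [round(a, 3) for a in alpha])  # 13.333 [-1.333, 0.667, 0.667]
```

Counting the free quantities here recovers the pq − 1 = 8 independent parameters mentioned above: 2 for alpha, 2 for beta, and 4 for the interactions.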

Supersaturated design: Supersaturated designs are fractional factorials in which the number of estimated (main or interaction) effects is greater than the number of runs. In industrial and scientific experiments, especially in their preliminary stages, very often there are a large number of factors to be studied while the run size is limited because of high costs. However, in many situations only a few factors are believed to have significant effects. Under this effect sparsity assumption, supersaturated designs can be effectively used to identify the dominant factors. Consider a design of n runs and s factors, each having q levels. The design is called unsaturated if n − 1 > s(q − 1); saturated if n − 1 = s(q − 1); and supersaturated if n − 1 < s(q − 1).

Optimal designs: Suppose the underlying relationship between the response yk and the input factors xk = (xk1, ..., xks) can be expressed in a regression model:

    yk = Σ_{j=1}^{m} βj gj(xk1, ..., xks) + εk = Σ_{j=1}^{m} βj gj(xk) + εk,   k = 1, ..., n,   (1.4)

where the gj(·) are pre-specified or known functions, and ε is random error with E(ε) = 0 and Var(ε) = σ². Note that the functions gj can be nonlinear in x, such as exp(−xi), log(xj), 1/(a + xi xj), etc. The model (1.4) covers many useful models, such as the linear model,

    y = β0 + β1 x1 + ··· + βs xs + ε,

and the quadratic model,

    y = β0 + Σ_{i=1}^{s} βi xi + Σ_{1≤i≤j≤s} βij xi xj + ε.

The model (1.4) can be expressed in terms of vectors and matrices as

    y = Gβ + ε,   (1.5)

where

        [ g1(x1) ··· gm(x1) ]
    G = [   ···        ···  ],   β = (β1, ..., βm)′.   (1.6)
        [ g1(xn) ··· gm(xn) ]

The matrix G, called the design matrix, represents both the data and the model. M = (1/n)G′G, where G′ is the transpose of G, is called the information matrix; it can also be written as M(Dn) to emphasize that it is a function of the design Dn = {x1, ..., xn}. The covariance matrix of the least squares estimator β̂ is given by

    Cov(β̂) = (σ²/n) M⁻¹.   (1.7)
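The information matrix M and the covariance formula (1.7) make the design-quality criteria discussed next directly computable. The sketch below assumes the simplest case, a straight-line model with g(x) = (1, x), and compares two invented 4-run candidate designs on [−1, 1]; the design names and numbers are illustrative only.

```python
import numpy as np

def information_matrix(X):
    """M = G'G / n for the linear model g(x) = (1, x) at design points X (n x 1)."""
    G = np.column_stack([np.ones(len(X)), X])
    return G.T @ G / len(X)

def criteria(X):
    """Scalar design criteria computed from the information matrix M."""
    M = information_matrix(X)
    Minv = np.linalg.inv(M)
    return {
        "D": np.linalg.det(M),               # D-optimality: maximize
        "A": np.trace(Minv),                 # A-optimality: minimize
        "E": np.linalg.eigvalsh(Minv).max(), # E-optimality: minimize
    }

# Two hypothetical 4-run designs: replicated endpoints vs. interior points
endpoints = np.array([[-1.0], [-1.0], [1.0], [1.0]])
interior = np.array([[-0.5], [-0.25], [0.25], [0.5]])
print(criteria(endpoints))  # endpoint design wins on all three criteria here
print(criteria(interior))
```

For a straight-line fit the endpoint design is better on every criterion, which illustrates both the appeal of optimal designs and their fragility: those extreme points carry no information about curvature if the assumed model is wrong.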

Clearly, we wish Cov(β̂) to be as small as possible in a certain sense. Hence, we should maximize M(Dn) with respect to the design Dn. However, M is an m × m matrix, so we have to find a scalar criterion of maximization of M, denoted by φ(M(Dn)); we can then find a φ-optimal design that maximizes φ(M) over the design space. Many authors have proposed many criteria, such as:

• D-optimality: maximize the determinant of M. In multivariate statistical analysis the determinant of the covariance matrix is called the generalized variance. D-optimality is equivalent to minimizing the generalized variance, and it is also equivalent to minimizing the volume of the confidence ellipsoid (β̂ − β)′M(β̂ − β) ≤ a² for any a² > 0.
• A-optimality: minimize the trace of M⁻¹. A-optimality is equivalent to minimizing the sum of the variances of β̂1, ..., β̂m.
• E-optimality: minimize the largest eigenvalue of M⁻¹.

The concepts of 'determinant,' 'trace,' and 'eigenvalue' of a matrix are reviewed in Section A.1. The reader can find more optimalities and related theory in Atkinson and Donev (1992) and Pukelsheim (1993). Optimal designs have many attractive properties: when the true model is known, optimal designs are the best under the given optimality. However, they lack robustness to model misspecification: when the underlying model is unknown, optimal designs may have a poor performance. If one can combine the efficiency of the optimal design and the robustness of the uniform design, the new design may have a good performance; Yue and Hickernell (1999) provide some discussion in this direction.

1.3 Computer Experiments

1.3.1 Motivations

Much scientific research involves modeling complicated physical phenomena using a mathematical model

    y = f(x1, ..., xs) = f(x),   x = (x1, ..., xs) ∈ T,   (1.8)

where x consists of input variables, y is an output variable, and T is the input variable space. The function f may or may not have an analytic formula; model (1.8) may be regarded as the solution of a set of equations, including linear, nonlinear, ordinary, and/or partial differential equations, and it is often impossible to obtain an analytic solution for the equations. Engineers or scientists use models to make decisions by implementing the model to predict the behavior of systems under different input variable settings.

The model is a vital element in scientific investigation and engineering design. The availability of a model is often crucial in many situations because physical experiments to fully understand the relationship between the response y and the inputs xj can be too expensive or time consuming to conduct. Thus, computer models become very important for investigating complicated physical phenomena. Scientists and engineers make use of computer simulations to explore the complex relationship between inputs and outputs. One of the goals of computer experiments is to find an approximate model that is much simpler than the true (but complicated) model by simulating the behavior of the device/process on a computer (cf. Figure 1.1).

FIGURE 1.1 Computer experiments: input x1, ..., xs → system y = f(x) → output y, approximated by a metamodel ŷ = g(x).

Design of computer experiments has received a great deal of attention in the past decades, and hundreds of papers have been published on this topic. Among them, Sacks, Welch, Mitchell and Wynn (1989), Bates, Buck, Riccomagno and Wynn (1996), and Koehler and Owen (1996) give comprehensive reviews on this rapidly developing area and provide references therein.

It is worthwhile to note the unique characteristics of computer experiments compared to physical experiments:

• Computer experiments often involve larger numbers of variables compared to those of typical physical experiments.
• A larger experimental domain or design space is frequently employed to explore complicated nonlinear functions.
• Computer experiments are deterministic; that is, samples with the same input settings will produce identical outputs.

The distinct nature of computer experiments poses a unique challenge and requires a special design of experiment approach. To be better suited to the aforementioned characteristics, computer experiments necessitate different approaches from those of physical experiments.

1.3.2 Metamodels

As previously discussed, one of the goals of computer experiments is to seek a suitable approximate model,

    y = g(x1, ..., xs),   x ∈ T,   (1.9)

which is close to the real one but faster to run. Such an approximate model is also called a "model of the model" or "metamodel" (Kleijnen (1987)). Given a sample set from the design of experiment, a metamodel can be used as an estimator of the true model. As the metamodel g is easier to compute and has an analytic formula, many applications can be based on g. For example, a metamodel can provide:

1. Preliminary study and visualization. Graphical tools are very useful for visualizing the behavior of the function g. Since g has an expressive form, we may easily construct various plots of y against each input factor. This will help us examine the direction of the main effect of each factor, give us a better understanding of the relationship between the input and output variables, and yield insight into the relationships between y and x. Contour plots of the output variable against each pair of factors can be used to examine whether there is interaction between the input factors. 3-D plots or animation can intuitively show us how many local minima/maxima of the true model may exist.

2. Prediction and Optimization. One may be interested in predicting the value of y at untried inputs. For example, one may want to compute the overall mean of y over a region T,

    E(y|T) = (1/Vol(T)) ∫_T f(x) dx,   (1.10)

where Vol(T) is the volume of T, but a direct computation cannot be easily done. In this case, the metamodel can serve as an estimator,

    I(g|T) = (1/Vol(T)) ∫_T g(x) dx.   (1.11)

Very often the global minimum/maximum of the response is required. The solution can be found approximately by finding a point x* = (x1*, ..., xs*) ∈ T such that

    g(x1*, ..., xs*) = min_{x ∈ T} g(x1, ..., xs).

The minimum/maximum point x* can be found analytically (for instance, when the metamodel g has continuous derivatives) or numerically by some computer programs.
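The integral I(g|T) in (1.11) is exactly what makes a cheap metamodel valuable: it can be averaged by brute force. A minimal sketch, using an invented metamodel g on T = [0, 1]² (the function and sample size are assumptions for illustration):

```python
import random

# Hypothetical metamodel g on T = [0,1]^2, standing in for the expensive f
def g(x1, x2):
    return x1 ** 2 + 2.0 * x2

# Overall mean E(y|T) ~= (1/Vol(T)) * integral of g over T, by Monte Carlo;
# Vol(T) = 1 for the unit square, so a plain sample average suffices.
random.seed(0)
N = 200_000
mean = sum(g(random.random(), random.random()) for _ in range(N)) / N
print(round(mean, 2))  # exact value of the integral is 1/3 + 1 = 4/3
```

Each of the 200,000 evaluations here would be a full simulation run if done on f itself, which is why the estimate is computed on g instead.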

The above optimization frequently involves multiple responses to be optimized and constraints to be satisfied. This kind of constrained optimization often cannot be done by using response surface methodology alone, but a combination of nonlinear programming techniques and a metamodel can handle this problem easily.

3. Sensitivity Analysis. Sensitivity analysis, which attempts to quantify and rank the effects of input variables, is one of the main interests in decision-making processes such as engineering design, where one would like to quantify the leverage of each input variable on the output. Sensitivity calculations, such as quantification of the proportion of variation of the output, y, explained by variations of the input variables xi (McKay (1995)),

    CRi = Var{E(y|xi)} / Var(y),   for i = 1, ..., s,

can be easily done using the metamodel. Analysis of variance (ANOVA) and its generalization for nonlinear models, such as the global sensitivity index, frequently require integration in high dimensional space using Monte Carlo techniques. When direct computation of y is expensive, the availability of a cheap-to-compute metamodel becomes an attractive numerical choice.

4. Probabilistic Analysis. In many applications, one is often interested in studying the effect of input uncertainty on the variability of the output variable. In particular, reliability and risk assessment applications often exercise computer models subjected to assumed stochastic inputs to assess the probability that the output will meet a certain criterion (Haldar and Mahadevan (2000)). In probabilistic analysis, the input x is assumed to follow random distributions, with an interest in predicting the probability that y exceeds a certain target value y0,

    R = P(y > y0) = ∫_{y0}^{∞} p(y) dy,

or

    R = P{f(x) > y0} = ∫_{f(x)>y0} p(x) dx,

where x are stochastic input variables, f(x) is a deterministic function of the system model, p(y) is the probability density function of y, and p(x) is the joint density of x. Since the number of random variables in practical applications is usually large, multidimensional integration is involved. Due to the complexities of the model, f(x), there is rarely a closed form solution.
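The ratio CRi = Var{E(y|xi)}/Var(y) can be estimated by nested Monte Carlo once a cheap metamodel is available. A sketch with an invented metamodel g and independent uniform inputs; the function, seed, and sample sizes are assumptions chosen only for illustration.

```python
import random, statistics

# Hypothetical metamodel y = g(x1, x2) standing in for an expensive f
def g(x1, x2):
    return 4.0 * x1 + x2 ** 2

random.seed(1)
N, M = 200, 200
cond_means = []
for _ in range(N):                           # outer loop: sample x1
    x1 = random.random()
    # inner loop: E(y | x1) estimated by averaging over x2
    cond_means.append(statistics.fmean(g(x1, random.random()) for _ in range(M)))

ys = [g(random.random(), random.random()) for _ in range(N * M)]
CR1 = statistics.variance(cond_means) / statistics.variance(ys)
print(round(CR1, 2))  # analytic value is (16/12) / (16/12 + 4/45) = 0.9375
```

The estimate confirms what the formula promises: almost all of the output variation in this toy g is attributable to x1, so x1 would be ranked first in a sensitivity study.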

It is also often difficult to evaluate the probability with numerical integration methods. Monte Carlo integration (Melchers (1999)) and its approximations, such as the first order reliability method (Hasofer and Lind (1974)), the second order reliability method (Breitung (1984)), or the saddlepoint approximation (Du and Sudjianto (2004)), are frequently applied. When function evaluations of f(x) are computationally expensive, the metamodel becomes a popular choice for estimating the probability,

    R̂ = P{g(x) > y0} = ∫_{g(x)>y0} p(x) dx

(Sudjianto, Juneja, Agrawal and Vora (1998), Jin, Du and Chen (2003), Zou, Mahadevan, Mourelatos and Meernik (2002)).

5. Robust Design and Reliability-Based Design. Robust design is a method for improving the quality of a product by minimizing the effect of input variation without eliminating the source of variation (Taguchi (1986), Taguchi (1993)). It emphasizes the achievement of performance robustness (i.e., reducing the variation of a performance) by optimizing the so-called controllable input variables. In robust design, the input variables, x, consist of three types of variables:

• Control factors, xc ∈ T, deterministic variables whose values are optimized to minimize the variability of the response variable, y.
• Noise factors, xn ∈ T, stochastic variables representing uncontrollable sources of variation, such as manufacturing variation, environment, and usage (Phadke (1989), Wu and Hamada (2000)).
• Input signals, xs ∈ T, deterministic variables for which the response must be optimized within a specified range of these variables (Nair et al. (2002)).

Robust design problems can be formulated as a constrained variance minimization problem: find xc* = (x1*, ..., xs*) ∈ T such that (Chen, Allen, Mistree and Tsui (1996), Parkinson, Sorensen and Pourhassan (1993), Wang, Lin and Fang (1995), Taam and Ye (2002))

    Var{f(xc*, xn, xs)} = min_{xc ∈ T} Var{f(xc, xn, xs)}   subject to   E{f(xc*, xn, xs)} = Y.

Related to robust design is an approach called reliability-based design (RBD) (Tu, Choi and Young (1999), Wang, Grandhi and Hopkins (1995), Wu and Wang (1998)). The RBD approach focuses on searching for x* = (x1*, ..., xs*) ∈ T such that the response variable yobj = fobj(x) is optimized while other constraints on responses yi are maintained.
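The metamodel-based probability R̂ = P{g(x) > y0} can be estimated by plain Monte Carlo before resorting to FORM/SORM-type approximations. A sketch with an invented limit state and assumed input distributions (x1 ~ N(1.0, 0.1), x2 ~ N(0.5, 0.05)); every number here is a hypothetical choice for illustration.

```python
import random

# Hypothetical cheap metamodel g(x) standing in for an expensive f(x)
def g(x1, x2):
    return x1 * x2

random.seed(0)
y0 = 0.55          # assumed target value
N = 100_000        # metamodel evaluations are cheap, so N can be large
hits = sum(
    1 for _ in range(N)
    if g(random.gauss(1.0, 0.1), random.gauss(0.5, 0.05)) > y0
)
R = hits / N       # Monte Carlo estimate of R = P{g(x) > y0}
print(round(R, 3))
```

This is exactly the trade the text describes: 100,000 runs of the true simulation model would be prohibitive, but against a metamodel they take a fraction of a second.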

These constraints must satisfy specified minimum probabilistic levels αi:

    fobj(x*) = min_{x ∈ T} fobj(x)   subject to   Pr{fi(x)} ≥ αi, i ∈ 1, ..., d.

Note that both robust design and RBD require repetitive evaluations of system performances, f(x), for which the availability of metamodels, g(x), becomes crucial when the original models are computationally expensive. Sudjianto and Chen (2004) presented a unified framework for both robust design and reliability-based design for optimization under uncertainty using an inverse probability approach.

In summary, the statistical approach for computer experiments involves two parts:

1. Design: We wish to find a set of n points, i.e., a design matrix, denoted by Dn, in the input space T, so that an approximate model can be 'best' constructed by modeling techniques based on the data set formed by Dn and the outputs on Dn. Considering the previously mentioned requirements of computer experiments, a natural idea is to place the points of Dn so that they are uniformly scattered on T. Such a design is called a space-filling design, or a uniform design in the literature. Part II of this book provides an in-depth discussion of many space-filling designs best suited for computer experiments.

2. Modeling: Fitting highly adaptive models by various modeling techniques, which typically have the form of "model-free" non-parametric regression (i.e., no preconceived rigid structural model is assumed during the model building). Because of the space-filling nature of the experimental design, an adaptive model which can represent non-linearity as well as provide good prediction capability at untried points is very important. Because of the complexity of highly adaptive models, straightforward model interpretations are often unavailable; to alleviate this problem, one may need to employ a more sophisticated ANOVA-like global sensitivity analysis to interpret the metamodel. Part III of this book introduces a variety of modeling techniques popular for computer experiments as well as sensitivity analysis techniques.
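One widely used space-filling construction (a standard technique, not one of this book's specific constructions) is Latin hypercube sampling: n points in [0, 1]^s whose projection onto every coordinate hits each of n equal bins exactly once. A minimal sketch:

```python
import random

def latin_hypercube(n, s, seed=0):
    """A random Latin hypercube design: n points in [0,1]^s whose projection
    onto each coordinate occupies each of the n equal bins exactly once."""
    rng = random.Random(seed)
    columns = []
    for _ in range(s):
        perm = list(range(n))
        rng.shuffle(perm)
        # take the center of bin perm[k]; a random point within the bin also works
        columns.append([(p + 0.5) / n for p in perm])
    return [tuple(col[k] for col in columns) for k in range(n)]

D = latin_hypercube(9, 2)
# every one-dimensional projection is evenly spread over [0, 1]
for j in range(2):
    assert sorted(round(x[j] * 9 - 0.5) for x in D) == list(range(9))
print(D[:3])
```

The one-dimensional balance is the defining property; more refined space-filling criteria (uniformity measures, maximin distance) additionally control how the points spread jointly over T.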

Figure 1.2 demonstrates the situation of computer experiments. There is a surface that is too complicated for applications. We want to choose a set of experimental points, such as those presented in the figure, and to calculate the corresponding outputs.

FIGURE 1.2 Space-filling design.

1.3.3 Computer experiments in engineering

The extensive use of models in engineering design practice is often necessitated by the fact that data cannot be obtained from physical tests of the system prior to its construction, yet the system cannot be built until the decisions are made. In a complex product development environment, reliance on models is crucial when a series of design decisions must be made upfront, prior to the availability of a physical prototype, because they determine the architecture of the product and affect other downstream design decisions (e.g., manufacturing facilities, design of interfacing systems). Reversing a decision at a later time can have very expensive consequences, both financially as well as in loss of valuable product development time. For example, in internal combustion engine design, the crankshaft and engine block designs (see Figure 1.3) determine the architecture of an engine which other component designs are dependent upon. The decisions about the crankshaft and engine block are also major determinants of the expense and development time of manufacturing processes. Thus, some decisions about good design need to be made before the effects of these decisions can be tested. In this situation, the engine designer frequently uses a crank and block interaction dynamic model (Loibnegger et al. (1997)) to make structural decisions to optimize durability and vibration characteristics prior to the availability of a physical prototype.

FIGURE 1.3 Engine crankshaft and block interaction finite element and dynamic model.

In other cases, the reliance on the model is necessitated because physical testing is expensive, time consuming, harmful, or, in some situations, even prohibitive. Yang, Gu, Tho and Sobieszczanski-Sobieski (2001) performed vehicle crash design optimization using a finite element model to simulate frontal crashes, roof crush, and noise vibration and harshness to help design vehicle structures to absorb crash energy through structure deformation and impact force attenuation. Models are often used by engineers to gain insight into certain phenomena which may be lacking from physical experiments due to measurement system limitations and practicality. For example, a computational fluid dynamics (CFD) model (see Figure 1.5) is used to study the oil mist separator system in the internal combustion engine (Satoh, Kawai, Ishikawa and Matsuoka (2000)) to improve its efficiency by reducing engine oil consumption and emission effects. Using this model, engineers study the relationships of various parameters, such as gas flow rate, oil droplet size distribution, oil mist mass flow rate, and pressure loss, to the oil mist velocity and droplet accumulation in a running engine under various engine speeds and loads. By understanding

the behavior of the systems, engineers create oil separation design concepts and optimize their design parameters to achieve maximum oil separation efficiency.

FIGURE 1.4 Vehicle crash model.

FIGURE 1.5 Oil separator CFD model (velocity map; pressure loss distribution; oil droplet accumulation).

While sophisticated computer models have become ubiquitous tools to investigate complicated physical phenomena, their effectiveness in supporting timely design decisions in fast-paced product development is often hindered by their excessive requirements for model preparation, computation, and output post-processing. The computational requirement increases dramatically when the simulation models are used for the purpose of design optimization or probabilistic analysis, since various design configurations must be iteratively evaluated. Recently, robust design and probabilistic-based design optimization approaches for engineering design have been proposed (see, for example, Wu and Wang (1998), Kalagnanam and Diwekar (1997), Du and Chen (2004), Sudjianto and Chen (2004)) and have gained significant adoption in industrial practices (e.g., Hoffman, Sudjianto, Du and Stout (2003), Yang et al. (2001), Simpson, Booker, Ghosh, Giunta, Koch and Yang (2000), Du et al. (2004)). For this purpose, the computation demand becomes even greater because both probabilistic computation (e.g., Monte Carlo simulation) and optimization must be performed simultaneously, for which a "double loop"

procedure may be required, as illustrated in Figure 1.6. The outer loop is the optimization process, while the inner loop is the probability calculations for the design objective and design constraints.

FIGURE 1.6 Double loop procedure in probabilistic design, adapted from Li and Sudjianto (2005).

This computational challenge makes obvious the importance of a cheap-to-compute metamodel (Kleijnen (1987)) as a surrogate to a computationally expensive engineering simulation model. In this approach, an approximation model that is much simpler and faster to compute than the true model (see Figure 1.1) is employed during engineering analysis and optimization. This cheap-to-compute metamodel approach has become a popular choice in many engineering applications (e.g., Booker, Dennis, Frank, Serafini, Torczon and Trosset (1999), Hoffman et al. (2003), Du et al. (2004)). Because of the computational requirements and sophistication of the true models and the need for accurate metamodels, sophisticated statistical modeling techniques are required to analyze the sample data and represent them as accurate and fast-to-compute metamodels. Accordingly, statistical techniques such as design of experiment play very important roles in the creation of metamodels, enabling a minimum number of samples (i.e., number of runs in the design of experiment) while at the same time capturing maximum information (i.e., nonlinearity in the true model). In many industrial experiments, the factorial design has been

widely used (Montgomery (2001), Wu and Hamada (2000)). The most basic experimental design is a full factorial design (see Section 1.2), by which the experimenter can estimate the main effects and interactions of the factors. The number of runs of a full factorial experiment increases exponentially with the number of factors; therefore, fractional factorial design (see Section 1.2) is used when many factors are considered and experiments are costly in terms of budget or time (Dey and Mukerjee (1999)). The study of the flow rate of water discussed in Section 1.8 will give an illustration. When there are many factors, the sparsity of effects principle, introduced in Section 1.2, can be invoked, whereby the response is dominated by few effects and interactions (Montgomery (2001), Wu and Hamada (2000)). However, the experimenter may not know the underlying model well and may have difficulty guessing which interactions are significant and which interactions can be ignored. Furthermore, the sparsity of effects principle is not always valid. A similar difficulty will be encountered in the use of optimal designs (Kiefer (1959) and Atkinson and Donev (1992)). In this case, a space-filling design is a useful alternative, and various modeling techniques can be evaluated to find the best metamodel. Therefore, the design and modeling techniques in computer experiments can also be applied in many factorial plans where the underlying model is unknown or partially unknown.

1.4 Examples of Computer Experiments

Computer experiments are required in many fields. In this section we present several examples discussed in the literature. Some of them will be used as examples throughout the book.

Example 2 Engine block and head joint sealing assembly (Chen, Zwick, Tripathy and Novak (2002)) The head and block joint sealing assembly is one of the most crucial and fundamental structural designs in the automotive internal combustion engine. The design of the engine block and head joint sealing assembly is very complex, involving multiple components (block and head structures, gasket, and fasteners) with complicated geometry to maintain proper sealing of combustion, high pressure oil, oil drain, and coolant. The design parameter settings in this assembly (block and head structures, gasket, and fasteners) cannot be selected separately because of strong assembly interaction effects. Design decisions for this system must be made upfront in the product development stage, prior to the availability of a physical prototype. To support the decision, computer simulation is commonly used in the design process. To best simulate the engine assembly process and its operation, a finite element model is used to capture the effects of three-dimensional part geometry, the compliance in the components, non-linear gasket material

properties, and the contact interfaces among the block, head, gasket, and fasteners. An example of a finite element model for this system is shown in Figure 1.7. The computer model simulates the assembly process (e.g., head bolt run down) as well as various engine operating conditions (e.g., thermal and cylinder pressure cyclical loads due to the combustion process). The factors investigated in the computer simulation model consist of several design and manufacturing variables; the manufacturing variables include process variations and are referred to as noise factors. The number of experimental runs must be small due to simulation setup complexity and excessive computing requirements.

FIGURE 1.7 Finite element model of head and block joint sealing assembly.

The objective of the design in this example is to optimize the design variables to minimize the "gap lift" of the assembly. Additionally, one is also interested in finding an optimal design setting that minimizes the gap lift sensitivity to manufacturing variation. We will continue this example in Section 5.2, in which a list of design factors and noise factors will be given.

Example 3 Robot arm (An and Owen (2001)) The movement trajectory of a robot arm is frequently used as an illustrative example in the neural network

literature. Consider a robot arm with m segments. The shoulder of the arm is fixed at the origin in the (u, v)-plane. The segments of this arm have lengths Lj, j = 1, ..., m. The first segment is at angle θ1 with respect to the horizontal coordinate axis of the plane; for k = 2, ..., m, segment k makes angle θk with segment k − 1. The end of the robot arm is at

    u = Σ_{j=1}^{m} Lj cos(Σ_{k=1}^{j} θk),
    v = Σ_{j=1}^{m} Lj sin(Σ_{k=1}^{j} θk),   (1.12)

and the response y is the distance y = √(u² + v²) from the end of the arm to the origin, expressed as a function of the 2m variables θj ∈ [0, 2π] and Lj ∈ [0, 1]. Ho (2001) gave an approximation model y = g(θ1, θ2, θ3, L1, L2, L3) for the robot arm with 3 segments.

Example 4 Chemical kinetics (Miller and Frenklach (1983) and Sacks, Schiller and Welch (1989)) In this example, chemical kinetics is modeled by a linear system of 11 differential equations:

    hj(x, t) = gj(η, x, t),   j = 1, ..., 11,   (1.13)

where x is a set of rate constants, the inputs to the system. A solution to (1.13) can be obtained numerically for any input x by solving the 11 differential equations, yielding concentrations of five chemical species at a reaction time of 7 × 10⁻⁴ seconds. One might be interested in finding a closed form for an approximate model which would be much simpler than the original one (1.13). Another case study of a computer experiment on the kinetics of a reversible chemical reaction was discussed by Xu, Liang and Fang (2000).

Example 5 Dynamical system (Fang, Lin, Winker and Zhang (2000)) A one-off ship-based flight detector launch can be modeled as a dynamical system. The launching parameters are determined by a combination of several factors: the motion of wave and wind, the rock of the ship, the relative motion between the detector and the launching support, plus other factors. Due to the complexity of the system, a series of coordinate systems are set up, involving many characteristics: 1) six degrees of freedom for ship rock; 2) two degrees of freedom for the circumgyration launching system; 3) the guide launching; 4) choice-permitted launching position; 5) two-period launching procedure; and 6) large-range direction of the launching support composed with the no-limitation azimuth and wide-range elevation angle. Two responses, ω1 and ω2, are of interest. For a given set of values of the above six groups of parameters, the corresponding ω1 and ω2 can be calculated by solving a set of differential equations. Because of the high cost of the experiment, the experimenter considered only two key factors: azimuth angle and pitching angle. For this problem, the experimenter wishes to find a simple model to replace solving the set of differential equations.

the set of differential equations. Fang, Lin, Winker and Zhang (2000) applied uniform design (see Chapter 3) to this problem and gave a detailed discussion on finding a suitable polynomial metamodel.

Example 6 Integrated circuit (Sacks, Schiller and Welch (1989) and Lo, Zhang and Han (2000)) The design of analog integrated circuit behaviors is commonly done using computer simulation. Here x denotes various circuit parameters, such as transistor characteristics, and y is the measurement of the circuit performance, such as output voltage. Figure 1.8 shows the circuit of a night detector, where R1, R2, R3, and R4 are resistors inside the circuit and y is the output current. The aim of night detector design is that the circuit current remains stable around a target value (180 mA) via adjustment of the resistances R1, · · · , R4. The output current y is given by

y = ((R3 + 2R4)/(2R3R4)) (R1Vcc/(R1 + R2) − Vbe).   (1.14)

In a robust design study, one would like to study the variation of the output current due to the effect of external and internal noises (Vbe and Vcc, respectively) and the biases R1, · · · , R4. These adjustable value resistors are called controllable factors. Once the values of the controllable factors are set, one wishes to find the best combination of R1, · · · , R4 such that the output response variability is minimized under various levels of variability due to tolerance variations of R1, · · · , R4, Vbe, and Vcc.

Example 7 Environmental experiment (Fang and Wang (1994) and Li (2002)) To study how environmental pollutants affect human health, an experiment was conducted. Environmentalists believe that contents of some metal elements in water would directly affect human health. The purpose of this study was to understand the association between the mortality of some kind of cell of mice and contents of six metals: Cadmium (Cd), Copper (Cu), Zinc (Zn), Nickel (Ni), Chromium (Cr), and Lead (Pb). The experimenters considered the mortality of cells due to high density of the chemical elements. To address this issue, the investigator took 17 levels for the content of each metal: 0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1, 2, 4, 5, 8, 10, 12, 14, 16, 18, 20 (ppm). An experiment using uniform design was conducted at the Department of Biology, Hong Kong Baptist University. To fit the obtained data, we consider a regression model

y = f(x) + ε,   (1.15)

where ε is random error with mean zero and an unknown variance. In Section A.3, we will use this example to illustrate the strategy of linear regression analysis. This example also shows applications of design and modeling techniques for computer experiments to industrial experiments with an unknown model.

FIGURE 1.8
Night detector.

1.5 Space-Filling Designs

Without loss of generality, consider an experimental region in an s-dimensional unit cube C^s = [0, 1]^s. One is interested in how to find a good design Dn = {x1, · · · , xn}, where xi ∈ C^s, such that the deviation

Dev(x, f, g) = f(x) − g(x),   (1.16)

where y = f(x) is the true model and y = g(x) is a metamodel, is as small as possible for all x ∈ C^s. Because there are many metamodel alternatives, considering all the deviations in (1.16) becomes too difficult. Therefore, the overall mean model, which aims to find the best estimator of the overall mean, has been adopted by many authors to search for the best sampling (i.e., design of experiment) scenario. The most preliminary aim of the design is to obtain the best estimator of the overall mean of y,

E(y) = ∫_{C^s} f(x) dx.   (1.17)

For a given number of runs, n, the sample mean method suggests using

ȳ(Dn) = (1/n) Σ_{i=1}^{n} f(xi)   (1.18)

as an estimator of the overall mean E(y). One wants to find a design such that the estimator ȳ(Dn) is optimal in a certain sense. There are two kinds of approaches to assessing a design Dn:

(A) Stochastic Approach: From the statistical point of view we want to find a design Dn such that the sample mean ȳ(Dn) is an unbiased or asymptotically unbiased estimator of E(y) and has the smallest possible variance. Let x1, · · · , xn be independent samples of the uniform distribution on C^s, denoted by x ∼ U(C^s). It is easy to see that in this case the sample mean is unbiased with variance Var(f(x))/n, where x is uniformly distributed on C^s. From the central limit theorem, with an approximate probability 0.95 we have

diff-mean = |E(y) − ȳ(Dn)| ≤ 1.96 √(Var(f(x))/n).

Unfortunately, the variance Var(f(x))/n is too large for many case studies. Therefore, many authors proposed different ways to lower the variance of ȳ(Dn). McKay, Beckman and Conover (1979) proposed the so-called Latin hypercube sampling (LHS), that is, to randomly choose x1, · · · , xn such that they are dependent and have the same marginal distribution. In this case,

Var(ȳ(Dn)) = (1/n) Var(f(x)) + ((n − 1)/n) Cov(f(x1), f(x2)).   (1.19)

Obviously, Var(ȳ(Dn)) < Var(f(x))/n if and only if the covariance of f(x1) and f(x2) is negative. McKay et al. (1979) showed that this covariance is negative whenever f(x1, · · · , xs) is monotonic in each variable. LHS has been widely used in computer experiments and many modifications of LHS have been proposed. We shall introduce LHS and its versions in Chapter 2.

(B) Deterministic Approach: From a deterministic point of view, one wishes to find a sampling scenario so that the difference

diff-mean = |E(y) − ȳ(Dn)|   (1.20)

will be as small as possible. The Koksma-Hlawka inequality in quasi-Monte Carlo methods gives an upper bound of the diff-mean as follows:

diff-mean = |E(y) − ȳ(Dn)| ≤ V(f)D(Dn),   (1.21)

where D(Dn) is the star discrepancy of Dn, not depending on g, and V(f) is the total variation of the function f in the sense of Hardy and Krause (see Niederreiter (1992) and Hua and Wang (1981)). The lower the star discrepancy, the better uniformity the set of points has. The star discrepancy is a measure of uniformity used in quasi-Monte Carlo methods and is just the Kolmogorov-Smirnov statistic in goodness-of-fit tests. The details of the above concepts will be further explained in Section 3.2.1. The Koksma-Hlawka

inequality suggests minimizing the star discrepancy D(Dn) over all designs of n runs on C^s, that is, to find a set of n points uniformly scattered on C^s. The latter is called a uniform design (UD for short). It was proposed by Fang and Wang (Fang (1980) and Wang and Fang (1981)) based on the three big projects in system engineering. The Koksma-Hlawka inequality also indicates that the uniform design is robust against model specification. For example, if two models y = g1(x) and y = g2(x) have the same variation V(g1) = V(g2), a uniform design may have the same upper bound of the diff-mean for these two models. Many authors have discussed model robust designs, for example, Sacks and Ylvisaker (1984), Sacks and Ylvisaker (1985), and Chang and Notz (1996).

The LHS and UD were motivated by the overall mean model. Of course, the overall mean model is far from enough in practice, but it provides a simple way to develop methodology and theory. The LHS, UD, and their modifications are good space-filling designs. Surprisingly, it has been demonstrated that these space-filling designs have a good performance not only for estimation of the overall mean, but also for finding a good approximate model. The argument for the latter will be further discussed in Section 3.4. Koehler and Owen (1996) proposed a different way to classify approaches to computer experiments: "There are two main statistical approaches to computer experiments, one based on Bayesian statistics and a frequentist one based on sampling techniques." Both LHS and UD belong to frequentist experimental designs, and optimal Latin hypercube designs are Bayesian designs according to their philosophy. Chapters 2 and 3 will give a detailed introduction to these designs and their constructions.

1.6 Modeling Techniques

This section presents a brief overview of popular metamodeling methods in the literature. The construction of an accurate and simple metamodel y = g(x) to approximate the true model, y = f(x), plays a crucial role in analysis of computer experiments. The purpose of the modeling is to find some useful metamodels. For a given true model y = f(x) there are many possible metamodels, and very often it is difficult to say which metamodel is the best. Good modeling practice requires that the modeler provides an evaluation of the confidence in the model, possibly assessing the uncertainties associated with the modeling process and with the outcome of the model itself (see Saltelli et al. (2000)). Chapter 5 is devoted to introducing various modeling approaches in detail, accompanied by illustrations using case study examples to add clarity and practical application.
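Before turning to the modeling techniques, the space-filling ideas of Section 1.5 can be made concrete. The following is a minimal sketch (our own illustrative code, not from the book) of Latin hypercube sampling together with the sample-mean estimate (1.18); the toy integrand is an assumption chosen so that the overall mean (1.17) is known exactly:

```python
import numpy as np

def latin_hypercube(n, s, seed=None):
    """n-run Latin hypercube sample on C^s = [0, 1]^s (after McKay et al. (1979)).

    Each column visits every stratum [k/n, (k+1)/n) exactly once, in an
    independent random order per dimension, so every one-dimensional
    margin is evenly stratified.
    """
    rng = np.random.default_rng(seed)
    perm = np.argsort(rng.random((n, s)), axis=0)  # one random permutation per column
    return (perm + rng.random((n, s))) / n

def sample_mean(f, design):
    """Sample-mean estimator (1.18) of the overall mean E(y) in (1.17)."""
    return float(np.mean([f(x) for x in design]))

# Toy true model on C^2 with known overall mean E(y) = 1/2 + 1/3 = 5/6.
f = lambda x: x[0] + x[1] ** 2
estimate = sample_mean(f, latin_hypercube(100, 2, seed=0))
```

Because the toy integrand is additive and monotonic in each variable, the LHS estimate is far more accurate here than plain independent uniform sampling with the same n, in line with (1.19).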

Most metamodels can be represented using a linear combination of a set of specific bases. Let {B1(x), B2(x), · · · } be a set of basis functions defined on the experimental domain T. A metamodel g usually can be written in the following general form:

g(x) = β1B1(x) + β2B2(x) + · · · ,   (1.22)

where the βj's are unknown coefficients to be estimated.

Polynomial models have traditionally enjoyed popularity in many modeling contexts, including computer experiments. The models employ a polynomial basis x1^{r1} · · · xs^{rs}, where r1, · · · , rs are non-negative integers. The number of polynomial basis functions dramatically increases with the number of input variables and the degree of the polynomial. Low-order polynomials such as the second-order polynomial model,

g(x) = β0 + Σ_{i=1}^{s} βi xi + Σ_{i=1}^{s} Σ_{j=i}^{s} βij xi xj,   (1.23)

are the most popular polynomial metamodels for computer experiment modeling. The term response surfaces (Myers and Montgomery (1995) and Morris and Mitchell (1995)) in the literature refers to this second-order polynomial model. To make numerical computation stable, one usually centralizes the input variables before building polynomial models. Thus, the centered quadratic model,

g(x) = β0 + Σ_{i=1}^{s} βi(xi − x̄i) + Σ_{i=1}^{s} Σ_{j=i}^{s} βij(xi − x̄i)(xj − x̄j),   (1.24)

where x̄i is the sample mean of the i-th component of x, has been recommended in the literature. Polynomial metamodels are usually used to discover the overall trend of the true model. When the domain T is large and the true model is more complicated, there are many local minimums/maximums, so one needs high-degree polynomials to approximate the true model. In the presence of high-order polynomials, there are high correlations among regressors, and the polynomial basis may cause a collinearity problem. In such cases the collinearity becomes a serious problem, and orthogonal polynomial models are recommended to overcome this difficulty of multi-collinearity. See Section 5.2 for construction of orthogonal polynomial bases. Alternatively, splines can be employed to construct a set of bases for the expansion in (1.22). In the literature, spline approaches mainly include smoothing splines (Wahba (1990)) and regression splines (see, for example, Stone, Hansen, Kooperberg and Truong (1997)). For a univariate x-variable, the power spline basis has the following general form:

1, x, x², · · · , x^p, (x − κ1)_+^p, · · · , (x − κK)_+^p,   (1.25)

where κ1, · · · , κK are a set of selected knots (locations at which the (p + 1)-th derivative of g is allowed to be discontinuous), and a+ stands for the positive part of a, i.e., a+ = aI(a > 0). Multivariate spline bases may be constructed from the univariate spline basis using the tensor product approach. Section 5.3 explains the spline model in more detail.

Similar to the polynomial basis, the Fourier basis is commonly used to model responses with a periodic function. For example, the true model f(x) in Example 6 is appropriately modeled using Fourier regression. Multivariate Fourier bases usually are constructed from the univariate Fourier basis,

1, cos(2πx), sin(2πx), · · · , cos(2kπx), sin(2kπx), · · · ,

by using the tensor product method (see Chapter 5). In practice, the number of terms in full multivariate Fourier bases dramatically increases as the dimension of x increases; however, people often use the following Fourier metamodel (see Riccomango, Schwabe and Wynn (1997)):

g(x) = β0 + Σ_{i=1}^{s} {[αi cos(2πxi) + βi sin(2πxi)] + [αi^{(2)} cos(4πxi) + βi^{(2)} sin(4πxi)] + · · · + [αi^{(m)} cos(2mπxi) + βi^{(m)} sin(2mπxi)]}.   (1.26)

Wavelet bases have also been proposed in the literature to improve the Fourier approximation, especially in cases where the function being approximated may not be smooth. See Chui (1992), Daubechies (1992), and Antoniadis and Oppenheim (1995) for systematic introductions to wavelets.

Metamodels built on polynomial bases, spline bases, Fourier bases, and wavelet bases are powerful when the input variable is one-dimensional. They also may be useful when input variables are low-dimensional and the function can be adequately represented using low-order basis functions. The tensor product method has been widely employed to extend these basis functions from univariate inputs to multivariate inputs. In practice, these extensions may be difficult to implement because the number of terms in a set of multivariate basis functions exponentially increases as the dimension of input variables increases. As an alternative, techniques such as Kriging models, neural networks, and local polynomial models may provide a more natural construct to deal with multivariate basis functions.

A Kriging model assumes that

y(x) = μ + z(x),   (1.27)

where μ is the overall mean of y(x), and z(x) is a Gaussian process with mean E{z(x)} = 0 and covariance function

Cov(z(xi), z(xj)) = σ²R(xi, xj),   (1.28)

where σ² is the unknown variance of z(x), and R is a correlation function with a pre-specified functional form and some unknown parameters. A typical

choice for the correlation function is

r(x1, x2) = exp{−Σ_{k=1}^{s} θk(xk1 − xk2)²},   (1.29)

where the θk's are unknown. The resulting metamodels of the Kriging approach can be written as

g(x) = Σ_{i=1}^{n} βi r(x, xi),   (1.30)

which is of the form (1.22), regarding r(x, xi) to be a basis function. One of the advantages of the Kriging approach is that it constructs the basis directly using the correlation function. Under certain conditions, it can be shown that the resulting metamodel of the Kriging approach is the best linear unbiased predictor (see Section 5.4.1). Chapter 5 gives a thorough discussion of the Kriging approach.

Furthermore, the Gaussian Kriging approach admits a Bayesian interpretation. Bayesian interpolation was proposed to model computer experiments by Currin, Mitchell, Morris and Ylvisaker (1991) and Morris, Mitchell and Ylvisaker (1993). Bayesian interpolation can be beneficial in that it easily incorporates auxiliary information in some situations. For instance, Morris et al. (1993) demonstrated how to use a Bayesian Kriging method to create computer models that can provide both the response and its first partial derivatives. (See Section 5.5 for further discussions.)

The multilayer perceptron network (MLP) is the most ubiquitous neural network model. Let {x1, · · · , xn} consist of a design, and let y1, · · · , yn be their associated outputs. The network consists of input, hidden, and output layers with nonlinear input transformation as follows:

ŷ = Σ_{j=1}^{d} zj(vj)βj + β0,

where βj is the "regression coefficient" of the j-th basis function zj(vj), and the vj's are linear combinations of the input variables,

vj = Σ_{i=1}^{p} wji xi + wj0.

The basis functions zj(vj) may take, for example, the following forms:

zj(vj) = 1/(1 + exp(−λvj))   or   zj(vj) = tanh(λvj).

The "weight" parameters, wji, in the linear combination and the regression coefficients, βj, of the basis functions are estimated using the "least square criteria"
yn are their associated outputs.6. an intuitive estimator for the regression function f (x) is the running local average. An improved version of the local average is the locally weighted average: n g(x) = i=1 wi (x)yi . · · · . we will introduce the fundamental ideas of local modeling and © 2006 by Taylor & Francis Group. There are numerous training algorithms in the literature (e. the density function of normal distribution.29) where θi = θ. Fan (1992) and Fan and Gijbels (1996) give a systematic account in the context of nonparametric regression. xn .30) with the correlation function (1. the weight function depends on x1 . and this approach is often called a back-propagation network. and y1 . The reader can refer to Section 5. · · · . Radial basis function methods have been used for neural network modeling and are closely related to the Kriging approach. the resulting metamodel has the same form as that in (1.7.” The resulting metamodel by local polynomial models can be represented in the form of (1. n. · · · .g. Haykin (1998).. The idea of local polynomial regression is that a datum point closer to x carries more information about the value of f (x) than one which is far away. i. xn } consist of a design. Of course. Local polynomial modeling has been applied for nonparametric regression in statistical literature for many years. i = 1. Taking the kernel function to be the Guassian kernel function. ˆ k=1 The network “training” process (i. Therefore. a radial basis function has the following form: K( x − xi /θ).31) where wi (x). n (1. The application of this model for computer experiment will be discussed in Section 5. Among them is a learning algorithm known as the “Backpropagation” (Rumelhart. · · · .6 for a systematic discussion on neural network modeling. In Section 5. In general.. LLC . where K(·) is a kernel function and θ is a smoothing parameter.e..e. Hinton and Williams (1986)). · · · . i = 1. Hassoun (1995)). 
between the model fit and the true values of the response variables,

E = (1/2) Σ_{k=1}^{n} (yk − ŷk)².

The network "training" process (i.e., parameter estimation) involves nonlinear optimization for the above least squares objective function. There are numerous training algorithms in the literature (e.g., Haykin (1998) and Hassoun (1995)). Among them is a learning algorithm known as "backpropagation" (Rumelhart, Hinton and Williams (1986)), and this approach is often called a back-propagation network. The reader can refer to Section 5.6 for a systematic discussion on neural network modeling.

Radial basis function methods have been used for neural network modeling and are closely related to the Kriging approach. In general, a radial basis function has the following form: K(‖x − xi‖/θ), where K(·) is a kernel function and θ is a smoothing parameter. Let {x1, · · · , xn} consist of a design, and let y1, · · · , yn be their associated outputs. The resulting metamodel is as follows:

g(x) = Σ_{i=1}^{n} βi K(‖x − xi‖/θ).

Taking the kernel function to be the Gaussian kernel function, i.e., the density function of the normal distribution, the resulting metamodel has the same form as that in (1.30) with the correlation function (1.29), where θk = θ. The application of this model for computer experiments will be discussed in Section 5.6.

Local polynomial modeling has been applied to nonparametric regression in the statistical literature for many years. Fan (1992) and Fan and Gijbels (1996) give a systematic account in the context of nonparametric regression. The idea of local polynomial regression is that a datum point closer to x carries more information about the value of f(x) than one which is far away. Therefore, an intuitive estimator for the regression function f(x) is the running local average. An improved version of the local average is the locally weighted average:

g(x) = Σ_{i=1}^{n} wi(x)yi,   (1.31)

where the wi(x), i = 1, · · · , n, are weights with Σ_{i=1}^{n} wi(x) = 1. Of course, the weight function depends on x1, · · · , xn. It further depends on a smoothing parameter which relates to how we define "close" or "far." The resulting metamodel by local polynomial models can be represented in the form of (1.22). In Section 5.7, we will introduce the fundamental ideas of local modeling and

demonstrate the connections among the Kriging method, the radial basis function approach, and local polynomial modeling. More modeling techniques, like the Bayesian approach and neural networks, will be introduced in Chapter 5.

1.7 Sensitivity Analysis

Sensitivity analysis studies the relationships between information flowing in and out of the model. The metamodel here is intended to give insight into the relationships of input variables with the response variable. Understanding the relationships between inputs and output can be motivated by several reasons. First, one may want to do a sanity check on the model. By quantifying the sensitivity, one can cross-check the results produced by the model and compare them with known general physical laws. The other reason, especially in engineering design, is that one would like to make design decisions by knowing which input variables have the most influence on the output. This is particularly important when one is dealing with a very complicated model with numerous input variables.

These needs lead us to consider the following questions: how to rank the importance of variables and how to interpret the resulting metamodel. When a low-order polynomial model is used to fit the computer experiment data, traditional decomposition of sum of squares or traditional analysis of variance (ANOVA) decomposition can be directly employed to rank the importance of variables and interactions, and the magnitude of the scaled regression coefficients also explains the effect of each term. However, it would be difficult to directly understand the metamodels if sophisticated basis functions, such as Kriging or neural networks, are used. To address these questions, the so-called sensitivity analysis (SA) was proposed. Saltelli et al. (2000) stated: "Sensitivity analysis (SA) is the study of how the variation in the output of a model (numerical or otherwise) can be apportioned, quantitatively, to different sources of variation, and of how the given model depends upon the information fed into it." As a whole, SA is used to increase the confidence in the model and its predictions, by providing an understanding of how the model response variables respond to changes in the inputs, be they data used to calibrate it, model structures, or factors, i.e., the model-independent variables. Many authors, such as Sobol' (1993, 2001) and Saltelli, Chan and Scott (2000), have comprehensively studied this topic.

Let D be the total variation of the output y = f(x) = f(x1, · · · , xs). We can split the total variation D into

D = Σ_{k=1}^{s} Σ_{t1<···<tk} D_{t1···tk},   (1.32)

where Di gives the contribution of xi to the total variation D, Di,j the contribution of the interaction between xi and xj, and so on. The Sobol' functional ANOVA method suggests

D = ∫(f(x) − f0)² dx = ∫f² dx − f0²,

where f0 = ∫f(x) dx, and

D_{t1···tk} = ∫f²_{t1···tk} dx_{t1} · · · dx_{tk},

where

fi(xi) = ∫f(x) Π_{k≠i} dxk − f0,

fij(xi, xj) = ∫f(x) Π_{k≠i,j} dxk − f0 − fi(xi) − fj(xj),

and so on. Therefore, the true model has the decomposition

f(x) = f0 + Σ_{i=1}^{s} fi(xi) + Σ_{1≤i<j≤s} fij(xi, xj) + · · · + f_{12···s}(x1, x2, · · · , xs).

The ratios

S_{t1···tk} = D_{t1···tk}/D   (1.33)

are called Sobol' global sensitivity indices. If one uses a metamodel g to replace the true one f, we can carry out the above process by the use of g to replace f. The aim of sensitivity analysis is to investigate how a given metamodel responds to variations in its inputs. The investigator may be concerned with

• whether the metamodel resembles the system or the true model;
• whether we can remove insignificant input factors or some terms in the metamodel to improve the efficiency of the metamodel;
• which input variables and which interactions give the greatest contribution to the output variability; or
• which level-combination of the input variables can reach the maximum or minimum of the output y.

The Sobol' indices, correlation ratios, and the Fourier amplitude sensitivity test (FAST) are three popular measures to quantify the importance of input variables in the literature. In Section 6.3, we will illustrate how to use the Sobol' indices to rank the importance of input variables and their interactions, and will further introduce the correlation ratio and FAST in a systematic way.
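The first-order Sobol' indices S_i above can be estimated by plain Monte Carlo. The following is a simplified "pick-freeze" style sketch (our own illustrative code; the function names and the additive test integrand are assumptions, not from the text), using the identity D_i = E[f(x)f(x′)] − f0², where x′ keeps coordinate i from x and redraws all other coordinates:

```python
import random

def sobol_first_order(f, s, n=50000, seed=0):
    """Monte Carlo estimate of the first-order Sobol' indices S_i = D_i / D.

    Simplified pick-freeze estimator: D_i = E[f(x) f(x')] - f0^2, where x'
    shares coordinate i with x and redraws the remaining coordinates.
    """
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(s)] for _ in range(n)]
    b = [[rng.random() for _ in range(s)] for _ in range(n)]
    fa = [f(x) for x in a]
    f0 = sum(fa) / n                          # estimate of the overall mean f0
    d = sum(v * v for v in fa) / n - f0 ** 2  # estimate of the total variation D
    indices = []
    for i in range(s):
        mixed = [f(b[k][:i] + [a[k][i]] + b[k][i + 1:]) for k in range(n)]
        di = sum(fa[k] * mixed[k] for k in range(n)) / n - f0 ** 2
        indices.append(di / d)
    return indices

# Additive test function x1 + x2^2: no interactions, so S_1 + S_2 should be 1.
s1, s2 = sobol_first_order(lambda x: x[0] + x[1] ** 2, 2)
```

For this additive integrand the exact values are D1 = 1/12 and D2 = 4/45, so S_1 ≈ 0.484, which the estimator recovers up to Monte Carlo error.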

In the literature there are many different approaches for SA. These can generally be divided into three classes:

• Screening methods: i.e., factor screening, where the aim is to identify influential factors in the true model.

• Local SA: this emphasizes the local impact of the input variables on the model. The local SA helps to compute partial derivatives of the metamodel with respect to the input variables.

• Global SA: this emphasizes apportioning the output uncertainty to the uncertainty in the input variables. It typically takes a sampling approach to distributions for each input variable.

Chapter 6 will provide details on SA and the application of SA to some examples in the previous chapters of the book.

1.8 Strategies for Computer Experiments and an Illustration Case Study

This section gives an illustrative example of computer experiments where a space-filling design and related modeling will be discussed. The following are the typical steps in a computer experiment:

Factor screening: There are many possible related factors according to the purpose of the experiment. The experimenter has to screen out the inactive factors so that one can focus on a few active factors. Choose a suitable factor space, i.e., the experimental domain.

Construct a design for the experiment: Selecting a design with an appropriate number of runs and levels for each variable to ensure sufficient design space coverage is one of the important early steps in design and modeling of computer experiments. The number of runs and levels depends on the complexity of the true model. Some models, such as finite element analysis and computational fluid dynamics, are very expensive to run. In addition, similar to physical experiments, it is often difficult to incorporate a large number of levels. Therefore, an optimized design which has a limited number of runs and levels yet still has sufficient space-filling property may be needed (see Chapter 4). There are many high quality space-filling designs in the literature. See, for example, http://www.math.hkbu.edu.hk/UniformDesign

Metamodel searching: This step includes the selection of appropriate modeling techniques as well as estimation of the model parameters. One may want to consider several modeling technique alternatives for a given data set (see Chapter 5 for a detailed discussion on modeling techniques). Moreover, the aforementioned types of models are often also time consuming to construct and verify.
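As a small illustration of the metamodel-searching step, the second-order polynomial model (1.23) can be fit by ordinary least squares. This sketch uses synthetic data from a known quadratic truth (our own illustrative code, not the book's case study):

```python
import numpy as np

def quadratic_basis(X):
    """Design matrix with columns 1, x_i, and x_i x_j (j >= i), as in (1.23)."""
    n, s = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(s)]
    cols += [X[:, i] * X[:, j] for i in range(s) for j in range(i, s)]
    return np.column_stack(cols)

# Synthetic runs from a known quadratic truth: y = 1 + 2 x0 + 3 x0 x1.
rng = np.random.default_rng(1)
X = rng.random((50, 2))
y = 1 + 2 * X[:, 0] + 3 * X[:, 0] * X[:, 1]

# Least-squares estimate of the coefficients beta in g(x) = B(x) beta.
beta, *_ = np.linalg.lstsq(quadratic_basis(X), y, rcond=None)
```

Because the truth lies in the span of the basis, the fitted metamodel reproduces the runs exactly; with a real computer model one would instead examine residuals and prediction error at untried points.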

Verify the metamodel: One may consider several alternatives of modeling methods to find the most suitable model. Conducting comparisons among these models may be necessary to select the 'best' one. Selecting the 'best' model can be a difficult task. The ability to interpret the metamodel to understand the relationship between inputs and output is often as important as getting the best prediction accuracy from a metamodel. Readers should refer to Section 5.1.1 for a detailed discussion of model selection. For the purpose of generalization capability, i.e., satisfactory predictions at untried points, we can verify the model using the mean square error (MSE) of prediction at untried points. The procedure is as follows: randomly choose N points xk, k = 1, · · · , N, on the domain T and calculate their output y-values, y(xk), and the estimated values ŷ(xk) using the recommended metamodel. The MSE is given by

MSE = (1/N) Σ_{k=1}^{N} (y(xk) − ŷ(xk))².   (1.34)

The MSE, or the square root of the MSE, measures the departure of the metamodel from the true model. The smaller the MSE value, the better the metamodel. In general, however, when the computer model is computationally expensive, users will not have the luxury of performing cross validation with a large sample size. In this case, verification will be performed using limited samples or in some cases using approximation methods such as k-fold cross validation (see Section 5.1.1 for details).

Interpret the model and perform sensitivity analysis: Interpretation of the chosen metamodel as discussed in Section 1.7 is an important step in computer experiments, especially for complicated metamodels such as Kriging models. Chapter 6 gives a detailed discussion on this issue.

Further study: If one cannot find a satisfactory metamodel based on the data set, it may be because of either poor design selection or metamodel choice. If the latter is the problem, one may consider other more suitable modeling techniques discussed in Chapter 5. When the poor result is due to the former, one may want to consider either augmenting the existing data set or creating a new experiment with better design structure to ensure a sufficient number of levels (to capture nonlinearity) and a sufficient number of runs to improve the space filling of the design.

Example 8 (Demo Example) For illustration of design and modeling for computer experiments, the following example gives the reader a brief demonstration. This case study was investigated by many authors, such as Worley (1987), Ho and Xu (2000), An and Owen (2001), Morris et al. (1993), and Fang and Lin (2003).
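The MSE verification in (1.34) takes only a few lines once both the true model and the metamodel are callable. A sketch with toy functions (all names and the crude linear surrogate are our own illustrative assumptions):

```python
import random

def prediction_mse(true_f, metamodel, domain, n_test=2000, seed=0):
    """Mean squared prediction error (1.34), estimated at random untried points.

    domain is a list of (low, high) ranges, one per input variable.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_test):
        x = [rng.uniform(lo, hi) for lo, hi in domain]
        total += (true_f(x) - metamodel(x)) ** 2
    return total / n_test

# Toy check: a crude linear surrogate for a mildly nonlinear truth.
truth = lambda x: x[0] ** 2
surrogate = lambda x: x[0] - 0.25      # illustrative linear fit, chosen by hand
mse = prediction_mse(truth, surrogate, [(0.0, 1.0)])
```

For this pair the exact MSE is E[(x − 0.5)⁴] = 1/80 = 0.0125 on U(0, 1), which the Monte Carlo estimate approaches as n_test grows.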

In the following example, we are using a simple engineering example of flow rate of water through a borehole from an upper aquifer to a lower aquifer separated by an impermeable rock layer. Although this is a simple example and does not represent a typical case of computer experiment in complex engineering design practice, it is a good demonstrative example for computer experiments. A brief introduction of design and modeling for this case study is given as follows:

1. Choose factors and experimental domain
The flow rate through the borehole can be studied using an analytical model derived from Bernoulli's law, under an assumption that the flow is steady-state laminar and isothermal. The response variable y, the flow rate through the borehole in m³/yr, is determined by

y = 2πTu(Hu − Hl) / {ln(r/rw)[1 + 2LTu/(ln(r/rw) rw² Kw) + Tu/Tl]},   (1.35)

where the 8 input variables are as follows:
rw (m) = radius of borehole
r (m) = radius of influence
Tu (m²/yr) = transmissivity of upper aquifer
Tl (m²/yr) = transmissivity of lower aquifer
Hu (m) = potentiometric head of upper aquifer
Hl (m) = potentiometric head of lower aquifer
L (m) = length of borehole
Kw (m/yr) = hydraulic conductivity of borehole

and the domain T is given by rw ∈ [0.05, 0.15], r ∈ [100, 50000], Tu ∈ [63070, 115600], Tl ∈ [63.1, 116], Hu ∈ [990, 1110], Hl ∈ [700, 820], L ∈ [1120, 1680], and Kw ∈ [9855, 12045]. The input variables and the corresponding output are denoted by x = (x1, · · · , x8) and y(x).

2. Construct design of experiment matrix
Worley (1987) chose a 10-run Latin hypercube sampling (cf. Section 2.2). Morris et al. (1993) employed four types of 10-run designs, for example, Latin hypercube design, maximin design (cf. Section 2.4), maximin Latin hypercube design, and modified maximin design, while Ho and Xu (2000) chose a uniform design (cf. Chapter 3) that is given in Table 1.2. The y-values for all 32 runs are listed in the last column of Table 1.2.

3. Metamodel searching
In terms of statistical modeling, one wants to fit highly adaptive models by various modeling techniques. Section 1.6 gives a brief review of various modeling techniques. Section 5.5 revisits this example using the Bayesian modeling method proposed by Sacks, Welch, Mitchell and Wynn (1989) and Morris et al. (1993). For a detailed treatment of modeling techniques, readers should refer to Chapter 5. In particular, for the purpose of friendly

TABLE 1.2

**Uniform Design and Related Output**

No    rw        r        Tu      Tl       Hu       Hl      L      Kw         y
 1  0.0500  33366.67   63070  116.00  1110.00  768.57  1200  11732.14   26.18
 2  0.0500    100.00   80580   80.73  1092.86  802.86  1600  10167.86   14.46
 3  0.0567    100.00   98090   80.73  1058.57  717.14  1680  11106.43   22.75
 4  0.0567  33366.67   98090   98.37  1110.00  734.29  1280  10480.71   30.98
 5  0.0633    100.00  115600   80.73  1075.71  751.43  1600  11106.43   28.33
 6  0.0633  16733.33   80580   80.73  1058.57  785.71  1680  12045.00   24.60
 7  0.0700  33366.67   63070   98.37  1092.86  768.57  1200  11732.14   48.65
 8  0.0700  16733.33  115600  116.00   990.00  700.00  1360  10793.57   35.36
 9  0.0767    100.00  115600   80.73  1075.71  751.43  1520  10793.57   42.44
10  0.0767  16733.33   80580   80.73  1075.71  802.86  1120   9855.00   44.16
11  0.0833  50000.00   98090   63.10  1041.43  717.14  1600  10793.57   47.49
12  0.0833  50000.00  115600   63.10  1007.14  768.57  1440  11419.29   41.04
13  0.0900  16733.33   63070  116.00  1075.71  751.43  1120  11419.29   83.77
14  0.0900  33366.67  115600  116.00  1007.14  717.14  1360  11106.43   60.05
15  0.0967  50000.00   80580   63.10  1024.29  820.00  1360   9855.00   43.15
16  0.0967  16733.33   80580   98.37  1058.57  700.00  1120  10480.71   97.98
17  0.1033  50000.00   80580   63.10  1024.29  700.00  1520  10480.71   74.44
18  0.1033  16733.33   80580   98.37  1058.57  820.00  1120  10167.86   72.23
19  0.1100  50000.00   98090   63.10  1024.29  717.14  1520  10793.57   82.18
20  0.1100    100.00   63070   98.37  1041.43  802.86  1600  12045.00   68.06
21  0.1167  33366.67   63070  116.00   990.00  785.71  1280  12045.00   81.63
22  0.1167    100.00   98090   98.37  1092.86  802.86  1680   9855.00   72.54
23  0.1233  16733.33  115600   80.73  1092.86  734.29  1200  11419.29  161.35
24  0.1233  16733.33   63070   63.10  1041.43  785.71  1680  12045.00   86.73
25  0.1300  33366.67   80580  116.00  1110.00  768.57  1280  11732.14  164.78
26  0.1300    100.00   98090   98.37  1110.00  820.00  1280  10167.86  121.76
27  0.1367  50000.00   98090   63.10  1007.14  820.00  1440  10167.86   76.51
28  0.1367  33366.67   98090  116.00  1024.29  700.00  1200  10480.71  164.75
29  0.1433  50000.00   63070  116.00   990.00  785.71  1440   9855.00   89.54
30  0.1433  50000.00  115600   63.10  1007.14  734.29  1440  11732.14  141.09
31  0.1500  33366.67   63070   98.37   990.00  751.43  1360  11419.29  139.94
32  0.1500    100.00  115600   80.73  1041.43  734.29  1520  11106.43  157.59

introduction, we only consider polynomial regression models. A careful study by Ho and Xu (2000) suggested the following model:

log(y) = 4.1560 + 1.9903(log(rw) + 2.3544) − 0.0007292(L − 1400)
         − 0.003554(Hl − 760) + 0.0035068(Hu − 1050)
         + 0.000090868(Kw − 10950) + 0.000015325(Hu − 1050)(Hl − 760)
         + 0.00000026487(L − 1400)^2 − 0.0000071759(Hl − 760)^2
         − 0.0000068021(Hu − 1050)^2 − 0.00087286(log(r) − 8.8914).    (1.36)

4. Verify the model. Since response values are easily acquired from the flow through the borehole formula, we randomly generated a new sample of N = 2000 points to evaluate the model in (1.36). The MSE based on this model equals 0.2578156.
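Step 4 can be sketched in code. The input ranges and the borehole flow formula below are assumptions (the standard borehole test function commonly used in this literature), not quoted from this section; only the polynomial is model (1.36):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed ranges of the eight inputs (rw, r, Tu, Tl, Hu, Hl, L, Kw),
# consistent with the design table above.
lows  = np.array([0.05,   100.0,  63070.0,  63.1,  990.0, 700.0, 1120.0,  9855.0])
highs = np.array([0.15, 50000.0, 115600.0, 116.0, 1110.0, 820.0, 1680.0, 12045.0])

def borehole(x):
    """Flow through a borehole (standard test function; an assumption here)."""
    rw, r, Tu, Tl, Hu, Hl, L, Kw = x
    lnr = np.log(r / rw)
    return 2 * np.pi * Tu * (Hu - Hl) / (
        lnr * (1 + 2 * L * Tu / (lnr * rw**2 * Kw) + Tu / Tl))

def metamodel_log_y(x):
    """The polynomial metamodel (1.36) for log(y)."""
    rw, r, Tu, Tl, Hu, Hl, L, Kw = x
    return (4.1560 + 1.9903 * (np.log(rw) + 2.3544)
            - 0.0007292 * (L - 1400) - 0.003554 * (Hl - 760)
            + 0.0035068 * (Hu - 1050) + 0.000090868 * (Kw - 10950)
            + 0.000015325 * (Hu - 1050) * (Hl - 760)
            + 0.00000026487 * (L - 1400) ** 2
            - 0.0000071759 * (Hl - 760) ** 2
            - 0.0000068021 * (Hu - 1050) ** 2
            - 0.00087286 * (np.log(r) - 8.8914))

# Step 4: evaluate the metamodel on a fresh random sample of N = 2000 points.
X = rng.uniform(lows, highs, size=(2000, 8))
errors = [metamodel_log_y(x) - np.log(borehole(x)) for x in X]
mse = float(np.mean(np.square(errors)))
print(f"MSE on 2000 fresh points: {mse:.4f}")
```

The exact MSE depends on the sampling distribution assumed for the inputs, so it will not reproduce 0.2578156 exactly.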


5. Interpret model and perform sensitivity analysis. In this example, we are also interested in ranking the importance of the input variables to the output. While Chapter 6 gives a detailed discussion of this issue, here we employ the most straightforward approach, the sum of squares decomposition SSTO = SSR + SSE, in which the total sum of squares (SSTO) of the output variation is decomposed into the regression sum of squares (SSR), the portion that can be explained by the model, and the error sum of squares (SSE), the portion that cannot be explained by the model. We have

SSTO = Σ_{i=1}^n (y_i − ȳ)²,

where ȳ is the average response;

SSR = Σ_{i=1}^n (ŷ_i − ȳ)²,

where ŷ_i is the fitted value of y_i; and

SSE = Σ_{i=1}^n (y_i − ŷ_i)².

In conventional physical experiments, SSE represents lack of fit and/or random errors from the experiment. In computer experiments, there is no random error, and SSE is a result of lack of fit, due either to the limitations of the metamodel itself or to the exclusion of some input variables from the model. The increase in the regression sum of squares can be used as a measure of the marginal effect of adding one or several covariates to the regression equation. Here, we decompose the regression sum of squares by sequentially adding new terms to the regression equation and calculating the corresponding increase in SSR. Let SSR(xi) be the regression sum of squares when only xi is included in the regression model. Thus, SSR(xi) stands for the marginal reduction of total variation due to xi. Readers should consult Section 6.2 for a detailed discussion. Here, we can use the sequential SSR to rank the importance of variables. Table 1.3 presents the summary of the sequential SSR decomposition, where the last column shows the relative importance of each term with respect to SSTO, based on the SSR decomposition. In other words, the SSR decomposition shows the rank order of importance of the input variables.
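The sequential SSR decomposition described above can be sketched as follows. The toy data, variable names, and the helper `fit_ssr` are hypothetical, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: x1 and x2 matter (coefficients 3 and 1), x3 is irrelevant.
n = 200
X = rng.uniform(size=(n, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.1 * rng.standard_normal(n)

def fit_ssr(X_sub, y):
    """SSR of a least-squares fit with intercept on the given columns."""
    A = np.column_stack([np.ones(len(y)), X_sub])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ beta
    return float(np.sum((y_hat - y.mean()) ** 2))

ssto = float(np.sum((y - y.mean()) ** 2))

# Sequential SSR: add one term at a time, record the increase in SSR.
seq, prev = {}, 0.0
for j in range(X.shape[1]):
    ssr = fit_ssr(X[:, : j + 1], y)
    seq[f"x{j + 1}"] = ssr - prev
    prev = ssr
sse = ssto - prev  # SSTO = SSR + SSE for least squares with intercept

for term, s in seq.items():
    print(f"{term}: Seq.SSR = {s:.3f} ({100 * s / ssto:.1f}% of SSTO)")
```

As in Table 1.3, the percentage column ranks the terms by their marginal contribution to the explained variation.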


TABLE 1.3
Sum of squares decomposition

Terms    Seq.SSR   %Seq.SSR
rw       11.2645      85.74
L         0.7451       5.67
Hl        0.4583       3.49
Hu        0.5170       3.94
Kw        0.1401       1.07
Hu*Hl     0.0064       0.05
L^2       0.0036       0.03
Hu^2      0.0007       0.01
Hl^2      0.0013       0.01
r         0.0001       0.00
SSTO     13.1373     100.00

1.9 Remarks on Computer Experiments

This section gives some general remarks on relationships between design and analysis of computer experiments (DACE) and other statistical designs. A computer experiment has its own model

y = f(x_1, ..., x_s) = f(x),  x = (x_1, ..., x_s) ∈ T,    (1.37)

where the true model f is known but complicated or has no analytic formula, and the input space T is large in most experiments. For a given input x ∈ T, one can find the output without random error. The model (1.37) is deterministic, although calculating the output may involve round-off errors. Santner et al. (2003) distinguish three types of input variables:

Control variables: The variables are of interest in the experiment and can be set by an engineer or scientist to "control" the system. Control variables are present in physical experiments as well as in computer experiments.

Environmental variables: The variables are not the main interest, but we have to consider their effects on the output. Temperature outside and highway quality when a car is running are examples of environmental variables. Environmental variables are also called noise variables.

Model variables: The variables are parameters that describe the uncertainty in the mathematical modeling that relates other inputs to the output. In this book we prefer to use "model parameters," because we can estimate these parameters using the collected data.

Here control variables and environmental variables correspond to control factors and noise factors in the robust design and reliability-based design mentioned in Section 1.3.2. The number of input factors, s, can be very large in computer experiments. Santner, Williams and Notz (2003) consider two types of the inputs:


Homogeneous-input: All components of x are either control variables or environmental variables or model variables.

Mixed-input: x contains at least two of the three different types of input variables: control, environmental, and model.

However, when the intent of the experiment is to build a metamodel, we make no distinction between homogeneous and mixed input in setting up the design of the experiment or between the types of input variables. There is a close relation between computer experiments and physical experiments with an unknown model. In Example 7, the experimenter does not know the true model representing the relationship between the mortality of the cells and the contents of the six chemicals. For the nonparametric model

y = f(x) + ε,    (1.38)

the model f(x) is unknown and ε is the random error. The experimenter wishes to estimate the model f by a physical experiment. Clearly, the two kinds of models, (1.37) and (1.38), have many common aspects: 1) the true model is complicated (known or unknown) and needs a simple metamodel to approximate it; 2) the experimental domain should be large so that we can build a metamodel that is valid for a larger design space; 3) space-filling designs are useful for both kinds of experiments; and 4) various modeling techniques can be applied to the modeling in both cases. The difference between the two models (1.37) and (1.38) is that the former does not have random error, while the latter involves random error. When we treat model (1.38), all the statistical concepts and tools can be employed. However, from a quick observation, one may argue that the statistical approach may no longer be useful for the model (1.37). On the other hand, due to the common aspects of the two models (1.37) and (1.38), one may also argue that useful statistical concepts and tools based on model (1.38) may still be useful for model (1.37). As an example, many ideas and concepts in sensitivity analysis come from statistics. The statistical approach is also useful for dealing with complex experiments. In particular, when the number of input variables is large, the "sparsity principle" in factorial design mentioned in Section 1.2 can be applied. It is reasonable to believe that the number of relatively important input variables is small, and a statistical model can be developed to include only these important variables; the relatively small effects of other input variables are not included in the model and are treated as random errors from unobserved variables. Under this consideration, model (1.37) can be split into two parts: one relates to the set of important variables and the other relates to the set of insignificant variables. Thus, model (1.37) becomes model (1.38).
Computer-based simulation and analysis, such as probabilistic design, have been used extensively in engineering to predict the performance of a system. Computer-based simulations have a model with the same format as model (1.38), where the function f is known, and ε is random error simulated from a known distribution generated by a random number generator in the computer. Despite the steady and continuing growth of computing power and


speed, the high computational costs in analysis of the computer simulation are still a crucial problem. Like computer experiments, computer-based simulation also involves a) a design to sample the domain of interest and b) finding a suitable metamodel. Response surface methodology (RSM) has been successfully applied to many complex processes to seek "optimal" processing conditions. RSM employs sequential experimentation based on lower-order Taylor series approximation. The RSM approach may be successfully applied to computer experiments when low-order polynomials (i.e., second order) are sufficient to capture the input-output relationship. However, in general, the approach may not be the best choice for the following reasons: a) RSM typically considers only a few factors, while computer experiments may consider many input variables; b) the experimental domain in RSM is small, while computer experiments need a large experimental space; c) the statistical model in RSM is well defined and simple, while the model in most computer experiments is too complicated and has no analytic formula; d) RSM requires some prior knowledge to choose a good starting point in a reasonable neighborhood of the optimum, while computer experiments are intended to provide a more global study of the model. Readers can refer to Myers and Montgomery (1995) for a comprehensive review of RSM.

1.10 Guidance for Reading This Book

This book is divided into three parts. Part I provides an introduction of design and modeling of computer experiments and some basic concepts that will be used throughout the book. Part II focuses on the design of computer experiments; here we introduce the most popular space-ﬁlling designs (Latin hypercube sampling and its modiﬁcations, and uniform design), including their deﬁnition, properties, construction, and related generating algorithms. Part III discusses modeling of data from computer experiments; here various modeling techniques as well as model interpretations (i.e., sensitivity analysis) are presented. Readers may elect to read each part of the book sequentially or readers interested in a speciﬁc topic may choose to go directly to a certain chapter. Examples are provided throughout the book to illustrate the methodology and to facilitate practical implementation. Some useful concepts in statistics and matrix algebra are provided in the Appendix to make this book self-contained. The appendix reviews basic concepts in statistics, probability, linear regression analysis, and matrix algebra that will be required to understand the methodology presented in the book. The appendix may be useful as a reference even for a reader who has suﬃcient background and familiarity with the subject of the design of experiments.

© 2006 by Taylor & Francis Group, LLC

Readers interested in applying the design of a computer experiment should read Part II carefully. Chapters 2 and 3 can be read independently. The former introduces the concept of Latin hypercube sampling and its various modifications, such as randomized orthogonal array, symmetric Latin hypercube sampling, and optimal Latin hypercube designs. Design optimality criteria for computer space-filling design are also introduced in the chapter. Chapter 3 discusses a different approach to space-filling design known as the uniform design. Various measures of uniformity are introduced in the context of the construction of space-filling designs. The most notable criteria are modified L2-discrepancies. Algebraic approaches for constructing several classes of uniform design are presented. The techniques are very useful when one would like to construct a uniform design without employing a combinatorial optimization process. Readers interested in practical aspects of uniform design may want to focus on the construction of the design and skip the theoretical discussions.

Chapter 4 presents various stochastic optimization techniques for constructing optimal space-filling designs (e.g., optimal LHDs and uniform designs) under various optimality criteria discussed in Chapters 3 and 4. Several popular algorithms (i.e., Simulated Annealing, Threshold Acceptance, Column-Pairwise, and Stochastic Evolutionary) are presented. These heuristic optimization algorithms can be employed to search for high-quality space-filling designs. Readers may elect to read and implement one algorithm among the available alternatives.

Chapter 5 introduces various modeling techniques. The chapter starts with the fundamental concepts of prediction errors and model regularization. These concepts are crucial especially for dealing with the complexity of highly adaptive metamodels, which are typically in the form of non-parametric regressions. A logical progression with increasing complexity of modeling techniques is presented, including the well-known polynomial models, splines, Kriging, neural networks (multilayer perceptron networks and radial basis function), Bayesian approaches, and local polynomials. A unified view using basis function expansions is provided to relate the variety of modeling techniques. It is highly recommended that readers study this chapter; readers who are interested in practical usage may choose to pay more attention to a particular modeling technique, especially Kriging and local polynomials, as many references will be made to the Kriging model. Understanding the Kriging model may be necessary before reading about the Bayesian approach.

Because of the complexity of the structure of metamodels discussed in Chapter 5, special techniques for model interpretations are presented in Chapter 6, especially for readers who are interested in understanding the sensitivity of input variables to the output. These techniques can be viewed as generalizations of the traditional Analysis of Variance (ANOVA) in linear models. The chapter starts with the traditional sum of squares decomposition in linear models. The approach then is generalized to sequential sum of squares decomposition for general models. A Monte Carlo technique of Sobol' functional decompositions is introduced as a generalization of the ANOVA decomposition. Additionally, analytic functional decomposition is also provided to take advantage of typical tensor product metamodel structures. An alternative computational technique known as the Fourier Amplitude Sensitivity Test (FAST) is also introduced.

The last chapter discusses computer experiments with functional responses, where response data are collected over a range of time interval, space interval, or operation interval. Here the response is in the form of a curve, a function, or a trajectory over time, for example, an experiment measuring engine radiated noise at a range of speeds. Functional responses are commonly encountered in today's computer experiments. The idea of analyzing functional response in the context of the design of experiments is a relatively new area. We introduce several possible approaches to deal with such cases. Real-life case studies will be used to introduce the concept.

The book emphasizes methodology and applications of design and modeling for computer experiments. Most of the theoretical proofs are omitted, but we give related references for interested readers. The reader who needs some basic knowledge in matrix algebra, statistics, probability, regression analysis, and selection of variables in regression models will find the Appendix useful.

The following flowchart figure gives some flexible strategy for reading this book. The reader may start to read Chapter 1 to see the motivation and strategies for computer experiments. Then the reader can go to Chapter 2 or Chapter 3 or both chapters. Readers can skip Chapter 4 if they do not need to construct LHSs and UDs. Chapters 5 and 6 are the most important chapters to read for modeling computer experiments. Chapter 7 is necessary only for readers interested in experiments in which the response is a curve.

FIGURE 1.9 Flowchart for reading this book.

Part II
Designs for Computer Experiments

In this part, we introduce the most popular space-filling designs, their definitions, properties, constructions, and related generating algorithms. Chapter 2 introduces Latin hypercube sampling and its modifications: randomized orthogonal array, symmetric Latin hypercube sampling, and optimal Latin hypercube designs. Optimal Latin hypercube designs under various criteria are given. Chapter 3 concerns uniform design and its construction. Several measures of uniformity are introduced. Various methods of design construction, such as the good lattice point method, the Latin square method, the expanding orthogonal array method, the cutting method, and combinatorial design methods, are considered. Chapter 4 gives a detailed introduction to known heuristic combinatorial optimization algorithms for generating optimal designs, including local search, simulated annealing, threshold accepting, and stochastic evolutionary methods. Optimization plays an important role in the design and modeling of computer experiments.

For computer experiments, selecting an experimental design is a key issue in building an efficient and informative model. How best to construct a design depends on the underlying statistical model. The overall mean model introduced in Section 1.5 is the basis for the motivation of the Latin hypercube sampling and the uniform design. A good method of construction for space-filling designs should satisfy the following criteria: (a) it is optimal under the underlying statistical model; (b) it can serve various numbers of runs and input variables; (c) it can be easily generated with a reasonable computing time.

A space-filling design can be randomly generated, as in Latin hypercube sampling. In this case a design is a sample from a set of proposed designs. A space-filling design can also be deterministic, like the uniform design. Two space-filling designs are called equivalent if one can be obtained by permuting rank of factors and/or runs of another. We do not distinguish between equivalent designs, so we may use the same notation for them. Thus, a notation may stand for either a design or the set of the same type of designs without confusion. For example, LHS(n, s) can be a Latin hypercube sample or the set of all such samples. We use U(n, q^s) for a symmetrical U-type (i.e., balanced) design of n runs and s factors, each having q levels, or the set of all such designs. In the text, n stands for the number of runs and s for the number of factors.

2 Latin Hypercube Sampling and Its Modifications

This chapter introduces Latin hypercube sampling and its various modifications, such as randomized orthogonal array, symmetric Latin hypercube sampling, and optimal Latin hypercube designs.

2.1 Latin Hypercube Sampling

Consider model (1.8) and suppose one wants to estimate the overall mean of y, E(y) on C^s in (1.17), based on a set of experimental points D_n = {x_1, ..., x_n} in C^s. For simplicity, C^s = [0, 1]^s is chosen as the experimental domain. For development of the theory, assume that ∫_{C^s} f(x)dx < ∞. A natural estimator would be the sample mean, ȳ(D_n) = (1/n) Σ_{i=1}^n f(x_i). From the statistical point of view one would like the sample mean to be unbiased and to have minimal variance. There are many ways to generate the design D_n, for instance:

(A) Simple random sampling: Experimental points x_1, ..., x_n are independently identical samples from the uniform distribution U(C^s); the latter stands for the uniform distribution on C^s. Obviously, the corresponding sample mean is unbiased with variance Var(f(x))/n, where x ∼ U(C^s).

(B) Latin hypercube sampling: Divide the domain C^s of each x_k into n strata of equal marginal probability 1/n, and sample once from each stratum.

In fact, the LHS can be defined in terms of the Latin hypercube design.

Definition 1 A Latin hypercube design (LHD) with n runs and s input variables, denoted by LHD(n, s), is an n × s matrix in which each column is a random permutation of {1, 2, ..., n}.
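A minimal Monte Carlo comparison of schemes (A) and (B) can be sketched as follows; the integrand is an arbitrary choice made for illustration (monotone in each argument), not one from the text:

```python
import numpy as np

rng = np.random.default_rng(2)
n, s, reps = 20, 3, 2000

def f(x):
    # An arbitrary test integrand, monotonic in each argument.
    return np.exp(x).sum(axis=-1)

def lhs_points(n, s):
    # (B): one point per stratum in every coordinate, jittered within its cell.
    d = np.column_stack([rng.permutation(n) + 1 for _ in range(s)])
    return (d - rng.uniform(size=(n, s))) / n

srs_means = [f(rng.uniform(size=(n, s))).mean() for _ in range(reps)]
lhs_means = [f(lhs_points(n, s)).mean() for _ in range(reps)]
print(f"Var of sample mean, simple random sampling: {np.var(srs_means):.2e}")
print(f"Var of sample mean, Latin hypercube:        {np.var(lhs_means):.2e}")
```

Both estimators are unbiased; for this integrand the Latin hypercube sample mean has a markedly smaller variance.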

An LHS can be generated by the following algorithm (LHSA).

Algorithm LHSA
Step 1. Independently take s permutations πj(1), ..., πj(n) of the integers 1, ..., n, for j = 1, ..., s.
Step 2. Take ns uniform variates (also called random numbers in Monte Carlo methods) U_k^j ∼ U(0, 1), k = 1, ..., n, j = 1, ..., s, which are mutually independent. Let x_k = (x_k^1, ..., x_k^s), where

x_k^j = (πj(k) − U_k^j)/n,  k = 1, ..., n, j = 1, ..., s.    (2.1)

Then D_n = {x_1, ..., x_n} is an LHS and is denoted by LHS(n, s).

Note that (π1(k), π2(k)) determines in which cell x_k is located and (U_k^1, U_k^2) determines where x_k is located within that cell.

Example 9 For generating an LHS for n = 8, s = 2, in the first step we generate two permutations of {1, ..., 8} to form an LHD(8, 2), the matrix on the left below. Then we generate 16 = 8*2 random numbers to form the 8 × 2 matrix on the right:

2 5        0.9501 0.8214
5 8        0.2311 0.4447
7 6        0.4860 0.7919
1 3        0.6068 0.6154
4 1        0.8913 0.9218
8 4        0.7621 0.7382
3 7        0.4565 0.1763
6 2        0.0185 0.4057

Now an LHS is given by subtracting the second matrix from the first and dividing by 8:

0.1312 0.5223
0.5961 0.9444
0.8143 0.6510
0.0491 0.2981
0.3886 0.0098
0.9047 0.4077
0.3179 0.8530
0.7477 0.1993

Figure 2.1 gives plots of the design, where eight points are assigned in a grid of 64 = 8^2 cells, satisfying that each row and each column has one and only one point, and each point is uniformly distributed in the corresponding cell.
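Algorithm LHSA can be sketched directly; `lhs` is a hypothetical helper name:

```python
import numpy as np

def lhs(n, s, rng=None):
    """Algorithm LHSA: one Latin hypercube sample LHS(n, s) on [0, 1)^s."""
    rng = np.random.default_rng(rng)
    # Step 1: an LHD(n, s) -- each column a random permutation of 1..n.
    d = np.column_stack([rng.permutation(n) + 1 for _ in range(s)])
    # Step 2: jitter each point uniformly within its cell, x = (pi - U) / n.
    u = rng.uniform(size=(n, s))
    return (d - u) / n

x = lhs(8, 2, rng=0)
# Each column has exactly one point in each of the n equal-width strata.
print(np.sort(np.floor(x * 8).astype(int), axis=0))
```

Replacing `u` with a constant 0.5 gives the midpoint (centered) variant discussed later in the chapter.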

FIGURE 2.1 Two LHSs with six runs.

LHS was proposed by McKay, Beckman and Conover (1979) in what is widely regarded to be the first paper on design of computer experiments. LHS is an extension of stratified sampling which ensures that each of the input variables has all portions of its range represented (Sacks, Welch, Mitchell and Wynn (1989)). McKay, Beckman and Conover (1979) showed that the LHS has a smaller variance of the sample mean than simple random sampling.

Theorem 1 Denote by ȳ_SRS and ȳ_LHS the sample mean of n points generated by simple random sampling and LHS, respectively. If the function f(x_1, ..., x_s) in model (1.8) is monotonic in each of its arguments, then Var(ȳ_SRS) ≥ Var(ȳ_LHS).

Stein (1987) and Owen (1992a) found an expression for the variance of ȳ_LHS and showed that

Var(ȳ_LHS) = Var(f(x))/n − c/n + o(1/n),

where c is a positive constant. Sacks, Welch, Mitchell and Wynn (1989) commented on the advantages of the LHS: "Iman and Helton (1988) compare Latin hypercube sampling with Monte Carlo sampling for generating a response surface as an empirical surrogate for a computer model. The response surface was fitted by least squares to data from a fractional-factorial design. They found in a number of examples that the response surface could not adequately represent the complex output of the computer code but could be

useful for ranking the importance of the input variables. Because Latin hypercube sampling exercises the code over the entire range of each input variable, it can also be a systematic way of discovering scientifically surprising behavior as noted in Iman and Helton (1988)."

In fact, the second step, the use of the random numbers U_k^j, can be removed to reduce the complexity of the sampling. Many authors suggested that LHS take a lattice structure,

x_k^j = (πj(k) − 0.5)/n,

that is, to put the experimental point x_k at the center of its cell. The corresponding LHS is called midpoint Latin hypercube sampling (MLHS) or centered Latin hypercube sampling. A midpoint LHS with n runs and s input variables is denoted by MLHS(n, s). The midpoint LHS has a close relationship with the so-called U-type design (Fang and Hickernell (1995)).

Definition 2 A U-type design with n runs and s factors, each having respective q_1, ..., q_s levels, is an n × s matrix such that the q_j levels in the jth column appear equally often. This design is denoted by U(n, q_1 × ... × q_s). When some q_j's are equal, we denote it by U(n, q_1^r1 × ... × q_m^rm), where the integers r_1 + ... + r_m = s. If all the q_j's are equal, it is said to be symmetrical; otherwise it is asymmetrical. If all levels equal q, we denote the design by U(n, q^s), and the notation U(n, q^s) also denotes the set of all the U(n, q^s) designs. It is clear that each q_j should be a divisor of n. U-type design is also called balanced design (Li and Wu (1997)) or lattice design (Bates, Riccomagno, Schwabe and Wynn (1995)). Tables 2.1 and 2.2 are two U-type designs, U(12, 12^4) and U(9, 3^4). Very often the q entries take the values {1, ..., q} or {(2i − 1)/2q, i = 1, ..., q}. Many situations require that the elements of U fall in [0, 1]. This requirement leads to the following concept.

Definition 3 Let U = (u_ij) be a U-type design U(n, q^s) with entries {1, ..., q}. Let

x_ij = (2u_ij − 1)/2q,  i = 1, ..., n, j = 1, ..., s.    (2.2)

Then the set x_i = (x_i1, ..., x_is), i = 1, ..., n, is a design on C^s called the induced design of U, denoted by D_U. In fact, D_U is a U-type design U(n, q^s) with entries {1/2q, 3/2q, ..., (2q − 1)/2q}.

There is a link between U-type designs and LHD.
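The mapping in Definition 3 can be sketched as follows; `induced_design` is a hypothetical helper, and the example matrix is the U(9, 3^4) design of Table 2.2:

```python
import numpy as np

def induced_design(u, q):
    """Map a U-type design with entries 1..q to its induced design on C^s
    via x_ij = (2 u_ij - 1) / (2 q), as in (2.2)."""
    u = np.asarray(u)
    return (2 * u - 1) / (2 * q)

# The U(9, 3^4) design of Table 2.2.
u9 = np.array([[1, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3],
               [2, 1, 2, 3], [2, 2, 3, 1], [2, 3, 1, 2],
               [3, 1, 3, 2], [3, 2, 1, 3], [3, 3, 2, 1]])
print(induced_design(u9, 3))  # entries are 1/6, 3/6, 5/6
```

With q = 3 the induced entries are the cell midpoints 1/6, 1/2, and 5/6, so every column is centered at 0.5.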

TABLE 2.1
U-type Design U(12, 12^4)
No   1   2   3   4
 1   1  10   4   7
 2   2   5  11   3
 3   3   1   7   9
 4   4   6   1   5
 5   5  11  10  11
 6   6   9   8   1
 7   7   4   5  12
 8   8   2   3   2
 9   9   7  12   8
10  10  12   6   4
11  11   8   2  10
12  12   3   9   6

TABLE 2.2
U-type Design U(9, 3^4)
No  1  2  3  4
1   1  1  1  1
2   1  2  2  2
3   1  3  3  3
4   2  1  2  3
5   2  2  3  1
6   2  3  1  2
7   3  1  3  2
8   3  2  1  3
9   3  3  2  1

LEMMA 2.1 The set of LHD(n, s) is just the set of U(n, n^s) with entries {1, ..., n}, and the set of MLHS is just the set of U(n, n^s) with entries {1/2n, 3/2n, ..., (2n − 1)/2n}.

Scatter-box plots are useful for visualization of the projection of the distribution of all the design pairs of any two variables. Figures 2.2 and 2.3 give scatter-box plots for an LHS(30, 3) and its corresponding LHD(30, 3). We can see that the two figures are very close to each other. The box plots are generated automatically by the software, but this is not necessary, as points in each column of the design are uniformly scattered. If some scatter plots do not appear reasonably uniform, we should generate another LHS instead of the previous one.

FIGURE 2.2 The scatter plots of LHS(30, 3).

FIGURE 2.3 The scatter plots of LHD(30, 3).

2.2 Randomized Orthogonal Array

The LHS has many advantages, such as: (a) it is computationally cheap to generate; (b) it can deal with a large number of runs and input variables; (c) its sample mean has a smaller variance than the sample mean of a simple random sample. However, LHS does not reach the smallest possible variance for the sample mean. Many authors tried hard to improve LHS, i.e., by reducing the variance of the sample mean. In the next section we shall discuss this issue further. The concept of orthogonal array was introduced in Section 1.2. Owen (1992b, 1994b) and Tang (1993) independently considered randomized orthogonal arrays. Tang (1993) called this approach orthogonal array-based Latin hypercube design (OA-based LHD). The so-called randomized orthogonal array or OA-based LHD is an LHD that is


while their Latin hypercube property means that they use q^2 distinct values on each variable instead
generated from an orthogonal array. Let us start with an illustrative example. Begin with the first column of L9(3^4) in Table 1.1. Randomly choose a permutation, for example {2, 1, 3} of {1, 2, 3}, to replace the three level 1's, a permutation {6, 4, 5} of {4, 5, 6} to replace the three level 2's, and a permutation {9, 7, 8} of {7, 8, 9} to replace the three level 3's. Similarly, applying the above process to columns 2, 3, and 4 of Table 1.1 results in a new design, denoted by OH(9, 9^4), with 9 runs and 4 factors, each having 9 levels, in Table 2.3. The latter is an OA-based LHD. The replacement process results in an LHD that is an OA-based LHD.

TABLE 2.3
OH(9, 9^4) Design from an L9(3^4) Design
No     1       2       3       4
1    1 → 2   1 → 2   1 → 1   1 → 2
2    1 → 1   2 → 6   2 → 5   2 → 4
3    1 → 3   3 → 8   3 → 8   3 → 8
4    2 → 6   1 → 3   2 → 6   3 → 9
5    2 → 4   2 → 5   3 → 9   1 → 1
6    2 → 5   3 → 7   1 → 2   2 → 5
7    3 → 9   1 → 1   3 → 7   2 → 6
8    3 → 7   2 → 4   1 → 3   3 → 7
9    3 → 8   3 → 9   2 → 4   1 → 3

Figure 2.4 shows plots of the first two columns of L9(3^4) and OH(9, 9^4). The experimental points of the former are denoted by '•' and of the latter by '×.' We can see that the original L9(3^4) has three levels for each factor, but the new OH(9, 9^4) has nine levels for each factor. The latter spreads experiment points more uniformly on the domain. Now, we can write an algorithm to generate an OA-based LHD.

Algorithm ROA
Step 1. Choose an OA(n, s, q, r) as needed, let A be this orthogonal array, and let λ = n/q^r.
Step 2. For each column of A, for k = 1, ..., q, replace the λq^(r−1) positions with entry k by a random permutation of {(k − 1)λq^(r−1) + 1, (k − 1)λq^(r−1) + 2, ..., kλq^(r−1)}.
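The replacement step of Algorithm ROA can be sketched for the simplest strength-2 case, where each level occurs n/q times per column; the function name and the explicit L9(3^4) layout below are assumptions:

```python
import numpy as np

def oa_based_lhd(A, q, rng=None):
    """Sketch of Algorithm ROA for strength-2 arrays: in each column, the
    n/q positions holding level k are replaced by a random permutation of
    {(k-1)*(n/q) + 1, ..., k*(n/q)}, producing an OA-based LHD."""
    rng = np.random.default_rng(rng)
    A = np.asarray(A)
    n = A.shape[0]
    lam = n // q  # occurrences of each level per column
    out = np.zeros_like(A)
    for j in range(A.shape[1]):
        for k in range(1, q + 1):
            pos = np.flatnonzero(A[:, j] == k)
            out[pos, j] = rng.permutation(np.arange((k - 1) * lam + 1, k * lam + 1))
    return out

# An L9(3^4) orthogonal array (assumed standard layout).
L9 = np.array([[1, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3],
               [2, 1, 2, 3], [2, 2, 3, 1], [2, 3, 1, 2],
               [3, 1, 3, 2], [3, 2, 1, 3], [3, 3, 2, 1]])
oh = oa_based_lhd(L9, 3, rng=0)
print(np.sort(oh, axis=0)[:, 0])  # each column is a permutation of 1..9
```

Grouping the nine new levels back into three bins of three recovers the original orthogonal array, which is exactly the combinatorial balance the randomized array preserves.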

of the q used by the straight orthogonal array. For larger experiments one might prefer the orthogonal arrays on account of their better balance. This advantage becomes more important with the smaller computer experiments appropriate for the more expensive to run simulators."

FIGURE 2.4 Experimental points of two factors by OA and OA-based LHD.

Randomized orthogonal arrays can improve performance of LHD by achieving good combinatorial balance among any two or more columns of an orthogonal array. In fact, the balance properties of a (t, m, s)-net, proposed by Niederreiter (1992), are superior to those of an orthogonal array. It can be expected that randomized or scrambled (t, m, s)-nets have a better performance than that of the corresponding randomized orthogonal array. Owen (1997) found a variance for the sample mean over randomized (t, m, s)-nets. He concludes that for any square-integrable integrand f, the resulting variance surpasses any of the usual variance reduction techniques.

2.3 Symmetric and Orthogonal Column Latin Hypercubes

Since the LHS is only a form of stratified random sampling and is not directly related to any criterion, it may perform poorly in estimation and prediction of the response at untried sites (Ye, Li and Sudjianto (2003)).

plots of the two LHS(6, 2) designs, D6−1 and D6−2.

FIGURE 2.5
Two LHSs with six runs.

The design on the left is perfect, but viewed in two dimensions, the second design is obviously deficient: the design on the right has no experimental points in a certain area of the domain C^2 = [0, 1]^2, and as a result one factor is almost confounded with the other. A one-dimensional projection would make both designs look balanced and adequate; the latter, however, may pose difficulty in modeling. Therefore, it is logical to further enhance an LHD so that it fills space not only in a one-dimensional projection but also in higher dimensions.

Generating such a design, however, requires a large combinatorial search over (n!)^s possibilities. Because of these large possibilities, several authors suggested narrowing the candidate design space. If we delete some poor candidates and narrow the design domain for sampling, the corresponding LHD can have better performance in practice. Another consideration is to find a special type of LHD which has some good "built-in" properties in terms of sampling optimality; these will be discussed in the next section. The so-called symmetric Latin hypercube design (SLHD) and orthogonal column Latin hypercube design (OCLHD) are two such modified LHDs. It was reported by Park (1994) and Morris and Mitchell (1995) that many optimal LHDs they obtained have a symmetry property that leads to the following concept.

Definition 4 A midpoint LHD is called a symmetric Latin hypercube design, denoted by SLHD(n, s), if it has the reflection property: for any row x_k of the matrix, there exists another row that is the reflection of x_k through the center (n+1)/2. By reflection, we mean that if (a_1, ..., a_s) is a row of an MLHD(n, n^s), then the vector (n+1−a_1, ..., n+1−a_s) should be another row in the design matrix. By permuting rows of a symmetric LHD we can always assume that the kth row is the reflection of the (n−k+1)th row, so that the summation of these two rows forms the vector (n+1, n+1, ..., n+1).

Note that the design D6−1 is an SLHD(6, 2), but the design D6−2 is not. The following is another example, an SLHD(10, 5):

 1    6    6    5    9
 2    2    3    2    4
 3    1    9    7    5
 4    3    4   10    3
 5    7    1    8   10
 6    4   10    3    1
 7    8    7    1    8
 8   10    2    4    6
 9    9    8    9    7
10    5    5    6    2

where the summation of row pairs 1 & 10, 2 & 9, 3 & 8, 4 & 7, or 5 & 6 is (11, 11, 11, 11, 11).

Two column vectors in an LHD are called column-orthogonal if their correlation is zero. Such designs for computer experiments would benefit linear regression models. Iman and Conover (1982) and Owen (1994a) proposed using LHDs with small off-diagonal correlations, so that the estimates of the linear effects of the input variables are only slightly correlated. Based on this consideration, Ye (1998) proposed the orthogonal column LHD.

Definition 5 An n × s matrix is called an orthogonal column LHD, denoted by OCLHD(n, n^s), if it is an LHD and its column-pairs are orthogonal.

To easily check whether two columns are column-orthogonal or not, we convert the levels of each column to be symmetric about zero. For n = 2^m + 1, where m is a positive integer, the n levels become {−2^(m−1), ..., −1, 0, 1, ..., 2^(m−1)}. For example, we convert two column-orthogonal columns a = (4, 2, 5, 3, 1)' and b = (1, 5, 4, 3, 2)' into a* = (1, −1, 2, 0, −2)' and b* = (−2, 2, 1, 0, −1)'. It is easy to find that a*'b* = 0, i.e., a* and b* are column-orthogonal, or equivalently a and b are column-orthogonal. Ye (1998) suggested an algorithm for the construction of OCLHD(n, n^s) with n = 2^m + 1. Table 2.4 shows an OCLHD(9, 9^4). Note that the bottom half of the design is precisely
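Both the reflection property of Definition 4 and column-orthogonality after centering are easy to verify mechanically. The check below is our own illustration; the 6-run design inside it is a hypothetical SLHD(6, 2) built for the demonstration, not one of the designs discussed in the text.

```python
import numpy as np

def is_symmetric_lhd(D):
    """Definition 4: for every row x of the n-run design D, the
    reflected row (n+1) - x must also appear as a row of D."""
    D = np.asarray(D)
    n = D.shape[0]
    rows = {tuple(r) for r in D}
    return all(tuple(n + 1 - r) in rows for r in D)

def centered_gram(D):
    """Center levels 1..n about zero and return the matrix of column
    inner products; zero off-diagonal entries mean the corresponding
    column pair is column-orthogonal."""
    D = np.asarray(D, dtype=float)
    C = D - (D.shape[0] + 1) / 2.0
    return C.T @ C

# hypothetical SLHD(6, 2): rows (1,4),(2,6),(3,2) plus their reflections
D_sym = np.array([[1, 4], [2, 6], [3, 2], [4, 5], [5, 1], [6, 3]])
D_asym = np.array([[1, 2], [2, 1], [3, 3], [4, 5], [5, 4], [6, 6]])
```

Running `is_symmetric_lhd` on the two arrays returns True for the first and False for the second, since the reflection of row (1, 2) of `D_asym` would be (6, 5), which is not a row.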

the mirror image of the top half.

TABLE 2.4
OCLHD(9, 9^4)

No    1    2    3    4
 1    1   −2   −4    3
 2    2    1   −3   −4
 3    3   −4    2   −1
 4    4    3    1    2
 5    0    0    0    0
 6   −4   −3   −1   −2
 7   −3    4   −2    1
 8   −2   −1    3    4
 9   −1    2    4   −3

Denote the top half of the design by

T = ( 1  −2  −4   3 )
    ( 2   1  −3  −4 )
    ( 3  −4   2  −1 )
    ( 4   3   1   2 )

and note that it can be expressed in terms of two matrices of the same size:

M = ( 1  2  4  3 )        H = ( 1  −1  −1   1 )
    ( 2  1  3  4 )            ( 1   1  −1  −1 )
    ( 3  4  2  1 )            ( 1  −1   1  −1 )
    ( 4  3  1  2 )            ( 1   1   1   1 )

where M is the matrix each entry of which is the absolute value of the corresponding entry in T, and H is the matrix each entry of which is 1 or −1 according to whether the corresponding entry in T is positive or negative. It is easy to see that T is the element-wise product (also known as the Hadamard product) of M and H, i.e.,

T = M ⊙ H   (2.3)

(cf. Section A.1). Notice that M is an LHD whose columns can be constructed from the two matrices

I = ( 1  0 )   and   Q = ( 0  1 )
    ( 0  1 )             ( 1  0 )

Let

m2 = [I ⊗ Q] n = ( 0 1 0 0 ) ( 1 )   ( 2 )
                 ( 1 0 0 0 ) ( 2 ) = ( 1 )
                 ( 0 0 0 1 ) ( 3 )   ( 4 )
                 ( 0 0 1 0 ) ( 4 )   ( 3 )

m3 = [Q ⊗ Q] n = ( 0 0 0 1 ) ( 1 )   ( 4 )
                 ( 0 0 1 0 ) ( 2 ) = ( 3 )
                 ( 0 1 0 0 ) ( 3 )   ( 2 )
                 ( 1 0 0 0 ) ( 4 )   ( 1 )

where ⊗ denotes the Kronecker product (cf. Section A.1) and n = (1, 2, 3, 4)'. The first column of M is n, and the second and third columns of M are m2 and m3, respectively. The last column of M is the product [Q ⊗ Q][I ⊗ Q] n = (3, 4, 1, 2)'.

Now, let us introduce the construction of a Hadamard matrix H of order 4 (cf. Section A.1). Let b0 = (−1, 1)' and b1 = (1, 1)'. The first column of H is a 4-vector of 1's; the remaining three columns of H are

h1 = b1 ⊗ b0 = (−1, 1, −1, 1)',  h2 = b0 ⊗ b1 = (−1, −1, 1, 1)',  h3 = h1 ⊙ h2 = (1, −1, −1, 1)',

that is,

H = (1, h1, h2, h1 ⊙ h2).   (2.4)

From the above illustrative example, we now introduce the following algorithm proposed by Ye (1998).

Algorithm OCLHD with n = 2^m + 1
Step 1. Let n = (1, 2, ..., 2^(m−1))' and, for k = 1, ..., m−1, let

A_k = I ⊗ ··· ⊗ I ⊗ Q ⊗ ··· ⊗ Q,

with m−k−1 factors I followed by k factors Q.
Step 2. The columns of the matrix M are {n, A_k n, k = 1, ..., m−1, A_(m−1) A_j n, j = 1, ..., m−2}.
Step 3. For k = 1, ..., m−1, define the vector h_k as h_k = q_1 ⊗ q_2 ⊗ ··· ⊗ q_(m−1), where q_(m−k) = b0 and q_i = b1 for i ≠ m−k, with b0 and b1 as above. The columns of the matrix H are {1, h_k, k = 1, ..., m−1, h_1 ⊙ h_(j+1), j = 1, ..., m−2}.

Step 4. Calculate the matrix T = M ⊙ H; the top half of the design is generated.
Step 5. Make the mirror image of T the bottom half of the design. Finally, add a vector of 0's between the top half and the bottom half. This results in an OCLHD(n, n^(m+1)), a design with n = 2^m + 1 runs and m + 1 factors.

Algorithm OCLHD with n = 2^m
Step 1. Generate an OCLHD(n + 1, (n + 1)^(m+1)) and delete the midpoint row of 0's.
Step 2. Make the remaining n rows with equidistant levels.

From Table 2.4 we can easily find an OCLHD(8, 8^4), which is listed in Table 2.5, where levels 1, 2, ..., 8 replace the original levels −4, −3, −2, −1, 1, 2, 3, 4.

TABLE 2.5
OCLHD(8, 8^4)

No    1    2    3    4
 1    5    3    1    7
 2    6    5    2    1
 3    7    1    6    4
 4    8    7    5    6
 5    1    2    4    3
 6    2    8    3    5
 7    3    4    7    8
 8    4    6    8    2

Consider a polynomial regression model

ŷ = β_0 + Σ_{i=1}^s β_i x_i + Σ_{1≤i≤j≤s} β_ij x_i x_j + ··· + Σ_{1≤i_1≤i_2≤···≤i_q≤s} β_{i_1···i_q} x_{i_1} ··· x_{i_q}.   (2.5)

Define β_i to be the linear effect of x_i, β_ii to be the quadratic effect of x_i, and β_ij to be the bilinear interaction between x_i and x_j. Ye (1998) pointed out that the OCLHDs ensure the independence of the estimates of the linear effects and the quadratic effects of each factor, and that the bilinear interactions are uncorrelated with the estimates of the linear effects. However, for an OCLHD, the number
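The construction above can be coded directly with Kronecker products. The sketch below is our reading of the algorithm; we have checked it only against the m = 3 case, where it reproduces Table 2.4, so treat it as an illustration rather than a general-purpose implementation.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=int)
Q = np.array([[0, 1], [1, 0]])       # 2x2 swap matrix
b0 = np.array([-1, 1])
b1 = np.array([1, 1])

def oclhd(m):
    """OCLHD sketch for n = 2^m + 1 runs: top half T = M * H
    (element-wise), a zero row in the middle, and the negated,
    reversed top half at the bottom."""
    half = 2 ** (m - 1)
    e = np.arange(1, half + 1)
    A = [reduce(np.kron, [I2] * (m - k - 1) + [Q] * k) for k in range(1, m)]
    Mcols = [e] + [a @ e for a in A] + [A[-1] @ A[j] @ e for j in range(m - 2)]
    h = [reduce(np.kron, [b0 if i == m - k else b1 for i in range(1, m)])
         for k in range(1, m)]
    Hcols = [np.ones(half, dtype=int)] + h + [h[0] * h[j + 1] for j in range(m - 2)]
    T = np.column_stack(Mcols) * np.column_stack(Hcols)
    return np.vstack([T, np.zeros((1, T.shape[1]), dtype=int), -T[::-1]])

D = oclhd(3)   # 9 runs, 4 columns, levels -4..4
```

For m = 3 the returned matrix has mutually orthogonal columns, each a permutation of −4, ..., 4, with the zero row in the middle.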

of runs must be a power of two or a power of two plus one; moreover, the number of runs increases dramatically as the number of columns increases. In contrast, the SLHDs and the uniform designs introduced in Chapter 3 enjoy the advantage of a more flexible run size.

2.4 Optimal Latin Hypercube Designs

In the previous section, we used a strategy for improving the performance of LHDs by constructing an LHD from a narrow set of candidates with desirable properties. An alternative idea is to adopt some optimality criterion for the construction of the LHS, such as entropy (Shewry and Wynn (1987)), integrated mean squared error (IMSE) (Sacks, Schiller and Welch (1989)), and maximin or minimax distance (Johnson, Moore and Ylvisaker (1990)). For a given number of runs and number of input variables, the design space, denoted by D, can be the set of U(n, n^s) or some subset. An optimal LHD optimizes the criterion function over the design space.

2.4.1 IMSE criterion

In this section we introduce the integrated mean squared error (IMSE) criterion based on Sacks, Schiller and Welch (1989). Consider a Kriging model (see Sections 1.6 and 5.4)

y(x) = Σ_{j=1}^m β_j h_j(x) + z(x),

where the h_j(x)'s are known functions, the β_j's are unknown coefficients to be estimated, and z(x) is a stationary Gaussian random process (cf. Section A.2) with mean E(z(x)) = 0 and covariance

Cov(z(x_i), z(x_j)) = σ² R(x_i − x_j),

where σ² is the unknown variance of the random error and the correlation function R is given. The function R may have many choices; for example, a Gaussian correlation function is given by R(d) = exp{−θd²}, where θ is unknown. In this case, we have

Cov(z(x_i), z(x_j)) = σ² exp[−θ(x_i − x_j)'(x_i − x_j)].

Section 5.4 will give a systematic study of the Kriging model and its estimation. Consider the linear predictor

ŷ(x) = c'(x) y_D,   (2.6)

where y_D = (y(x_1), ..., y(x_n))' is the response column vector collected according to the design D_n = {x_1, ..., x_n}. The column vector y_D can be regarded as an observation taken from y = (Y(x_1), ..., Y(x_n))'. The best linear unbiased predictor (BLUP) is obtained by choosing the vector c(x) to minimize the mean squared error

MSE(ŷ(x)) = E[c'(x) y_D − Y(x)]²

subject to the unbiasedness constraint E(c'(x) y_D) = E(Y(x)). The BLUP of y(x) (cf. Section A.3) at an untried x is given by

ŷ(x) = h'(x) β̂ + v'_x V_D^(−1) (y_D − H_D β̂),   (2.7)

where h(x) = (h_1(x), ..., h_m(x))': m × 1, H_D = (h_j(x_i)): n × m, v_x = (Cov(Z(x), Z(x_1)), ..., Cov(Z(x), Z(x_n)))': n × 1, V_D = (Cov(Z(x_i), Z(x_j))): n × n, and

β̂ = [H'_D V_D^(−1) H_D]^(−1) H'_D V_D^(−1) y_D,   (2.8)

which is the generalized least squares estimate of β. The mean squared error of ŷ(x) can then be expressed as

MSE(ŷ(x)) = σ² − (h'(x), v'_x) ( 0     H'_D )^(−1) ( h(x) )
                               ( H_D   V_D  )      ( v_x  )   (2.9)

Let r(x) = v(x)/σ² and R_D = V_D/σ². Then the mean squared error can be expressed as

MSE(ŷ(x)) = σ² [ 1 − (h'(x), r'_x) ( 0     H'_D )^(−1) ( h(x) ) ]
                                   ( H_D   R_D  )      ( r_x  )   (2.10)

The maximum likelihood estimate of σ² is

σ̂² = (1/n) (y_D − H_D β̂)' R_D^(−1) (y_D − H_D β̂).

The IMSE criterion chooses the design D_n to minimize

IMSE = ∫_T MSE(ŷ(x)) φ(x) dx   (2.11)
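As a concrete illustration of the BLUP formulas above, here is a minimal numerical sketch using a constant trend h(x) ≡ 1 and the Gaussian correlation function; the function names, data, and value of θ are ours, chosen only for the demonstration.

```python
import numpy as np

def gauss_corr(X1, X2, theta):
    """Gaussian correlation R(d) = exp(-theta * ||d||^2) between row sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-theta * d2)

def blup(x, X, y, theta):
    """BLUP with constant trend: beta-hat is the generalized least
    squares mean, and the correction term interpolates the residuals."""
    R = gauss_corr(X, X, theta)
    r = gauss_corr(x[None, :], X, theta).ravel()
    Rinv = np.linalg.inv(R)
    one = np.ones(len(X))
    beta = (one @ Rinv @ y) / (one @ Rinv @ one)   # GLS estimate of beta
    return beta + r @ Rinv @ (y - beta * one)

X = np.array([[0.0], [0.5], [1.0]])   # toy one-dimensional design
y = np.array([1.0, 2.0, 0.0])
```

Because the Kriging predictor interpolates, `blup(X[i], X, y, theta)` returns `y[i]` at every design point, while predictions at untried sites blend the observations according to the correlation structure.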

for a given weight function φ(x). From (2.10) the IMSE can be expressed as

σ² [ 1 − trace( ( 0     H'_D )^(−1) ∫ ( h(x)h'(x)   h(x)r'(x) ) φ(x) dx ) ]
                ( H_D   R_D  )        ( r(x)h'(x)   r(x)r'(x) )   (2.12)

An IMSE-optimal LHD minimizes the IMSE over the set of U(n, n^s) for a specific θ. As the parameter θ is unknown, Sacks, Schiller and Welch (1989) gave a detailed discussion on how to choose a suitable θ. To find the optimal design, we need a powerful optimization algorithm due to the large combinatorial design space. Various optimization techniques will be introduced in Chapter 4. Readers can find plots of many two-factor IMSE-optimal LHDs in Koehler and Owen (1996).

For example, they considered Example 4, in which all but two rate constants are hypothesized to have been established by previous work, so there are two input variables only. Miller and Frenklach (1983) took a nine-point design and approximated each of the five log concentrations with a quadratic model

y(x_1, x_2) = β_0 + β_1 x_1 + β_2 x_2 + β_11 x_1² + β_22 x_2² + β_12 x_1 x_2 + z(x_1, x_2).

Another criterion related to MSE is the maximum mean squared error (MMSE). This criterion chooses a design to minimize max_x MSE(ŷ(x)), where the minimization is taken over all the candidate designs. This criterion is computationally more demanding.

2.4.2 Entropy criterion

The entropy criterion, proposed by Shannon in 1948, has been widely used in coding theory and statistics. Entropy measures the amount of information contained in the distribution of a set of data. Shewry and Wynn (1987) described this as the classical idea of 'the amount of information in an experiment.' The lower the entropy, the more precise the knowledge is; the higher the entropy, the more uncertainty there is. Let p(x) be a distribution on R^s; its entropy is defined by

Ent(p(x)) = − ∫_{R^s} p(x) log(p(x)) dx.   (2.13)

It is easy to show Ent(p(x)) ≥ 0. The multivariate normal distribution N_p(μ, Σ) (see Section A.2) has entropy

(p/2)(1 + log(2π)) + (1/2) log|Σ|.   (2.14)

When p is fixed, maximizing the entropy is equivalent to maximizing log|Σ| or |Σ|.

The Bayesian approach combines prior information and experimental data using an underlying model to produce a posterior distribution; prediction will then be based on the posterior distribution. Let e be an experiment on a random system X with sampling density p(x|θ) and parameter θ. Assume the prior distribution for θ is π(θ). If x is the data obtained on performing e, the posterior density for θ is π(θ|x) ∝ π(θ)p(x|θ), where "∝" means "is proportional to." The amount of information on θ before and after the experiment is measured by the entropies

Ent(π(θ)) = − ∫ π(θ) log π(θ) dθ  and  Ent(π(θ|x)) = − ∫ π(θ|x) log π(θ|x) dθ.

Thus, the information increase is

Ent(π(θ)) − Ent(π(θ|x)).   (2.15)

The expected value of (2.15) over the full joint distribution of X and θ is given by

g(e) = E_x E_θ [ log( π(θ|x) / π(θ) ) ].   (2.16)

The entropy criterion maximizes g(e). If the experimental domain, denoted by T, is discrete (such as lattice points, where each factor has a finite number of levels and the experimental domain is composed of all level-combinations), then the domain T is described by a random vector y_T = (Y_1, ..., Y_N)', where N is the cardinality of T and Y_i is attached to a point x_i ∈ T. Let D_n be a subset of n runs of T to be implemented and D̄_n be the complement of D_n, D_n ∪ D̄_n = T. Let y_{D_n} and y_{D̄_n} (or y_D and y_D̄ for simplicity) be the vectors of outcomes from tried and untried points, respectively. Using the classical decomposition we have

Ent(y_T) = Ent(y_D) + E_{y_D}[Ent(y_D̄ | y_D)].   (2.17)

The second item on the right-hand side of (2.17) is just −g(e). Since Ent(y_T) is fixed, minimizing the second item is equivalent to maximizing Ent(y_D); hence the term 'maximum entropy design.' For the Kriging model (1.27) with covariance functions (1.28) and (1.29), the entropy criterion suggests maximizing

max_{D∈D} log|R|,   (2.18)
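The maximum entropy criterion can be illustrated with a tiny brute-force search: among candidate subsets of points, pick the one maximizing log|R|. The candidate points, θ, and function name below are ours, chosen only for the demonstration.

```python
import numpy as np
from itertools import combinations

def logdet_R(X, theta=1.0):
    """log-determinant of the Gaussian correlation matrix of design X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.linalg.slogdet(np.exp(-theta * d2))[1]

# pick the 3-point subset of 5 candidates maximizing the entropy log|R|
cand = np.array([[0.0], [0.1], [0.5], [0.9], [1.0]])
best = max(combinations(range(len(cand)), 3),
           key=lambda idx: logdet_R(cand[list(idx)]))
```

The winning subset spreads the points out (here {0.0, 0.5, 1.0}): clustered points make the correlation matrix nearly singular, driving |R| toward zero and the entropy down.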

where D is the design space and R = (r_ij) is the correlation matrix of the design matrix D = (x_1, ..., x_n)', with

r_ij = exp{ − Σ_{k=1}^s θ_k (x_ki − x_kj)² },   (2.19)

if the correlation function (1.29) is chosen. Shewry and Wynn (1987) demonstrated a linear model with stationary error (a simple Kriging model) and showed the corresponding maximum entropy designs. The latter criterion has been used by Currin et al. (1991) to design experiments for which the motivating application is to find an approximation of a complex deterministic computer model. Readers can find some maximum entropy designs for two factors in Koehler and Owen (1996).

2.4.3 Minimax and maximin distance criteria and their extension

Let d(u, v) be a distance defined on T × T satisfying

d(u, v) ≥ 0,  d(u, v) = d(v, u),  d(u, v) ≤ d(u, w) + d(w, v),  ∀ u, v, w ∈ T.

Minimax and maximin distance criteria, proposed by Johnson et al. (1990), measure how uniformly the experimental points are scattered through the domain. Designs based on these criteria ensure that no point in the domain is too far from a design point; hence we can make reasonable predictions anywhere in the domain. Consider a design D = {x_1, ..., x_n} on T and let

d(x, D) = min{d(x, x_1), ..., d(x, x_n)}.

A minimax distance design D* minimizes the maximum distance between any x ∈ T and D, i.e.,

min_D max_{x∈T} d(x, D) = max_{x∈T} d(x, D*).   (2.20)

A maximin distance design (Mm) D* maximizes the minimum inter-site distance min_{u,v∈D} d(u, v), i.e.,

max_D min_{u,v∈D} d(u, v) = min_{u,v∈D*} d(u, v).   (2.21)

Johnson et al. (1990) explored several connections between certain statistical and geometric properties of designs. They established the equivalence of the maximin distance design criterion and an entropy criterion motivated by function prediction in a Bayesian setting. They also gave many minimax and maximin distance designs, among which the simplest are as follows:

Example 10 (Johnson et al. (1990)) With the L1-distance, the minimax distance design on [0, 1] is

{ (2i − 1)/(2n), i = 1, ..., n },

with the minimax distance 1/(2n), while the maximin distance design on [0, 1] is given by

{ (i − 1)/(n − 1), i = 1, ..., n },

with the maximin distance 1/(n − 1).

Morris and Mitchell (1995) extended the definition of the maximin distance criterion. For a given design with n runs, sort the distinct inter-site distances as d_1 < d_2 < ··· < d_m and denote their associated indices by (J_1, J_2, ..., J_m), where J_i is the number of pairs of points in the design separated by distance d_i; the J_i and d_i characterize the design D_n. A design D_n is called a maximin design if, among the available designs, it sequentially
(1a) maximizes d_1, and (1b) minimizes J_1 among all designs meeting (1a);
(2a) maximizes d_2 among all designs meeting (1a) and (1b), and (2b) minimizes J_2 among all designs meeting (2a);
...
(ma) maximizes d_m among all designs meeting the previous criteria, and (mb) minimizes J_m among all designs meeting (ma).

Morris and Mitchell (1995) pointed out: "Because statements (1a) and (1b) alone specify the definition of Johnson et al. (1990) of a Mm design, our more elaborate definition for Mm optimality essentially only breaks ties among multiple designs which would be Mm (and of minimum index) by their definition." Here, Mm means maximin.

Although this extended definition of Mm is intuitively appealing, practically it would be easier to use a scalar-valued criterion. This leads to the φ_p-criterion,

φ_p(D_n) = [ Σ_{i=1}^m J_i d_i^(−p) ]^(1/p),   (2.22)

where p is a positive integer. When the design domain is the set of all U-type designs U(n, n^s), the maximin LHD is denoted by MmLH. Morris and Mitchell (1995) proposed using a simulated annealing search algorithm to construct MmLHs. Chapter 4 gives a detailed discussion of various powerful optimization algorithms. Morris and Mitchell (1995) also pointed out that for the case in which n = 4m + 1, where m is a positive integer, the induced designs (cf. Definition 3) of orthogonal arrays such as Plackett-Burman designs (see Hedayat et al. (1999)) are maximin designs for C^{4m} with respect either to the L1- (rectangular) or the L2-distance.

2.4.4 Uniformity criterion

If one chooses a measure of uniformity as the criterion and U(n, q^s) as the design space, the corresponding optimal designs are uniform designs. This concept will be defined and discussed further in Chapter 3. Different measures of uniformity may lead to different uniform designs. Section 3.2 will introduce many existing measures of uniformity. As we will see, however, uniform design was motivated by a quite different philosophy from that of optimal LHDs.
Different measures of uniformity may lead to diﬀerent uniform designs.2 will introduce many existing measures of uniformity. As we will see.66 Design and Modeling for Computer Experiments will be deﬁned and discussed further in Chapter 3. Section 3. LLC . © 2006 by Taylor & Francis Group. however. uniform design was motivated by a quite diﬀerent philosophy from optimal LHD.

It involves various measures of uniformity. or uniform design for short. · · · . LLC .1 Intro duction The uniform experimental design.1) where D(Dn ) is the star discrepancy of the design Dn .. they needed to ﬁnd a way of experiment so that as much information as possible could be found using relatively few experimental points. and characteristics of the uniform design. The star discrepancy is a measure of uniformity and will be deﬁned in the next section. but each run required a day of calculation since the computer speed was so slow at that time. construction of symmetrical and asymmetrical uniform designs. With the Koksma-Hlawka inequality. The sample mean of y on Dn is denoted by y (Dn ) and the overall mean of y is E(y).3 Uniform Experimental Design This chapter gives a detailed introduction to uniform design. Let Dn = {x1 . Unlike Latin hypercube sampling the uniform design is motivated by a deterministic approach to the overall mean model mentioned in Section 1. one kind of space-ﬁlling design. Therefore. 67 © 2006 by Taylor & Francis Group. For instance. 3. The lower the star discrepancy. we should ﬁnd a design with the smallest star discrepancy. which we assume is the s-dimensional unit ¯ cube C s . so that uniformity would be violated and the sample mean would represent the population mean rather poorly. a uniform design (UD) with n runs and s input variables. an upper bound of the diﬀerence diﬀ-mean = |E(y) − y (Dn )| is given by ¯ diﬀ-mean = |E(y) − y (Dn )| ≤ V (f )D(Dn ). is a major kind of space-ﬁlling design that was proposed by Fang (1980) and Wang and Fang (1981) were involved in three large projects in system engineering in 1978. not depending on f . i. Obviously. if all of the points were clustered at one corner of the sphere. xn } be a set of n points in the experimental domain.e.5. the better uniformity the set of points has. and V (f ) is the total variation of the function f in the sense of Hardy and Krause (see Niederreiter (1992)). ¯ (3. 
the star discrepancy would be very large. In those projects the output of the system had to be obtained by solving a system diﬀerential equation.

In fact, we may define more measures of uniformity that satisfy the Koksma-Hlawka inequality and find the related uniform designs. In this chapter we shall introduce many useful measures of uniformity, such as the centered L2-discrepancy, the wrap-around L2-discrepancy, and the categorical discrepancy. Construction of a uniform design with n runs and s factors is an NP hard problem in the sense of computational complexity. Many construction methods, such as the good lattice point method, the Latin square method, the expanding orthogonal array method, and the cutting method, have been proposed and are introduced in this chapter. Optimization techniques have played an important role in constructing optimal Latin hypercube designs and uniform designs; Chapter 4 will give a detailed introduction to various powerful optimization algorithms. Section 3.4 describes characteristics of the uniform design, such as admissibility, minimaxity, and robustness. Section 3.5 presents a new construction method for uniform designs via resolvable balanced incomplete block designs; the most attractive advantage of this method is that it does not require optimization computations for constructing UDs. The last section of this chapter introduces several methods for the construction of asymmetrical uniform designs. These designs give the user more flexibility, as in many experiments the number of levels for the factors has to be small; in this case q is much smaller than the number of runs. Several papers give comprehensive reviews on uniform design; see, for example, Fang and Wang (1994), Fang and Hickernell (1995), Fang, Lin, Winker and Zhang (2000), Liang, Fang and Xu (2001), Fang (2002), and Fang and Lin (2003).

3.2 Measures of Uniformity

In this section we shall introduce various measures of uniformity. Let D_n = {x_1, ..., x_n} be a set of n points in the s-dimensional unit cube C^s. Often D_n is expressed as an n × s matrix whose kth row is x_k = (x_k1, ..., x_ks). Two designs are called equivalent if one can be obtained from the other by relabeling the factors and/or reordering the runs.

3.2.1 The star Lp-discrepancy

Let F(x) be the uniform distribution on C^s and F_{D_n}(x) be the empirical distribution of the design D_n, i.e.,

F_{D_n}(x) = (1/n) Σ_{k=1}^n I{x_k1 ≤ x_1, ..., x_ks ≤ x_s},   (3.2)
(3. · · · . the star L2 -discrepancy is much easier to calculate numerically than the star Lp -discrepancy. LLC . let [0. xji )]. Obviously.[0.5) k=1 j=1 i=1 k=1 l=1 where xk = (xk1 .4) to give an approximation of the star discrepancy. x)) − Vol([0.3) Cs The following special cases have been used in the literature: (i) The star discrepancy: Let p → ∞ in (3. xs ) and I{A} = 1 if A occurs. where p > 0. The star Lp -discrepancy has been widely used in quasi-Monte Carlo methods (Hua and Wang (1981) and Niederreiter (1992)) with diﬀerent statements of the deﬁnition. · · · .1).x)) been shown that the star L2 -discrepancy ignores diﬀerences | N (Dnn − Vol([0. (3. xs ) in C s . x)) − Vol([0. The star discrepancy has played an important role in quasi-Monte Carlo methods and statistics.3) and we have D(Dn ) ≡ lim Dp (Dn ) = max s p→∞ x∈C N (Dn . For each x = (x1 . [0. However. s (3. [0. x)) be the number . (ii) The star L2 -discrepancy: Warnock (1972) gave an analytic formula for calculating the star L2 -discrepancy: (D2 (Dn ))2 = =3 −s Cs s N (Dn . The ratio N (Dnn volume of the rectangular [0. · · · . The Lp -norm average of all the discrepancies over C s is just the star Lp -discrepancy and is given by Dp (Dn ) = N (Dn . The diﬀerence between the above ratio and volume N (Dn . The star Lp -discrepancy is the Lp -norm of ||FDn (x) − F (x)||p . x))| on any low-dimensional subspace and is not suitable for design of computer experiments. An algorithm for exact calculation of the star discrepancy in small dimensions is given by Bundschuh and Zhu (1993). © 2006 by Taylor & Francis Group. x). x)). or 0 otherwise. x)) n p 1/p . denoted by Vol([0. [0. [0. x)) n (1 − x2 ) kl 1 + 2 n n n s 2 − 2 1−s n n [1 − max(xki . xks ).4) The star discrepancy was ﬁrst suggested by Weyl (1916) and is the KolmogorovSmirnov statistic in goodness-of-ﬁt testing. x).Uniform Experimental Design 69 where x = (x1 . x1 ) × · · · × [0. [0. x)) x∈C n = max |FDn (x) − F (x)|. x)) − Vol([0. x) = [0. 
where x = (x_1, ..., x_s) and I{A} = 1 if A occurs, or 0 otherwise. The star Lp-discrepancy is the Lp-norm ||F_{D_n}(x) − F(x)||_p, where p > 0. For each x = (x_1, ..., x_s) in C^s, let [0, x) = [0, x_1) × ··· × [0, x_s) be the rectangle determined by 0 and x, let N(D_n, [0, x)) be the number of points of D_n falling in [0, x), and let Vol([0, x)) denote the volume of [0, x). If the points in D_n are uniformly scattered on C^s, the ratio N(D_n, [0, x))/n should be close to the volume Vol([0, x)). The difference

N(D_n, [0, x))/n − Vol([0, x))

is called the discrepancy at point x. The Lp-norm average of all the discrepancies over C^s is just the star Lp-discrepancy:

D_p(D_n) = [ ∫_{C^s} | N(D_n, [0, x))/n − Vol([0, x)) |^p dx ]^(1/p).   (3.3)

The star Lp-discrepancy has been widely used in quasi-Monte Carlo methods (Hua and Wang (1981) and Niederreiter (1992)) with different statements of the definition. The following special cases have been used in the literature:

(i) The star discrepancy: let p → ∞ in (3.3), so that

D(D_n) ≡ lim_{p→∞} D_p(D_n) = max_{x∈C^s} | N(D_n, [0, x))/n − Vol([0, x)) | = max_{x∈C^s} |F_{D_n}(x) − F(x)|.   (3.4)

The star discrepancy was first suggested by Weyl (1916) and is the Kolmogorov-Smirnov statistic in goodness-of-fit testing. The star discrepancy has played an important role in quasi-Monte Carlo methods and statistics, but it is not easy to compute. An algorithm for exact calculation of the star discrepancy in small dimensions is given by Bundschuh and Zhu (1993). Winker and Fang (1997) employed the threshold accepting method (cf. Chapter 4) to give an approximation of the star discrepancy.

(ii) The star L2-discrepancy: Warnock (1972) gave an analytic formula for calculating the star L2-discrepancy:

(D_2(D_n))² = 3^(−s) − (2^(1−s)/n) Σ_{k=1}^n Π_{i=1}^s (1 − x_ki²) + (1/n²) Σ_{k=1}^n Σ_{l=1}^n Π_{i=1}^s [1 − max(x_ki, x_li)],   (3.5)

where x_k = (x_k1, ..., x_ks). Obviously, the star L2-discrepancy is much easier to calculate numerically than the general star Lp-discrepancy. However, it has been shown that the star L2-discrepancy ignores differences |N(D_nu, [0, x))/n − Vol([0, x))| on any low-dimensional subspace and is not suitable for design of computer experiments.
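Warnock's formula (3.5) is straightforward to vectorize. The function below is our sketch; the single-point sanity check can be verified by direct integration of (F_n(x) − x)² for one point at 1/2 on [0, 1], which gives exactly 1/12.

```python
import numpy as np

def star_L2(X):
    """Star L2-discrepancy of the points X in [0,1]^s, Warnock's formula."""
    n, s = X.shape
    t1 = 3.0 ** -s
    t2 = (2.0 ** (1 - s) / n) * np.prod(1.0 - X ** 2, axis=1).sum()
    t3 = np.prod(1.0 - np.maximum(X[:, None, :], X[None, :, :]),
                 axis=2).sum() / n ** 2
    return np.sqrt(t1 - t2 + t3)
```

As expected, evenly spaced points score lower (better) than clustered ones, e.g. {0.25, 0.75} beats {0.25, 0.3} on [0, 1].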

FIGURE 3.1
Discrepancy and data rotation.

3.2.2 Modified L2-discrepancy

Fang, Lin, Winker and Zhang (2000) found that both the star discrepancy and the star L2-discrepancy are unsuitable for searching UDs. The star discrepancy is not sensitive enough: observe the set of points in Figure 3.1(a), with star discrepancy D = 0.19355; after rotating the coordinates clockwise by 90 degrees, the star discrepancy of the set in Figure 3.1(b) becomes D = 0.16687. Moreover, the Lp-discrepancy is not invariant under rotating the coordinates of the points, as the origin 0 plays a special role, while the star Lp-discrepancy (p < ∞) ignores differences |N(D_nu, [0, x))/n − Vol([0, x))| on any low-dimensional subspace, where u is a non-empty subset of {1, ..., s}. It is known that the lower-dimensional structure of a design is more important, from the so-called hierarchical ordering principle: lower-order effects are more likely to be important than higher-order effects; main effects are more likely to be important than interactions; and effects of the same order are equally likely to be important (cf. Section 1.2). Therefore, we have to put more requirements on the measure of uniformity. A measure of uniformity is reasonable if it satisfies the following requirements:

[C1] It is invariant under permuting factors and/or runs.
[C2] It is invariant under rotation of the coordinates.
[C3] It can measure not only the uniformity of D_n over C^s, but also the projection uniformity of D_n over C^u, where u is a non-empty subset of {1, ..., s}.
[C4] There is some reasonable geometric meaning.

[C5] It is easy to compute.
[C6] It satisfies a Koksma-Hlawka-like inequality.
[C7] It is consistent with other criteria in experimental design.

Therefore, several modified Lp-discrepancies were proposed by Hickernell (1998a), among which the centered L2-discrepancy (CD) has good properties. The CD and some other L2-discrepancies are defined by

(D_modified(D_n))² = Σ_{u≠∅} ∫_{C^u} | N(D_nu, J_xu)/n − Vol(J_xu) |² du,   (3.6)

where u is a non-empty subset of the set of coordinate indices S = {1, ..., s}, |u| denotes the cardinality of u, C^u is the |u|-dimensional unit cube involving the coordinates in u, D_nu is the projection of D_n on C^u, J_x is a rectangle uniquely determined by x, and J_xu is the projection of J_x on C^u. The rectangle J_x is chosen with geometric considerations so as to satisfy [C2]. There is a summation over all non-empty u in (3.6); thus the new discrepancy measure considers the discrepancy between the empirical distribution and the volume of J_xu in any low-dimensional rectangle J_xu.

3.2.3 The centered discrepancy

The centered L2-discrepancy is considered due to the appealing property that it is invariant under reordering the runs, relabeling factors, or reflecting the points about any plane passing through the center of the unit cube and parallel to its faces. The latter is equivalent to invariance under the coordinate rotations above. Consider a point x ∈ C^s; its corresponding rectangle J_x depends on the cell in which x falls. The unit cube is split into 2^s cells (four square cells in the two-dimensional case), and the corner points of each cell involve one corner point of the original unit cube, the central point (1/2, ..., 1/2), and others. Figure 3.2 gives an illustration of J_x for the two-dimensional case, where four cases are demonstrated. The CD thus considers the uniformity not only of D_n over C^s, but also of all the projections of D_n over C^u. Furthermore, the CD has a formula for computation as follows:

(CD(D_n))² = (13/12)^s − (2/n) Σ_{k=1}^n Π_{i=1}^s [ 1 + (1/2)|x_ki − 0.5| − (1/2)|x_ki − 0.5|² ]
  + (1/n²) Σ_{k=1}^n Σ_{j=1}^n Π_{i=1}^s [ 1 + (1/2)|x_ki − 0.5| + (1/2)|x_ji − 0.5| − (1/2)|x_ki − x_ji| ].   (3.7)

For comparing designs, the lower the CD-value, the more uniform, and in that sense more desirable, the design is. The CD-values of the two designs D6−1 and D6−2 in Section 2.3 are 0.0081 and 0.0105, respectively; thus the design D6−1 is better.
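Formula (3.7) translates directly into code. The function below is our sketch, and the comparisons use our own toy point sets rather than the designs from the text.

```python
import numpy as np

def centered_L2(X):
    """Centered L2-discrepancy CD of the points X in [0,1]^s, formula (3.7)."""
    n, s = X.shape
    a = np.abs(X - 0.5)
    t1 = (13.0 / 12.0) ** s
    t2 = (2.0 / n) * np.prod(1.0 + 0.5 * a - 0.5 * a ** 2, axis=1).sum()
    d = np.abs(X[:, None, :] - X[None, :, :])
    t3 = np.prod(1.0 + 0.5 * a[:, None, :] + 0.5 * a[None, :, :] - 0.5 * d,
                 axis=2).sum() / n ** 2
    return np.sqrt(t1 - t2 + t3)
```

A single point at the center of [0, 1] has a lower CD than an off-center one, and two well-separated points beat a clustered pair, as expected of a uniformity measure.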
The CD and some other L2 -discrepancies are deﬁned by (Dmodif ied (Dn ))2 = u=∅ Cu N (Dnu . CD has a formula for computation as follows: (CD(Dn ))2 = + 1 n2 n n s 13 12 s − 2 n n s k=1 j =1 1 1 1 + |xkj − 0. the design is. the design D6−1 is better. but also of all the projection uniformity of Dn over C u . The unit cube is split into four square cells (in general. Jx is a rectangle uniquely determined by x and is chosen with the geometric consideration for satisfying [C2]. |u| denotes the cardinality of u.5| − |xki − xji | . Therefore.

5 x 0.0 0. 3.0 x 0.0 x 0.0 1. If we want the deﬁned discrepancy to have the property [C2].3 gives an illustration.2 Illustration for centered discrepancy.2. As a consequence the rectangle Jx1 . In this case each rectangle is determined by two points x1 and x2 in C s .0 (c) (d) FIGURE 3.0 (a) (b) 1. is another modiﬁed L2 -discrepancy and has nice properties. x2 .5 1.3(a)).0 0. This consideration leads to the so-called unanchored discrepancy in the literature.5 1. we should allow wrapping the unit cube for each coordinate.5 1.0 x 0. denoted by Jx1 . The wrap-around L2 -discrepancy (WD).5 0. proposed also by Hickernell (1998b). The corresponding discrepancy is called the wrap-around discrepancy. LLC .5 1.72 Design and Modeling for Computer Experiments 1.5 0. Its analytical © 2006 by Taylor & Francis Group.4 The wrap-around discrepancy Both the star Lp -discrepancy and the centered L2 -discrepancy require the corner points of the rectangle Jx to involve one corner point of the unit cube. Figure 3.5 1. A natural idea allows that the rectangle J falls inside of the unit cube and does not involve any corner point of the unit cube (see Figure 3. x2 can be split into two parts.

$$(WD(\mathcal{P}))^2 = -\left(\frac{4}{3}\right)^s + \frac{1}{n^2} \sum_{k,j=1}^n \prod_{i=1}^s \left( \frac{3}{2} - |x_{ki} - x_{ji}|\,(1 - |x_{ki} - x_{ji}|) \right). \qquad (3.8)$$

The WD-values of designs $D_{6-1}$ and $D_{6-2}$ are 4.8137 and 4.8421, respectively; again, the design $D_{6-1}$ is better in the sense of the lower WD-value.

FIGURE 3.3
Illustration for centered discrepancy with wrap. (Panels (a)-(c).)

3.2.5 A unified definition of discrepancy

All the discrepancies mentioned so far can be defined by the concept of the reproducing kernel of a Hilbert space. The categorical discrepancy discussed in Section 3.2.6 is also defined in terms of the reproducing kernel of a Hilbert space. This approach can be easily used to develop new theory and to propose new discrepancies. It may be difficult to understand for nonmathematicians/statisticians; it does not cause a serious problem if the reader decides to skip this section. The reader who skips this section only needs to be aware of formula (3.13).

Hickernell (1998a) proved that the centered $L_2$-discrepancy and wrap-around $L_2$-discrepancy satisfy requirements [C1]-[C6]. Section 3.2.7 will show that these two discrepancies also have close relationships with many criteria in fractional factorial designs, such as resolution and minimum aberration.

For a U-type design $U$, we always define $M(U) = M(D_U)$, where $D_U$ is the induced design of $U$ (cf. Definition 3 in Section 2) and $M$ denotes a measure of uniformity defined on $C^s$. The measure $M$ can be the star discrepancy, $L_2$-discrepancy, CD, WD, or another measure of uniformity discussed in the literature.

Let $D_n = \{x_1, \cdots, x_n\}$ be a design in the canonical experimental domain $C^s$, let $F_{D_n}(x)$ be its empirical distribution, and let $F(x)$ be the uniform distribution on $C^s$. The discrepancies mentioned in this section can each be defined by a norm $\|F_{D_n}(x) - F(x)\|_M$ with a different definition of the norm.
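Formula (3.8) can be transcribed directly. A useful sanity property, reflecting the wrapping idea above, is that the WD-value is unchanged when all points are shifted coordinate-wise modulo 1. The sketch below (NumPy and the name `wd2` are implementation choices, not from the text) exhibits this.

```python
import numpy as np

def wd2(design):
    """Squared wrap-around L2-discrepancy, formula (3.8)."""
    x = np.asarray(design, dtype=float)
    n, s = x.shape
    total = 0.0
    for k in range(n):
        for j in range(n):
            d = np.abs(x[k] - x[j])           # componentwise |x_ki - x_ji|
            total += np.prod(1.5 - d * (1.0 - d))
    return -(4.0 / 3.0) ** s + total / n ** 2
```

Because $d(1-d)$ takes the same value for a distance $d$ and for its wrapped complement $1-d$, shifting every coordinate by a constant modulo 1 leaves (3.8) unchanged.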

When the $L_\infty$-norm is chosen, we have the star discrepancy; when the $L_p$-norm is used, it leads to the $L_p$-discrepancy. A norm $\|F_{D_n}(x) - F(x)\|_M$ under the measure $M$ measures the difference between the empirical distribution and the uniform distribution. In fact, the centered $L_2$- and wrap-around $L_2$-discrepancies can also be expressed as a norm of the difference $F_{D_n}(x) - F(x)$, but we need some tools in functional analysis. The concept of a reproducing kernel (cf. Saitoh (1988) and Wahba (1990)) is a very useful tool for this purpose.

Let $T$ be the experimental domain in $R^s$ and $K(x, w)$ be a symmetric and positive definite function on $T \times T$, i.e.,
$$K(x, w) = K(w, x), \ \text{for any } x, w \in T, \quad \text{and} \quad \sum_{i,j=1}^n a_i a_j K(x_i, x_j) > 0, \ \text{for any } a_i \in R,\ x_i \in T,\ i = 1, \cdots, n, \qquad (3.9)$$
where the $a_i$ cannot all equal zero. Let $W$ be a Hilbert space of real-valued functions on $T$. If the kernel $K$ satisfies
$$f(x) = [K(\cdot, x), f]_K, \quad \text{for all } f \in W \text{ and } x \in T,$$
then the kernel $K$ is called a reproducing kernel. Let $\mathcal{M}$ be the space of all signed measures $G$ on $T$ such that $\int_{T \times T} K(x, w)\,dG(x)\,dG(w) < \infty$. An inner product $[\cdot, \cdot]_K$ is defined with the kernel $K$ by
$$[G, H]_K = \int_{T \times T} K(x, w)\,dG(x)\,dH(w), \quad \text{for any } G, H \in \mathcal{M},$$
and a norm of $G$ under the kernel $K$ is defined by
$$\|G\|_K = \left( \int_{T \times T} K(x, w)\,dG(x)\,dG(w) \right)^{1/2}.$$

When $T$ is the unit cube $C^s$, $F$ is the uniform distribution on $C^s$, and $G$ is the empirical distribution of $D_n$, the $L_2$-discrepancy with respect to a given $K$ is defined by
$$D^2(D_n, K) = \|F - F_{D_n}\|_K^2 = \int_{C^s \times C^s} K(x, w)\,d(F - F_{D_n})(x)\,d(F - F_{D_n})(w)$$
$$= \int_{C^s \times C^s} K(x, w)\,dF(x)\,dF(w) - \frac{2}{n} \sum_{i=1}^n \int_{C^s} K(x, x_i)\,dF(x) + \frac{1}{n^2} \sum_{i,j=1}^n K(x_i, x_j). \qquad (3.10)$$

Different kernels imply different measures of uniformity. For example, the CD and WD have the respective kernels
$$K_c(x, w) = \prod_{i=1}^s \left( 1 + \frac{1}{2}|x_i - 0.5| + \frac{1}{2}|w_i - 0.5| - \frac{1}{2}|x_i - w_i| \right)$$
and
$$K_w(x, w) = \prod_{i=1}^s \left( \frac{3}{2} - |x_i - w_i| + |x_i - w_i|^2 \right), \qquad (3.11)$$
where $x = (x_1, \cdots, x_s)$ and $w = (w_1, \cdots, w_s)$. All the discrepancies defined in this way, including the CD and WD we have mentioned, satisfy the Koksma-Hlawka type inequality; however, the definition of the total variation of the function $f$ depends on the measure $M$. By this unified approach Hickernell (1998a,b) proposed many new discrepancies. The next section discusses another discrepancy, more precisely, a discrepancy for categorical factors.

3.2.6 Discrepancy for categorical factors

The discrepancy measures mentioned so far have assumed that the factors are continuous. Lattice points have been widely considered in the construction of LHDs and UDs; in such cases the possible design points can be considered as points in a discrete lattice rather than in a continuous space, and discrepancy measures are also available for categorical factors. Here we assume that the experimental domain is the set of all possible combinations of factor levels, i.e., let $T = \{1, \ldots, q_1\} \times \cdots \times \{1, \ldots, q_s\}$ comprise all possible level combinations of the $s$ factors, and let $F$ be the discrete uniform distribution on $T$. Choosing
$$K_j(x, w) = \begin{cases} a, & \text{if } x = w, \\ b, & \text{if } x \neq w, \end{cases} \qquad a > b > 0, \qquad (3.12)$$
and
$$K_d(x, w) = \prod_{j=1}^s K_j(x_j, w_j), \quad \text{for } x, w \in T,$$
then $K_d(x, w)$ is a kernel function and satisfies the conditions in (3.9). The corresponding discrepancy, called categorical discrepancy or discrete discrepancy, was suggested by Hickernell and Liu (2002). Fang, Lin and Liu (2003) gave a computational formula for the categorical discrepancy and a lower bound of the categorical discrepancy for any U-type design, as follows:

Theorem 2 For a U-type design $D_n \in U(n, q_1 \times \cdots \times q_s)$, we have
$$D^2(D_n, K_d) = \frac{a^s}{n} + \frac{b^s}{n^2} \sum_{k,l=1,\, l \neq k}^n \left(\frac{a}{b}\right)^{\lambda_{kl}} - \prod_{j=1}^s \frac{a + (q_j - 1)b}{q_j} \qquad (3.13)$$
$$\geq \frac{a^s}{n} + \frac{n-1}{n}\, b^s \left(\frac{a}{b}\right)^{\lambda} - \prod_{j=1}^s \frac{a + (q_j - 1)b}{q_j}, \qquad (3.14)$$
where $\lambda_{kl}$ is the number of coincidences between the $k$th and $l$th rows of $D_n$. The lower bound on the right-hand side of (3.14) can be achieved if and only if $\lambda = (\sum_{j=1}^s n/q_j - s)/(n-1)$ is a positive integer and all the $\lambda_{kl}$ for $k \neq l$ are equal to $\lambda$. For the symmetrical designs with $q_1 = \cdots = q_s = q$, we have $\lambda = s(n-q)/(q(n-1))$ and the above lower bound becomes
$$\frac{a^s}{n} + \frac{n-1}{n}\, b^s \left(\frac{a}{b}\right)^{\lambda} - \left( \frac{a + (q-1)b}{q} \right)^s.$$

Remark: The lower bound in (3.14) can be improved to
$$D^2(D_n, K_d) \geq \frac{a^s}{n} + \frac{n-1}{n}\, b^s \left( (\gamma + 1 - \lambda)\left(\frac{a}{b}\right)^{\gamma} + (\lambda - \gamma)\left(\frac{a}{b}\right)^{\gamma+1} \right) - \prod_{j=1}^s \frac{a + (q_j - 1)b}{q_j}, \qquad (3.15)$$
where $\gamma = [\lambda]$ is the integer part of $\lambda$. The lower bound on the right-hand side of (3.15) can be achieved if and only if all the $\lambda_{kl}$ take the same value $\gamma$, or take only the two values $\gamma$ and $\gamma + 1$.

The categorical discrepancy is essentially useful for two- or three-level designs. A comprehensive discussion of the categorical discrepancy and of the relationships between the categorical discrepancy and the centered/wrap-around discrepancies is given by Qin and Fang (2004). The categorical discrepancy and its lower bound in Theorem 2 play an important role in the connection between combinatorial designs and uniform designs. We will introduce this connection and its application to the construction of uniform designs in Sections 3.5 and 3.6.

3.2.7 Applications of uniformity in experimental designs

Section 3.2.2 raises seven requirements for a good measure of uniformity. The requirement [C7] specifies that the measure of uniformity should be consistent with other criteria for experimental design. The existing criteria in factorial designs are related to statistical inference, while the measure of uniformity focuses on geometrical consideration of the experimental points. It appears at first that they are unrelated. However, in the past decade many authors have found links between measures of uniformity and some criteria in factorial designs and supersaturated designs. As a result, many applications of the concept of uniformity to factorial designs have been explored; this section gives a brief review.
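Formula (3.13) and the bound (3.14) are easy to check numerically. In the sketch below, the choice a = 2, b = 1 is an arbitrary assumption satisfying a > b > 0, and the function names are hypothetical; the test case is an orthogonal array in which every pair of rows has the same coincidence number, so the bound is attained exactly.

```python
import itertools
import math

def categorical_d2(design, q, a=2.0, b=1.0):
    """Categorical (discrete) discrepancy D^2(Dn, Kd), formula (3.13).

    `design`: n runs, each with s entries from {1,...,q_j};
    `q`: the list (q_1,...,q_s); a > b > 0 as in (3.12).
    """
    n, s = len(design), len(design[0])
    pair_sum = 0.0
    for k, l in itertools.permutations(range(n), 2):      # all k != l
        coincidences = sum(design[k][j] == design[l][j] for j in range(s))
        pair_sum += (a / b) ** coincidences
    prod = math.prod((a + (qj - 1) * b) / qj for qj in q)
    return a ** s / n + b ** s * pair_sum / n ** 2 - prod

def lower_bound(n, q, a=2.0, b=1.0):
    """Lower bound (3.14); tight iff all pairwise coincidences equal lambda."""
    s = len(q)
    lam = (sum(n / qj for qj in q) - s) / (n - 1)
    prod = math.prod((a + (qj - 1) * b) / qj for qj in q)
    return a ** s / n + (n - 1) / n * b ** s * (a / b) ** lam - prod
```

For the orthogonal array with rows (1,1,1), (1,2,2), (2,1,2), (2,2,1), every pair of rows coincides in exactly one position, so $\lambda = 1$ and (3.13) equals (3.14).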

A. Uniformity and isomorphism: Two factorial designs with $n$ runs and $s$ factors each having $q$ levels are called isomorphic if one can be obtained from the other by relabeling the factors, reordering the runs, or switching the levels of one or more factors. For identifying two such designs a complete search must be done to compare $n!(q!)^s s!$ designs, and it is an NP hard problem. Since relabeling factors, reordering runs, and switching levels do not change the discrepancy, it is easy to check whether two isomorphic factorial designs have the same discrepancy, for example, the same CD-value, denoted by $CD_{n,m,q}(d_i)$, $i = 1, 2$. If $d_1$ and $d_2$ are isomorphic, we have $CD_{n,m,q}(d_1) = CD_{n,m,q}(d_2)$. If two designs have different CD-values, they are not isomorphic; however, the converse is not always true. For a given $m < s$ and two designs $d_1$ and $d_2$, there are $\binom{s}{m}$ subdesigns whose CD-values form a distribution. Using the above facts, Ma, Fang and Lin (2001) proposed a powerful algorithm based on uniformity that can easily detect non-isomorphic designs. Hadamard matrices (see Section A.1) have played an important role in experimental design and coding theory. Two Hadamard matrices are called equivalent if one can be obtained from the other by some sequence of row and column permutations and negations. Fang and Ge (2004) proposed an algorithm based on uniformity for detecting inequivalent Hadamard matrices, and they discovered, for example, that there are at least 382 pairwise inequivalent Hadamard matrices of order 36.

B. Uniformity and orthogonality: Fang and Winker (1998) and Fang, Winker and Zhang (2000) found that many existing orthogonal designs (see Section 1.2 for the definition) are uniform designs under the centered $L_2$-discrepancy. They conjectured that any orthogonal design is a uniform design under a certain discrepancy. Later, Fang and Lin (2002) proved that the conjecture is true in many cases. Fang, Ma and Mukerjee (2001) also describe some relationships between orthogonality and uniformity.

C. Uniformity and aberration: The word length pattern for a fractional factorial design is one of the most important criteria for describing the estimation ability for the main effects and interactions of the factors. Based on the word length pattern, two criteria, resolution and minimum aberration, have been widely used in the literature as well as in applications. Fang and Mukerjee (2000) found an analytic connection between two apparently unrelated areas, the uniformity and the aberration, in regular fractions of two-level factorials. This connection has opened a new research direction and inspired much further work, for example, Ma and Fang (2001), Fang, Ma and Mukerjee (2001), and Fang and Ma (2002). Analogously to the word length pattern, we can define the so-called uniformity pattern. Fang and Qin (2004a) gave such a version for two-level designs, and Hickernell and Liu (2002) proposed an alternative version. Tang (2005) provided some further discussion. For the details the reader can refer to Fang (2001) and Fang and Lin (2003).

© 2006 by Taylor & Francis Group, LLC

D. Uniformity and supersaturated design: Supersaturated designs were introduced in Chapter 1. Many authors, including Lin (1993, 1995, 1999), Wu (1993), Li and Wu (1997), Cheng (1997), and Yamada and Lin (1999), proposed criteria for comparing and constructing supersaturated designs. Recently, uniformity has been recognized as a useful criterion, and many ways have been suggested for constructing multi-level supersaturated designs under uniformity criteria. For detailed discussions the reader can refer to Fang, Lin and Ma (2000), Liu and Hickernell (2002), Fang, Lin and Liu (2003), and Fang, Ge, Liu and Qin (2004).

E. Majorization framework: Note that fractional designs, supersaturated designs, and uniform designs are all based on U-type designs. Zhang, Fang, Li and Sudjianto (2005) proposed a general theory for all of these designs. They present a general majorization framework for assessing designs, which includes a stringent criterion of majorization via pairwise coincidences and flexible surrogates via convex functions. The theory is based on the so-called majorization theory that has been widely used in various theoretic approaches. Classical orthogonality, aberration, and uniformity criteria are unified by choosing combinatorial and exponential kernels. This approach links the three kinds of designs, and would help a user who is already familiar with factorial and/or supersaturated designs feel more comfortable using uniform design.

3.3 Construction of Uniform Designs

Consider the canonical experimental domain $C^s$, and let $\mathcal{D}$ be the design space, a subset of $U(n, q^s)$. A design $D_n \in \mathcal{D}$ is called a uniform design under the measure of uniformity $M$ if it minimizes the $M$-value of the design over $\mathcal{D}$. In this chapter the design space is a subset of U-type designs. A uniform design of $n$ runs and $s$ factors each having $q$ levels is denoted by $U_n(q^s)$. In this section we choose the CD as the measure of uniformity for illustration. Section 3.3.1 gives $U_n(n^1)$ for the one-factor case. Sections 3.3.2-3.3.6 introduce several methods for constructing $U_n(n^s)$; the construction of $U_n(q^s)$ will be discussed later, and uniform designs with mixed levels will be considered in Section 3.5.

3.3.1 One-factor uniform designs

Fang, Ma and Winker (2000) pointed out that the set of equidistant points is the unique UD on [0,1] for one-factor experiments.

© 2006 by Taylor & Francis Group, LLC

Theorem 3 For a one-factor experiment, the unique UD on [0,1] is
$$\left\{ \frac{1}{2n},\ \frac{3}{2n},\ \cdots,\ \frac{2n-1}{2n} \right\} \qquad (3.16)$$
with $CD^2 = \frac{1}{12n^2}$.

In fact, the assertion of the above theorem also holds if the measure of uniformity is the star discrepancy (see Example 1.1 of Fang and Wang (1994)), the CD, or many others.

3.3.2 Symmetrical uniform designs

The following facts are noted in the construction of multi-factor UDs:
- The UD is not unique when $s > 1$. We do not distinguish between equivalent designs (cf. Section 3.2.2). Fang, Ma and Winker (2000) give a detailed discussion on this point.
- Finding a uniform design is a complex task and can be computationally intractable, even for moderate values of $n$ and $s$.

Therefore, we can narrow down the design space. From Theorem 3 the lattice points in $C^s$ are suggested; the canonical experimental domain becomes
$$L_{n,s} = \left\{ \frac{1}{2n},\ \frac{3}{2n},\ \cdots,\ \frac{2n-1}{2n} \right\}^s,$$
the lattice point set where each marginal design is the uniform design (3.16) (there are $n^s$ points to form the experimental domain). Fang and Hickernell (1995) suggested using the U-type design set $U(n, n^s)$ as the design space. For any symmetrical design $U \in U(n, n^s)$, its induced design $D_U$ is a U-type design with entries $\{(2i-1)/2n,\ i = 1, \cdots, n\}$; that is, each experimental point belongs to $L_{n,s}$. Fang, Ma and Winker (2000) gave some justification of Fang-Hickernell's suggestion. We define $CD(U) = CD(D_U)$ (cf. Section 3.2). From the above discussion the design space can be changed as follows:

Set of designs with $n$ points on $C^s$ $\Rightarrow$ Set of designs with $n$ runs on $L_{n,s}$ $\Rightarrow$ Set of U-type designs with entries $\frac{2k-1}{2n}$ $\Leftrightarrow$ Set of $U(n, n^s)$ with entries $1, \cdots, n$.

The first step reduces the design space $\mathcal{D}$ into a lattice point set, the second step further reduces the design space into U-type designs with entries $(2k-1)/2n$, $k = 1, \cdots, n$, and the third step changes the design space into $U(n, n^s)$. A design of $n$ runs each belonging to $L_{n,s}$ is called a uniform design if it has the smallest CD-value among all such designs; equivalently, we want to find a design $D_n \in U(n, n^s)$ such that it has the smallest CD-value on $U(n, n^s)$.

However, searching for a uniform design for moderate $(n, s)$ is still computationally costly. More generally, the design space can be chosen as $U(n, q^s)$, where $q$ is a divisor of $n$; $U(n, n^s)$ is its special case of $q = n$.

Definition 6 A design in $U(n, q^s)$ is called a uniform design if it has the smallest CD-value on the set of $U(n, q^s)$ designs. We write it as $U_n(q^s)$. In this case the marginal experimental points for each factor are $\frac{1}{2q}, \frac{3}{2q}, \cdots, \frac{2q-1}{2q}$.

To further reduce the computational complexity of searching for UDs based on U-type designs, there are several methods, such as the good lattice point method, the Latin square method, the expanding orthogonal array method, the cutting method, and the optimization method, that can provide a good approximation to the uniform design. A good approximation to a uniform design is also called a nearly uniform design (Fang and Hickernell (1995)). In the literature a good nearly uniform design is often simply called a uniform design; for the rest of the book, the uniform design is understood as in the definition above. The remainder of the section focuses on construction of UDs with $q = n$.

3.3.3 Good lattice point method

The good lattice point (glp) method is an efficient quasi-Monte Carlo method, proposed by Korobov (1959a,b) and discussed by many authors such as Hua and Wang (1981), Shaw (1988), Joe and Sloan (1994), and Fang and Wang (1994).

Algorithm GLP

Step 1. Find the candidate set of positive integers for given $n$:
$$H_n = \{h : h < n, \text{ the greatest common divisor of } n \text{ and } h \text{ is one}\}.$$

Step 2. For any $s$ distinct elements $h_1, h_2, \cdots, h_s$ of $H_n$, generate an $n \times s$ matrix $U = (u_{ij})$, where $u_{ij} = i h_j \pmod n$ and the multiplication operation modulo $n$ is modified such that $1 \leq u_{ij} \leq n$. Denote $U$ by $U(n, \mathbf{h})$, where $\mathbf{h} = (h_1, \cdots, h_s)$ is called the generating vector of $U$. Denote by $GLP_{n,s}$ the set of all such matrices $U(n, \mathbf{h})$.

Step 3. Find a design $U(n, \mathbf{h}^*) \in GLP_{n,s}$ that minimizes the CD-value over the set $GLP_{n,s}$ with respect to the generating vector $\mathbf{h}$. The design $U(n, \mathbf{h}^*)$ is a (nearly) uniform design $U_n(n^s)$.

Example 11 For $n = 21$ and $s = 2$, we have $H_{21} = \{1, 2, 4, 5, 8, 10, 11, 13, 16, 17, 19, 20\}$.
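Algorithm GLP can be sketched directly; the CD-value is computed by formula (3.7) on the induced design $x_{ij} = (2u_{ij} - 1)/(2n)$. The function names below are conveniences of this sketch, and the exhaustive search over all $s$-subsets of $H_n$ is only feasible when $H_n$ is small.

```python
import itertools
import math
import numpy as np

def glp_design(n, h):
    """U-type design U(n, h) of the glp method: u_ij = i*h_j (mod n), in 1..n."""
    u = np.outer(np.arange(1, n + 1), h) % n
    u[u == 0] = n
    return u

def cd2(u):
    """CD^2 of the induced design x_ij = (2*u_ij - 1)/(2n), via formula (3.7)."""
    n, s = u.shape
    x = (2.0 * u - 1.0) / (2.0 * n)
    d = np.abs(x - 0.5)
    t2 = (2.0 / n) * np.prod(1 + 0.5 * d - 0.5 * d ** 2, axis=1).sum()
    t3 = sum(np.prod(1 + 0.5 * d[k] + 0.5 * d[j] - 0.5 * np.abs(x[k] - x[j]))
             for k in range(n) for j in range(n)) / n ** 2
    return (13.0 / 12.0) ** s - t2 + t3

def best_glp(n, s):
    """Search GLP_{n,s} (all s-subsets of H_n) for the smallest CD-value."""
    H = [h for h in range(1, n) if math.gcd(h, n) == 1]
    return min(itertools.combinations(H, s),
               key=lambda h: cd2(glp_design(n, h)))
```

For $n = 21$, $s = 2$, the search reproduces the CD-value of the generating vector $(1, 13)$ of Example 11, and the second column of $U(21, (1, 13))$ is the permutation $13, 5, 18, \ldots, 21$ displayed above.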

There are 12 integers in $H_{21}$; as a consequence, there are $\binom{12}{2} = 66$ candidate designs in $GLP_{21,2}$. The generating vector $(1, 13)$ has the smallest CD-value over $GLP_{21,2}$; i.e., the design matrix $U(21, \mathbf{h}^*)$ with $\mathbf{h}^* = (1, 13)$ is a (nearly) uniform design $U_{21}(21^2)$, given (runs as columns) as follows:

1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21
13 5 18 10  2 15  7 20 12  4 17  9  1 14  6 19 11  3 16  8 21

A. The cardinality of $H_n$: Obviously, the computational complexity of using the glp method depends on the cardinality of $H_n$. From number theory (Hua and Wang (1981)), the cardinality of $H_n$ is just the Euler function $\phi(n)$. Let $n = p_1^{r_1} \cdots p_t^{r_t}$ be the prime decomposition of $n$, where $p_1, \cdots, p_t$ are distinct primes and $r_1, \cdots, r_t$ are positive integers. Then
$$\phi(n) = n \left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_t}\right).$$
For example, $\phi(21) = 21(1 - \frac{1}{3})(1 - \frac{1}{7}) = 12$, as $21 = 3 \cdot 7$ is the prime decomposition of 21. Furthermore, $\phi(n) = n - 1$ if $n$ is a prime, and $\phi(n) < n/2$ if $n$ is even. The maximum number of factors is less than $n/2 + 1$ if the UD is generated by the glp method.

B. The power generator: The glp method with the best generating vector can be applied only for moderate $(\phi(n), s)$. For given $(n, s)$, the vector $\mathbf{h}^*$ such that $U(n, \mathbf{h}^*)$ minimizes the CD-value (cf. Step 3) is called the best generating vector. For the case of $n = 31$ and $s = 14$ there are $\binom{30}{14} \approx 3 \times 10^{21}$ generating vectors to compare, as $\phi(31) = 30$; it is impossible to implement such comparisons to find the best generating vector. Therefore, the power generator is recommended. Let
$$P_{n,s} = \{k : k, k^2, \cdots, k^{s-1} \pmod n \text{ are distinct},\ 1 \leq k \leq n - 1\}.$$
The generating vector domain
$$G^{power}_{n,s} = \{\mathbf{h}_k = (1, k, k^2, \cdots, k^{s-1}) \pmod n,\ k \in P_{n,s}\}$$
is suggested, and a vector in $G^{power}_{n,s}$ is called a power generator. The set of power designs in $GLP_{n,s}$ is denoted by $GLP^{power}_{n,s}$. A design minimizing the CD-value over the set $GLP^{power}_{n,s}$ is a (nearly) uniform design $U_n(n^s)$. It has been shown that uniform designs generated by the glp method with a power generator have a lower discrepancy (Hua and Wang (1981)). The number of columns of U-type designs generated by the glp method is limited to $\phi(n)/2 + 1$ or $\phi(n+1)/2 + 1$ (p. 208 of Fang and Wang (1994)).

Further, it is known that
$$D(D_n) \geq c(s) \frac{(\log n)^{s-1}}{n}$$
for any $D_n$ in $\mathcal{D}$. It is conjectured that the best possible order of the star discrepancy of $D_n$ is $n^{-1}(\log n)^{s-1}$.
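The Euler function used above can be computed directly from the prime decomposition; a minimal sketch (the function name is an assumption of this sketch):

```python
def euler_phi(n):
    """Euler's totient: for n = p1^r1 ... pt^rt,
    phi(n) = n * (1 - 1/p1) * ... * (1 - 1/pt)."""
    result, m, p = n, n, 2
    while p * p <= m:
        if m % p == 0:
            result -= result // p        # multiply result by (1 - 1/p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:                            # one prime factor remains
        result -= result // m
    return result
```

This reproduces the values cited in the text: $\phi(21) = 12$, $\phi(31) = 30$, and $\phi(6) = 2$ (matching $H_6 = \{1, 5\}$ of Example 12).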

Here the constant $c(s)$ depends on $s$ only. The conjecture is true for $s = 2$ and is an open problem for high dimensional cases. The best order of the star discrepancy of $D_n$ generated by the glp method is $O(n^{-1}(\log n)^{s-1})$, and the best order of the star discrepancy of $D_n$ generated by the glp method with a power generator is $O(n^{-1}(\log n)^{s-1}\log(\log n))$. That is why the glp method has been appreciated by many authors, and most of the UDs used in the past were generated by the glp method with a power generating vector. Table 3.1 gives the generating vector and CD-value of UDs for $4 \leq n \leq 31$ and $s = 2, 3, 4, 5$, computed by Dr. C.X. Ma.

C. Modification of the glp method: When the cardinality of $H_n$ is too small, or when both $n$ and $s$ are large, the nearly uniform design obtained by the glp method may be far from the uniform design. In this case some modifications of the glp method are necessary. One modification was suggested by Wang and Fang (1981) and Fang and Li (1994): instead of $H_n$, we work on $H_{n+1}$ in Steps 1 and 2 of Algorithm GLP and delete the last row of $U(n+1, \mathbf{h})$ to form an $n \times s$ matrix. All such matrices $U$ form a set, denoted by $GLP^*_{n,s}$; in Step 3, $GLP_{n,s}$ is replaced by $GLP^*_{n,s}$. Clearly, the power generator can be applied to this modification. Table 3.2 gives the generating vector and CD-value of UDs obtained by the use of this modified glp method, also calculated by Dr. C.X. Ma.

Example 12 Consider constructing a UD with 6 runs and 2 factors. The number of elements in $H_6 = \{1, 5\}$ is 2; clearly, it is impossible to obtain $U_6(6^s)$ for $s > 2$ based on $H_6 = \{1, 5\}$. Note $H_7 = \{1, 2, 3, 4, 5, 6\}$; all nearly uniform designs $U_6(6^s)$, $s \leq 6$, can be generated based on $H_7$. Figure 2.5 gives plots for the above designs: the nearly uniform design $U_6(6^2)$ generated from $H_6$ is not uniform, as can be seen from its plot (the right plot in Figure 2.5), and is slightly worse than the design in the left plot.

Wang and Fang (2005) proposed a novel approach to the construction of uniform designs by the glp method, where the generating vector is found by a forward procedure. They use the wrap-around discrepancy as the measure of uniformity, which can be computed rapidly by a simple formula if the design is from the glp method. The cost of searching the generating vector with $n$ points for all dimensions up to $s$ requires approximately $O(n\phi(n)s^2)$ operations, where $\phi(n)$ is the Euler function of $n$. Theoretical results show that the wrap-around discrepancy of such uniform designs has the optimal order of convergence, and the theoretical findings are supported by empirical evidence. Empirical studies also show that the forward procedure performs surprisingly well.
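The forward idea can be sketched as a greedy search: fix $h_1 = 1$ and, for each further column, pick the element of $H_n$ that most reduces the wrap-around discrepancy (3.8) of the current design. This is only an illustration of the idea; the scanning order and the starting choice here are assumptions, not the exact procedure of Wang and Fang (2005).

```python
import math
import numpy as np

def wd2(x):
    """Squared wrap-around L2-discrepancy, formula (3.8)."""
    n, s = x.shape
    total = 0.0
    for k in range(n):
        for j in range(n):
            d = np.abs(x[k] - x[j])
            total += np.prod(1.5 - d * (1.0 - d))
    return -(4.0 / 3.0) ** s + total / (n * n)

def forward_glp(n, s):
    """Greedy forward selection of a glp generating vector (illustrative sketch)."""
    H = [h for h in range(1, n) if math.gcd(h, n) == 1]
    h = [1]
    i = np.arange(1, n + 1)
    for _ in range(1, s):
        best = None
        for c in H:
            if c in h:
                continue
            u = np.outer(i, h + [c]) % n     # candidate glp design
            u[u == 0] = n
            x = (2.0 * u - 1.0) / (2.0 * n)  # induced design
            v = wd2(x)
            if best is None or v < best[0]:
                best = (v, c)
        h.append(best[1])                    # keep existing columns unchanged
    return h
```

Note how the greedy structure mirrors the favorable property mentioned above: a larger design is obtained by adding columns while keeping the existing ones unchanged.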

TABLE 3.1
Generating Vector and CD-Value of UDs by the glp Method
(For each $n = 4, \ldots, 31$ and each $s = 2, 3, 4, 5$, the table lists the CD-value of the (nearly) uniform design together with its generating vector $(1\ h_2 \cdots h_s)$.)

TABLE 3.2
The Generating Vector and CD-Value of UDs by the Modified glp Method
(For each $n = 4, \ldots, 31$ and each $s = 2, 3, 4, 5$, the table lists the CD-value of the (nearly) uniform design together with its generating vector $(1\ h_2 \cdots h_s)$.)

The forward procedure also outperforms the existing space-filling experimental designs. Moreover, the uniform design generated by the forward procedure has a favorable property: a uniform design $U$ of "large size" is suited to all experiments having a number of factors no larger than the number of the columns of $U$. Furthermore, if a uniform design of larger size is needed, then we only need to find further columns, keeping all the existing ones unchanged.

3.3.4 Latin square method

Definition 7 An $n \times n$ matrix with $n$ symbols as its elements is called a Latin square of order $n$ if each symbol appears in each row as well as in each column once and only once.

The concept of a "Latin hypercube" is just the higher-dimensional generalization of a Latin square. Any Latin square of order $n$ is a U-type design $U(n, n^n)$, and any $s$ columns of a Latin square form a U-type design $U(n, n^s)$. Given $n$, a Latin square is always available. For example, a left cyclic Latin square of order $n$ is an $n \times n$ Latin square such that
$$x_{i+1} = L x_i, \quad i = 1, \cdots, n - 1,$$
where $x_i$ is the $i$th row of the square and $L$ is the left shift operator defined by
$$L(a_1, a_2, \cdots, a_n) = (a_2, a_3, \cdots, a_n, a_1).$$
An example of generating a left cyclic Latin square of order 8 is given in Example 13: we start at a row vector (1 2 5 4 7 3 8 6); the first shift gives the row (2 5 4 7 3 8 6 1), the second shift gives (5 4 7 3 8 6 1 2), and shifting this row again and again, eventually we have a Latin square. For a given $n$ there are $n!$ left cyclic Latin squares, and we want to find a left cyclic Latin square with the lowest discrepancy among all the $n!$ left cyclic Latin squares. The following algorithm was proposed by Fang, Shiu and Pan (1999).

Algorithm LUD

Step 1. Randomly choose an initial permutation $P_0$ of $(1, 2, \cdots, n)$ from $\mathcal{P}_n$, where $\mathcal{P}_n$ denotes the set of all $n!$ permutations of $(1, 2, \cdots, n)$, and generate a left cyclic Latin square $L_0$.

Step 2. Apply an optimization algorithm introduced in Chapter 4 and find a Latin square $L = (l_{ij}) = (l_1, \cdots, l_n)$ that has the smallest CD-value among all $n!$ left cyclic Latin squares of order $n$.

Step 3. Search $s$ of the $n$ columns of the square, $l_{i_1}, \cdots, l_{i_s}$, to form a U-type design $U(n, n^s)$ such that this design has the smallest CD-value among all such $U(n, n^s)$ designs. This design is a (nearly) uniform design $U_n(n^s)$.

These steps need a powerful optimization algorithm (cf. Chapter 4) in order to find a solution. It is easy to see that both the optimal Latin square and the obtained UD may be only local minima. One of the advantages of the Latin square method is that it can generate UDs for all $s \leq n$.
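The left cyclic construction itself is a one-line shift per row; a minimal sketch (the function name is an assumption):

```python
def left_cyclic_latin_square(first_row):
    """Left cyclic Latin square: each row is the left shift of the previous row."""
    n = len(first_row)
    square = [list(first_row)]
    for _ in range(n - 1):
        prev = square[-1]
        square.append(prev[1:] + prev[:1])   # the left shift operator L
    return square
```

Starting from the row (1 2 5 4 7 3 8 6) of Example 13, the second and third rows are the two shifts quoted above, and every row and column is a permutation of 1, ..., 8.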

Example 13 For $n = 8$ and $s = 3$ we find the cyclic Latin square

1 2 5 4 7 3 8 6
2 5 4 7 3 8 6 1
5 4 7 3 8 6 1 2
4 7 3 8 6 1 2 5
7 3 8 6 1 2 5 4
3 8 6 1 2 5 4 7
8 6 1 2 5 4 7 3
6 1 2 5 4 7 3 8

with the smallest CD-value 0.4358. For $U_8(8^s)$, columns of the design are chosen from columns of the above Latin square:

s = 2: columns 1 and 4, with CD-value 0.0696;
s = 3: columns 1, 2, and 6, with CD-value 0.1123;
s = 4: columns 1, 2, 5, and 6, with CD-value 0.1601;
s = 5: columns 1, 2, 5, 6, and 7, with CD-value 0.2207.

3.3.5 Expanding orthogonal array method

There are some relationships between orthogonal arrays (cf. Section 1.2) and uniform designs, such as:
- Many orthogonal arrays are uniform designs; this was discovered by Fang, Lin, Winker and Zhang (2000).
- Any orthogonal design $L_n(q^s)$ can be extended to a number of $U(n, n^s)$ designs; in fact, Algorithm ROA in Section 2.2 can produce all possible such $U(n, n^s)$ designs.

By some optimization algorithm in Chapter 4, among all the $U(n, n^s)$ designs expanded from an orthogonal design we can find one with the smallest discrepancy, and the design with the smallest discrepancy is a nearly uniform design. The above algorithm was proposed by Fang (1995). The expanding orthogonal array method requires powerful optimization algorithms, some of which will be introduced in Chapter 4. The reader can refer to Fang and Hickernell (1995) for a discussion on the strengths and weaknesses of the above three methods in Sections 3.3.3-3.3.5.

3.3.6 The cutting method

It is known that the good lattice point (glp) method has been appreciated by many authors due to its economic computation and good performance. In fact, from the theory of quasi-Monte Carlo methods, the glp method with a power generator has the lowest quasi-Monte Carlo computation complexity among comparable methods and also has good performance in the sense of uniformity (cf. Section 3.3.3); in particular, a uniform design generated by the glp method with a power generator is easy to generate. Suppose that we want to have a uniform design $U_n(n^s)$, where $n$ is not a prime, so that the uniform design constructed by the traditional glp method may have a poor uniformity. Ma and Fang (2004) proposed the cutting method, which cuts a larger uniform design $U_p(p^s)$, where $n < p$ or $n << p$ and $p$ or $p + 1$ is a prime; the latter is easy to generate.

Let $U_p$ be a uniform design $U_p(p^s)$, and let $D_p$ be its induced design. Let $V$ be a proper subset of $C^s$ such that there are exactly $n$ points of $D_p$ falling in $V$; denote by $D_n$ these $n$ points. Then the points in $D_n$ are approximately uniformly scattered on $V$, and these $n$ points will form a (nearly) uniform design $U_n(n^s)$ by some linear transformations. This is the key idea of the cutting method. Specifically, we choose a rectangle as $V$ such that there are exactly $n$ points of $D_p$ falling in this rectangle.

Let us consider an illustrative example. A uniform design $U_{30}(30^2)$ was obtained by Fang, Ma and Winker (2000) (runs as columns) as follows:

U30 =
24 23  1  9  5 11 19  6 21  3 12 15 20 18 17 26  7  4 28 27 25 13 14 29 22  8  2 16 30 10
25  6 12 18 16  7  4  9 11  3 14 20 30 15 24  2 29 21 13 28 17 27  1  8 19  5 26 10 22 23
particular, from the theory of quasi-Monte Carlo methods, the glp method with a power generator has the lowest quasi-Monte Carlo computation complexity among comparable methods and also has good performance in the sense of uniformity. Suppose that we want to have a uniform design Un(n^s), where n is not a prime and the uniform design constructed by the traditional glp method may have a poor uniformity. Ma and Fang (2004) proposed the cutting method, which cuts a larger uniform design, especially one generated by the glp method with a power generator. Let Up be a uniform design Up(p^s), where n < p or n << p and p or p + 1 is a prime; the latter is easy to generate. Let Dp be its induced design, and let V be a proper subset of C^s such that there are exactly n points of Dp falling in V. Denote by Dn these n points. Then the points in Dn are approximately uniformly scattered on V, and these n points will form a (nearly) uniform design Un(n^s) by some linear transformations. This is the key idea of the cutting method. Let us consider an illustration example.

Example 14 Construction of a U10(10^2) design. Suppose that the centered L2-discrepancy (CD) is chosen as the measure of uniformity. A uniform design U30(30^2) was obtained by Fang, Ma and Winker (2000) as follows (shown transposed, with runs as columns):

U30' = ( 24 23  1  9  5 11 19  6 21  3 12 15 20 18 17 26  7  4 28 27 25 13 14 29 22  8  2 16 30 10
         25  6 12 18 16  7  4  9 11  3 14 20 30 15 24  2 29 21 13 28 17 27  1  8 19  5 26 10 22 23 )

Its induced design on C^2 is plotted in Figure 3.4(a). If we wish to have a U10(10^2) from this U30, we choose a rectangle as V such that there are exactly ten points of Dp falling in this rectangle. Specifically, we can choose ten successive points according to the first coordinate (Figure 3.4(b)) or according to the second coordinate (Figure 3.4(c)). For each coordinate we can wrap the square such that position 0 and position 1 are at the same position. By this wrapping consideration, ten successive points can be separated in two rectangles: some points are near to 0 while others are near to 1 (Figure 3.4(d)). There are 60 = 30*2 such subsets of ten points, or 60 designs with 10 runs. The ten points in each cutting are uniformly scattered in the related (wrapped) rectangle. By a linear transformation, the ten points in each cutting can be transformed into a unit square, and the resulting points form a U-type design of ten runs on the unit square. So we have 60 sets of ten points. Finally, we choose the design with the smallest CD-value among these 60 designs. This design is a nearly uniform design U10(10^2) with CD = 0.0543 and is given by

CU10' = ( 1 2 3 4 5 6 7  8 9 10
          5 9 1 7 3 8 4 10 2  6 )
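CD-values of the candidate cuts can be computed from the standard formula for the centered L2-discrepancy of a U-type design, with level u mapped to the point (2u - 1)/(2n). A sketch (the formula is Hickernell's CD2 expression, assumed here rather than quoted from this page):

```python
def cd_value(u_design):
    """Centered L2-discrepancy of a U-type design (levels 1..n per column),
    using points x_kj = (2*level - 1) / (2*n)."""
    n, s = len(u_design), len(u_design[0])
    x = [[(2 * lev - 1) / (2 * n) for lev in row] for row in u_design]
    val = (13 / 12) ** s
    for k in range(n):                       # single-sum term
        prod = 1.0
        for j in range(s):
            a = abs(x[k][j] - 0.5)
            prod *= 1 + 0.5 * a - 0.5 * a * a
        val -= (2 / n) * prod
    for k in range(n):                       # double-sum term
        for l in range(n):
            prod = 1.0
            for j in range(s):
                a, b = abs(x[k][j] - 0.5), abs(x[l][j] - 0.5)
                prod *= 1 + 0.5 * a + 0.5 * b - 0.5 * abs(x[k][j] - x[l][j])
            val += prod / (n * n)
    return val ** 0.5

cu10 = [[1, 5], [2, 9], [3, 1], [4, 7], [5, 3],
        [6, 8], [7, 4], [8, 10], [9, 2], [10, 6]]
diagonal = [[i, i] for i in range(1, 11)]    # a poorly spread design, for contrast
```

The diagonal design clusters its points on one line, so cd_value(diagonal) should come out clearly larger than cd_value(cu10).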

FIGURE 3.4 Illustration of the cutting method.

Fang, Ma and Winker (2000) found a uniform design U10(10^2) with CD = 0.0543 by the threshold accepting algorithm as follows:

U10' = ( 1 2 3 4 5 6 7 8 9 10
         3 6 9 2 5 8 1 4 7 10 )

Note that CU10 obtained by the cutting method is a uniform design U10(10^2), as it has the same CD-value as U10. Its induced design is plotted in Figure 3.5(a). If the glp method is applied directly, another nearly uniform design is found as follows:

Uglp,10' = ( 1 2 3 4 5  6 7 8 9 10
             5 9 1 7 3 10 4 6 2  8 )

with CD = 0.0614, which is larger than the CD-value of CU10; its induced design is plotted in Figure 3.5(b). This experiment shows the advantages of the cutting method compared with the traditional glp method.

FIGURE 3.5 Plots of the induced designs of two U10(10^2)s.

The algorithm for the cutting method is given below.

Algorithm CUT
Step 1. For given (n, s, p), where p >> n, find a Up(p^s), denoted by C = (c_ij), and calculate its induced design Dp = {c1, · · · , cp}. The design Up(p^s) or Dp is called the initial design.
Step 2. For l = 1, · · · , s, reorder the rows of C by sorting the lth column of C such that the elements in this column are sorted from small to large, and denote the reordered matrix by C(l) = (c(l)_kj).
Step 3. For m = 1, · · · , p, let C(l,m) = (c(l,m)_kj), where

    c(l,m)_kj = c(l)_{k+m-n-1, j},  if m > n,  k = 1, · · · , n;
    c(l,m)_kj = c(l)_{k, j},        if m <= n, k = 1, · · · , m - 1;
    c(l,m)_kj = c(l)_{k+p-n, j},    if m <= n, k = m, · · · , n;

for j = 1, · · · , s.
Step 4. Relabel the elements of the jth column of C(l,m) by 1, 2, · · · , n according to the magnitude of these elements. The resulting matrix becomes a U-type design U(n, n^s) and is denoted by U(l,m). There are ps U-type designs with such a structure.
Step 5. For a given measure of uniformity M, compare the ps designs U(l,m) obtained in the previous step and choose the one with the smallest M-value. That one is a nearly uniform design Un(n^s).

Example 14 shows that the cutting method has good performance in construction of uniform designs. There are some advantages to the cutting method: (a) the uniformity of the resulting designs is often better than designs directly generated by the glp
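Steps 2-5 of Algorithm CUT translate directly into code. The sketch below uses a small glp-style initial design (p = 13, generator (1, 5) — an illustrative choice, not the book's U31) and, for brevity, a crude spread criterion (negative minimum L1 interpoint distance) in place of the CD used in the text; any measure of uniformity M can be plugged in:

```python
def cut_method(C, n, discrepancy):
    """Cut an n-run U-type design out of the p-run initial design C
    (levels 1..p), trying every column l and every cut position m, and
    keeping the cut with the smallest discrepancy."""
    p, s = len(C), len(C[0])
    best, best_val = None, float("inf")
    for l in range(s):
        Cl = sorted(C, key=lambda row: row[l])      # Step 2: sort by column l
        for m in range(1, p + 1):
            if m > n:                               # Step 3: contiguous block
                rows = Cl[m - n - 1:m - 1]
            else:                                   # Step 3: wrapped block
                rows = Cl[:m - 1] + Cl[m - 1 + p - n:]
            # Step 4: relabel each column by ranks 1..n.
            U = [[0] * s for _ in range(n)]
            for j in range(s):
                order = sorted(range(n), key=lambda k: rows[k][j])
                for rank, k in enumerate(order, start=1):
                    U[k][j] = rank
            val = discrepancy(U)                    # Step 5: keep the best cut
            if val < best_val:
                best, best_val = U, val
    return best, best_val

def neg_min_l1(U):
    """Crude uniformity surrogate: negative minimum L1 interpoint distance."""
    n = len(U)
    return -min(sum(abs(a - b) for a, b in zip(U[i], U[j]))
                for i in range(n) for j in range(i + 1, n))

p = 13   # illustrative glp-style initial design U13(13^2), generator (1, 5)
C13 = [[(k * 1 - 1) % p + 1, (k * 5 - 1) % p + 1] for k in range(1, p + 1)]
best_cut, best_val = cut_method(C13, 8, neg_min_l1)
```

Because the n selected rows carry distinct values in every column of the initial design, the rank relabeling in Step 4 always yields a genuine U-type design U(n, n^s).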

method; (b) from one initial design Dp, we can find many Un(n^s), n < p; and (c) the computation complexity is lower. Ma and Fang (2004) gave an illustration example. They chose p = 31, s = 2, and n = 7, 8, · · · , 30, based on a nearly uniform design U31(31^16). The results show that all the nearly uniform designs obtained by the cutting method have a smaller CD-value than their corresponding designs generated by the glp method. More methods for generating UDs will be introduced in Sections 3.5 and 3.6.

3.3.7 Construction of uniform designs by optimization

It is clear that searching for a UD for a given (n, s, q) is an optimization problem. As the domain U(n, q^s) is a finite set, all the classical Newton-Gauss type optimization methods do not work. In the past twenty years, many powerful stochastic optimization algorithms have been proposed. Popular among them are the simulated annealing, the threshold accepting, the genetic algorithm, and stochastic evolutionary algorithms. Winker and Fang (1998) used the TA algorithm to find UDs under the star discrepancy. Fang, Ma and Winker (2000) applied the TA algorithm for finding UDs under the centered L2- and wrap-around-discrepancies, respectively. Chapter 4 gives a detailed introduction to some powerful optimization methods and their applications to construction of space-filling designs. The user can find many useful uniform designs on the web site http://www.math.hkbu.edu.hk/UniformDesign/ or http://www.stat.psu.edu/~rli/UniformDesign/.

3.4 Characteristics of the Uniform Design: Admissibility, Minimaxity, and Robustness

Each type of statistical design has its own characteristics under a certain statistical model. For example, the optimal design introduced in Section 1.2 is optimal under certain criteria (like D-optimality, A-optimality) for the linear regression model (Kiefer (1959) and Atkinson and Donev (1992)). Cheng (1980) established the universal optimality of a fractional factorial plan given by an orthogonal array of strength two (cf. Section 1.2 for the definition of an orthogonal array). Fang and Mukerjee (2001) characterized an orthogonal array of strength two in terms of D-optimality under a regression model where the factor levels are allowed to vary continuously in the design space. The uniform design was motivated by the overall mean model (cf. Section 1.5). However, good estimation of the overall mean is not enough for most
Jin, Chen and Sudjianto (2005) proposed the enhanced stochastic evolutionary (ESE) algorithm for constructing optimal designs of computer experiments, including uniform designs.
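The threshold accepting idea used in these papers can be sketched in a few lines. This is a simplified sketch, not the published TA or ESE implementations: the neighborhood swaps two entries within one column (so every column stays a permutation), the threshold schedule is an illustrative linear one, and the objective shown is the standard wrap-around L2-discrepancy formula:

```python
import random

def wd_value(u_design):
    """Wrap-around L2-discrepancy of a U-type design (levels 1..n),
    with points x_kj = (2*level - 1) / (2*n)."""
    n, s = len(u_design), len(u_design[0])
    x = [[(2 * lev - 1) / (2 * n) for lev in row] for row in u_design]
    total = 0.0
    for k in range(n):
        for l in range(n):
            prod = 1.0
            for j in range(s):
                d = abs(x[k][j] - x[l][j])
                prod *= 1.5 - d * (1 - d)
            total += prod
    return (-(4 / 3) ** s + total / (n * n)) ** 0.5

def threshold_accepting(u, criterion, iters=3000, t0=0.005, seed=1):
    """Accept any move whose worsening is below a shrinking threshold."""
    rng = random.Random(seed)
    n, s = len(u), len(u[0])
    cur = [row[:] for row in u]
    cur_val = criterion(cur)
    best, best_val = [row[:] for row in cur], cur_val
    for t in range(iters):
        thresh = t0 * (1 - t / iters)        # threshold shrinks to zero
        j = rng.randrange(s)
        a, b = rng.sample(range(n), 2)
        cur[a][j], cur[b][j] = cur[b][j], cur[a][j]
        val = criterion(cur)
        if val < cur_val + thresh:
            cur_val = val
            if val < best_val:
                best, best_val = [row[:] for row in cur], val
        else:                                # undo rejected swap
            cur[a][j], cur[b][j] = cur[b][j], cur[a][j]
    return best, best_val

start = [[i, i] for i in range(1, 11)]       # a poor initial U-type design
ta_best, ta_val = threshold_accepting(start, wd_value)
```

Accepting mildly worsening moves early lets the search escape the local minima that defeat pure descent on this finite, rugged design space.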

computer experiments, when the experimenter wants to have a high quality metamodel. Uniform design has been applied in computer experiments and in factorial plans with a model unknown. The former has the model y = f(x) and the model for the latter is

    y = f(x) + ε,    (3.17)

where f(x) is unknown and ε is the random error. One wants to find a metamodel y = g(x) for the above two cases. The function f is unknown, but lies in a certain class of functions, denoted by F.

A measure can be defined by a distribution in measure theory. Therefore, any distribution on the experimental domain can be considered a design. Let Dn = {x1, · · · , xn} be a design. Its empirical distribution FDn(x) can also be considered a design. This idea was proposed by Kiefer (1959) and has been a powerful force in the development of the theory of optimal design and of uniform design.

Several authors, such as Wiens (1991), Hickernell (1999), Xie and Fang (2000), Yue (2001), and Hickernell and Liu (2002), gave many optimalities of the uniform design under different models. Xie and Fang (2000) proposed a framework in terms of decision theory for model (3.17) and proved that the uniform measure on C^s is admissible and minimax under this framework. If we put more conditions on the class F, the uniform measure on C^s is optimal in a certain sense.

Let v(x) = (v1(x), · · · , vk(x)) be any k (<= n) linearly independent functions on L2(C^s), the set of L2-integrable functions on C^s. If, as an approximation, the linear model

    y = β1 v1(x) + · · · + βk vk(x) + ε    (3.18)

is misused as the true model, Wiens (1991) obtained two optimality properties of the uniform measure. Hickernell (1999) combined goodness-of-fit testing, discrepancy, and robust design, and considered worst-case and average-case models; we call them the maximum mean-square-error model and the average mean-square-error model, respectively. He proved that the uniform design is optimal and robust for both of these models. Yue and Hickernell (1999) considered an approximate model and decomposed the sum of squares into two components: variance and model bias. They investigated the importance of both variance and bias components and showed that the variance-minimizing designs can yield substantial bias, whereas bias-minimizing designs are rather efficient. Moreover, bias-minimizing designs tend to spread the points evenly

over the domain. It is interesting to note that both LHS and UD are motivated by the overall mean model at the beginning of the study. Although the overall mean model is far from the final goal of finding a good metamodel, both LHS and UD have excellent performance in computer experiments. One explanation for this phenomenon is that both approaches can work for a large class of functions f(·), not just for a specific f(·); that is, both LHS and UD are robust against the model specification. Yue (2001) applied the Bayesian approach to model (3.17) and found that low-discrepancy sets have good performance in such nonparametric models.

Optimal experimental designs give efficient estimators assuming that the true form of the model is known, while robust experimental designs guard against inaccurate estimates caused by model misspecification. Hickernell and Liu (2002) found that "When fitting a linear regression model to data, aliasing can adversely affect the estimates of the model coefficients and the decision of whether or not a term is significant. Although it is rare for a single design to be both maximally efficient and robust, it is shown here that uniform designs limit the effects of aliasing to yield reasonable efficiency and robustness together." They also showed that the uniform designs limit aliasing, and that categorical discrepancy (cf. Section 3.2.2) is an important criterion in the study of orthogonal designs, and found a number of interesting results. This gives another justification for the uniform design.

The above studies show the advantages of the uniform design (measure). Uniform design has been widely used in various fields in the past decades. Uniform design can be applied to
• computer experiments
• factorial plans with a model unknown
• experiments with mixtures
For a discussion of the latter, readers should refer to Fang and Wang (1994), Wang and Fang (1996), and Fang and Yang (1999).
Users report that the uniform design is
• a friendly design in the sense of flexibility of the numbers of runs and levels of the factors
• a robust design which is reliable against changes in the model
• easy to understand and implement
• convenient in that designs with small size have been tabulated for general use

3.5 Construction of Uniform Designs via Resolvable Balanced Incomplete Block Designs

Due to the resource needs of the experiment or limitations of the experimental environment, many factors in experiments can only allow a small number of levels. Therefore, designs Un(q^s), q << n, or Un(q1^r1 × · · · × qm^rm), with some qj << n, are extremely useful. Fang, Lin, Winker and Zhang (2000) found that many existing orthogonal designs are uniform designs, as the orthogonal designs satisfy some combinatorial balances. When orthogonality cannot be satisfied, some weaker combinatorial conditions can be set that lead to many combinatorial designs, among which the resolvable balanced incomplete block design has been systematically developed. Lu and Meng (2000) and Fang, Lin and Liu (2003) independently found a relationship between resolvable balanced incomplete block designs and uniform designs under the categorical discrepancy defined in Section 3.2.2. Along with this relationship, many uniform designs can be obtained from the rich literature on combinatorial designs without any optimization computation. Fang, Ge and Liu (2002, 2003), Fang, Ge, Liu and Qin (2002, 2004), Lu and Sun (2001), and Fang, Tang and Yin (2003) found many new UDs by this method. This section introduces the construction method via combinatorial designs. At the beginning we review some basic concepts in block designs.

3.5.1 Resolvable balanced incomplete block designs

Definition 8 A balanced incomplete block design (BIBD) with parameters (n, s, m, t, λ), denoted by BIBD(n, s, m, t, λ), is an arrangement of n treatments into s blocks of size t, where t < n, such that each treatment appears in m blocks, and every pair of treatments appears in exactly λ blocks. The five parameters must satisfy the following relations:

    nm = st  and  λ(n − 1) = m(t − 1).    (3.19)

Hence there are only three independent parameters in the definition. A BIBD(n, s, m, t, λ) is said to be resolvable, denoted by RBIBD(n, s, m, t, λ), if its blocks can be partitioned into m sets of blocks, called parallel classes, such that every treatment appears in each parallel class precisely once.

Table 3.3 gives a design RBIBD(10, 45, 9, 2, 1), where there are 10 treatments that are arranged in 45 blocks of size 2 such that each treatment appears in 9 blocks and every pair of treatments appears in exactly one block. There are 9 parallel classes P1, · · · , P9, listed in the 9 columns of Table 3.3. Clearly, this design has a good balance among blocks, treatments, and pairs.

For each Pj . Similarly.6} {3.10} {8. given an RBIBD(n. Step 3. In this way. 1) and U (10. 45.4} {1.10} P7 {5. 59 ). .10} {1.5} {9.9} {3. n. λ).5. 59 ) 1 in Table 3. Pm . .9} {8. . . t. 2. q m ). and the m parallel classes are denoted by P1 . . P9 in the design RBIBD(10. . in which a © 2006 by Taylor & Francis Group. .7} P6 {5.4. . So this RBIBD can be expressed as RBIBD(n. . 10} falls in column P1 and row bj row. .6} {3.10} On the contrary. 9. .5} {6. .8} {6. q m ) that is the output. Each parallel class corresponds to a column of U (10.9} {4. 2. 9} is located at P1 column and bj row.4} {1.6} {7. m. . . TABLE 3. is called the RBIBD method.8} {6. 9. . u = 1. each of which consists of q = s/m disjoint blocks.8} P5 {2. Clearly.4. 45. n/q.2} {5.7} {2. . .8} {3. n/q. . 1) P1 bj 1 bj 2 bj 3 bj 4 bj 5 {1. s. λ).4.10} {5.5. 3.10} {2. The pair {1. bj . . We have 9 parallel classes P1 .7} {2. The m q-level columns constructed from Pj (j = 1.9} {1.2 RBIBD construction method Now. The categorical discrepancy was deﬁned in Section 3. and the ﬁve rows bj . 2. mq. 9. m. can be carried out as follows: Algorithm RBIBD-UD Step 1.4} P3 {4.9} {5.7} {4. 45. 59 ) in Table 3. q to the q blocks in each parallel class Pj . . . q. Step 2. λ).10} {1.7} {2. The construction method. if treatment k is contained in the u-th block of Pj .5} {4. let us establish the relationship between the above RBIBD(10. 1).9} {3. construct a q-level column xj = (xkj ) as follows: Set xkj = u. a q-level design U (n. m. where the n treatments are denoted by 1.6} {2. m. 2 so we put level ‘2’ at positions 8 and 9 of column 1 in Table 3.4. .8} {2. 4.9} {6.3 correspond to the ﬁve 1 2 3 4 5 levels 1. . t = n/q. q m ) can be constructed by the RBIBD method. so we put level ‘1’ at positions ‘1’ and ‘10’ of column 1 of U (10. bj . 5 of U (10. Give a natural order 1. 2.6} {1.94 Design and Modeling for Computer Experiments 3. 59 ). bj . .7} {4.3 New uniform designs There are many existing RBIBDs in the literature.2. 2. 
59 ).8} {3. which form a U-type design U (10.3 An RBIBD(10. 3. With this RBIBD we can construct a U-type design U (n. If there exists an RBIBD (n.10} {7. LLC . as shown in Table 3. mq.3} {4. .8} {7.9} P8 {1. . bj of Table 3. j = 1. m) form a U (n. .3} P2 {2.5} P4 {3. the pair {8. . 9 columns are then constructed from these 9 parallel classes.10} P9 {1.6.

then a uniform Un (( n )n−1 ) exists. 225. And if λ = 1. possibly for (c) If n ≡ 0 (mod 3) and n = 6. (b) If n ≡ 3 (mod 6). q m ) constructed by the RBIBD method is a uniform design in the sense of the categorical discrepancy. n/q. m. a number of uniform designs can be obtained as follows: Theorem 5 The following Un (q s ) can be constructed by the Algorithm RBIBD-UD. except possibly © 2006 by Taylor & Francis Group. λ). then the U (n. 3 (d) If n ≡ 4 (mod 12). then a uniform Un (( n )n−1 ) exists. 240}. 59 ) Row 1 2 3 4 5 6 7 8 9 10 1 1 5 5 3 3 4 4 2 2 1 2 5 1 3 5 2 3 4 2 4 1 3 5 4 2 1 5 4 3 3 1 2 4 3 3 1 2 4 5 1 5 4 2 5 4 1 4 5 3 2 5 1 2 3 6 3 2 4 2 1 5 1 4 3 5 7 3 5 2 2 1 1 4 3 5 4 8 1 2 4 3 2 3 1 5 4 5 9 1 2 4 3 4 1 2 3 5 5 lower bound of the categorical discrepancy of U (n. we can use Algorithm RBIBD-UD to construct many new symmetric uniform designs.4 95 U (10. mq. then a uniform Un (( n )n−1 ) exists. Based on this theorem. Fang. Therefore. Obviously. then any of the possible level-combinations between any two columns appears at most once. then a uniform Un (( n ) 5 for n ∈ {45. 6 (h) If n ≡ 5 (mod 20). a U-type design whose categorical discrepancy attains the lower bound given in Theorem 2 is a uniform design. then a uniform Un (( n ) 4 (e) If n ≡ (f ) If n ≡ n ∈ {174. LLC . From the rich literature on RBIBDs. Ge and Liu (2003) obtained the following theorem: Theorem 4 Given an RBIBD(n. n−1 3 0 (mod 4). n−1 4 ) exists. then a uniform Un (( n )k(n−1) ) exists.Uniform Experimental Design TABLE 3. 645}. 4 0 (mod 6). except 6 (g) If n ≡ 0 (mod 6). 345. 465. q1 × · · · × qm ) is also given. then a uniform Un (( n )2(n−1) ) exists. where the categorical discrepancy is employed as the measure of uniformity: (a) If n is even. where k is a positive 2 integer. then a uniform Un (( n ) 3 n−1 2 ) exists. ) exists.

Several series of new uniform designs are then obtained. then a uniform Un (( n ) 5 except possibly for n ∈ {70. Lu. Ge. the reader can refer to Abel. 115. 395}. © 2006 by Taylor & Francis Group. Table 3. n−1 ) exists. 225.5 U15 (57 ) Row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 1 1 2 5 2 3 5 4 4 1 3 3 2 4 5 2 1 5 1 2 2 3 3 1 2 4 5 4 3 4 5 3 1 2 4 1 5 4 3 2 3 5 3 1 2 4 5 4 1 2 3 4 1 5 1 5 2 3 3 2 4 4 5 5 1 2 2 2 4 1 3 3 1 3 4 5 5 4 5 6 1 4 5 2 3 2 4 3 5 2 1 3 1 4 5 7 5 1 2 3 1 1 2 3 4 4 3 2 4 5 5 All the existing uniform designs Un (q s ) are constructed based on U-type designs. 160. 15. 135. LLC . Fang. 315. 295. except 5 possibly for n ∈ {45.emba. 335.edu. and for RBIBDs in Case (b)–(g) the reader can refer to Colbourn and Dinita (1996). the reader can visit the web site: http://www. 190. 195.edu/~dinitz/newresults.uvm.hkbu. For more tables. TABLE 3. For the RBIBDs in Cases (h)–(j). 235. Tang and Yin (2003) developed a general construction method for UDs via resolvable packings and coverings.html.5 constructed by the case (b) of the theorem. (j) If n ≡ 0 (mod 5) and n = 10. then a uniform Un (( n ) 2 ) exists. 135. 195}.math.96 Design and Modeling for Computer Experiments n−1 (i) If n ≡ 5 (mod 10) and n = 15. Greig and Zhu (2001). which require the number of experimental runs n to be a multiple of the number of factor levels q. The reader can ﬁnd the deﬁnition and related theory and method in their papers. For more information on the RBIBDs in Case (a) the reader can refer to Rees and Stinson (1992). 90. gives U15 (57 ). 345.hk/UniformDesign. When n is not a multiple of q. 215. some new results also can be found on the web site http://www.

Suppose we need a design Un (q1 × · · · × qs ). Suppose that we want to change four columns of this design into a mixed level 6 × 4 × 3 × 2 design. 3. Let U = (uij ) be a U-type design. Fortunately. The above method is called the pseudo-level technique. (11. As a result. collapsing. columns 2 through 4 become 4.6.1 Pseudo-level technique Let us start with an illustrative example. We can apply an optimization algorithm introduced in Chapter 4 and obtain a required design. we can ﬁnd U12 (124 ). 2) ⇒ 1. which is listed in Table 2. Suppose one needs a uniform design U12 (6 × 4 × 3 × 2) that cannot be found in the literature. 3. In this section we introduce the pseudo-level technique. . respectively. 4. DU. 3. q1 × · · · × qs ) and V = (vkl ) be another design. 12) ⇒ 6.V = (1m ⊗ U . we can assign the original four columns to have 3. If the original design is UD. qm are diﬀerent. we have to ﬁnd alternative ways. U (n. and the new design is given in Table 3. merges two uniform designs into one with a larger size. mt ). and 2 levels. and combinatorial methods. or the experimenter wants to include more levels for important factors and less levels for less important factors. 2 levels.6. Similarly.1. we shall concentrate on the r rm construction of asymmetrical uniform designs. or other choices. suggested by Fang and Qin (2002).V by collapsing U and V as follows . U (m. LLC . but it cannot be found in the literature. © 2006 by Taylor & Francis Group. · · · . respectively.2 Collapsing method The collapsing method. Then we merge the original levels in column 1 by (1. We should choose one that has the minimum CD-value.6. (3. A design Un (q11 × · · · × qm ) is called an asymmetrical uniform design if q1 .6 Construction of Asymmetrical Uniform Designs In the previous section. Construct a new U-type design DU. · · · . In fact. column 1 becomes a six-level column. In this section. Asymmetrical uniform designs are required when the experimental environment has some limitation. 
we have introduced several methods for construction of symmetrical uniform designs. V ⊗ 1n ).Uniform Experimental Design 97 3. If using an optimization method is too diﬃcult (no computational code is available. 4) ⇒ 2. for example). 6. There are 4!=24 choices. the new one obtained by the pseudo-level technique usually has good uniformity. From this example the reader can easily obtained a design with mixed levels from a symmetrical UD.

LLC . Section A. then ⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ =⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝ 1 2 3 4 1 2 3 4 1 2 3 4 1 1 2 2 1 1 2 2 1 1 2 2 1 2 2 1 1 2 2 1 1 2 2 1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 3 2 2 2 2 1 1 1 1 ⎞ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎟ ⎠ DU. Obviously. q1 × · · · × qs × mt ) the set of designs DU.6 U12 (6 × 4 × 3 × 2) No 1 2 1 1 4 2 1 2 3 2 1 4 2 2 5 3 4 6 3 3 7 4 2 8 4 1 9 5 3 10 5 4 11 6 3 12 6 1 3 1 3 2 1 3 2 2 1 3 2 1 3 4 2 1 2 1 2 1 2 1 2 1 2 1 where ⊗ denotes the Kronecker product (cf. The above construction method is the collapsing method proposed by Fang and Qin (2002).t) (mn. 4×22 ×32 ). q1 × © 2006 by Taylor & Francis Group.V is a U (12.t) (mn.1).98 Design and Modeling for Computer Experiments TABLE 3. For example. and 1m is the m×1 vector of ones. Denote by U (s. q1 × · · · × qs × mt ). We wish to ﬁnd a design with the minimum CD-value in the design space U (s.V is a U-type design U (nm. let ⎛ ⎞ ⎞ ⎛ 111 13 ⎜2 1 2⎟ ⎟ U =⎜ V = ⎝2 2⎠ ⎝ 3 2 2 ⎠ and 31 421 be two U-type designs. the design DU.V .

The user can apply some optimization method to ﬁnd a design with a lower discrepancy. and CD(V ) as follows: (W D(DU. we have (1) When m is odd.1) (mn.V is a uniform design over U (s. q1 × · · · × qs × mt ) obtained by the collapsing method need not be a uniform design in the design space U (mn.2.t) (2n. • Under the centered L2 -discrepancy. 2 1+ k=1 j=1 m t 1 1 1 1 xkj − − xkj − 2 2 2 2 1 1 1 1 yal − − yal − 2 2 2 2 2 . Note that a design U (mn. The WD. 2s × m) if and only if U is a uniform design Un (2s ). the design DU. as the design space U (s. q1 × · · · × qs × mt ) if and only if U is a uniform design Un (q1 × · · · × qs ) and V is a uniform design Um (mt ).and CD-value of the new design DU. (2) When m is even and q1 = · · · = qs = 2. More results are open for further study. • When the CD is chosen as a measure of uniformity and t = 1.V is a uniform design over U (s. Designs obtained by the collapsing method are nearly UDs.V ))2 = (CD(U ))2 − + where αU = βV = 2 n n s 4 3 s (W D(V ))2 + 4 3 t − 4 3 s+t 13 12 s + αU (CD(V ))2 − 13 12 t + βV 13 12 s+t 1 − αU βV . q1 × · · · × qs × mt ) is a proper subset of the domain U (mn. q1 × · · · × qs × m) if and only if U is a uniform design Un (q1 × · · · × qs ).V is a uniform design over U (s. CD(U ). A natural idea would be to choose U and V to be uniform designs.V ))2 = (W D(U ))2 + and (CD(DU. Fang and Qin (2002).4) the design DU.Uniform Experimental Design 99 · · · × qs × mt ). q1 × · · · × qs × mt ). © 2006 by Taylor & Francis Group. 2 2 m a=1 1+ l=1 . the design DU.1) (mn.V is a uniform design on U (s.V can be calculated in terms of W D(U ). Section 3. LLC . 2s+t ) if and only if U is a uniform design Un (2s ) and V is a uniform design U2 (2t ).t) (mn. W D(V ).t) (mn. the design DU. when q1 = · · · = qs = 2 and m = 2. pointed out that: • Under the wrap-around L2 -discrepancy (cf. q1 × · · · × qs × mt ).

as a consequence. P2 = {{1. {1. A parallel class in a design is a set of blocks that partition the point set. Suppose that. is a pair (V. 4. such that each treatment appears in exactly four parallel classes. 1. 2}. 3.100 Design and Modeling for Computer Experiments 3. this method can also be applied to construction of asymmetrical UDs. is a resolvable P BD(n. 6}. Therefore. and R be a multiset satisfying |R| = |K|. 2. then |B| ∈ K. P3 = {{3. a {· · · } represents a block): P1 = {{1. 5}}. Uniformly resolvable block design: From now on. LLC . {1. Example 15 U RBD(6. Six treatments are arranged into b = 11 blocks of size kj from K = {3. {3. 2}. B) which satisﬁes the following properties: If B ∈ B. Construction method: We now establish relationships between uniform designs and uniformly resolvable block designs. {3. {3. {3. 6}. 5}.6. P4 = {{2. 1. A parallel class is uniform if every block in the parallel class is of the same size. P1 . {1. 4}. Note that all blocks in RBIBDs are of the same size. denoted by U RBD(n. {1. However. so this block design is a U RBD(6. and P4 . © 2006 by Taylor & Francis Group. λ) where n is the number of runs and λ is a positive integer. there is a corresponding positive rk ∈ R such that there are exactly rk parallel classes of block size k. {2. 5. denoted by P BD(n. 5. 2}. K. P2 . P3 . B. 5}. Let us cobsider an example. the UDs constructed by RBIBDs are symmetrical. 3}). that are given as follows (where in each parallel class. {2. λ. {4. A. Let K be a subset containing the diﬀerent values of the block sizes. K. for each k ∈ K. Note that every pair of treatments occurs in exactly λ = 1 block.3 Combinatorial method We introduced the relationships between resolvable balanced incomplete block designs and uniform designs and Algorithm RBIBD-UD for construction of symmetrical UDs in Section 3. 4}}. We can construct asymmetrical UDs if we can ﬁnd balanced block designs that have blocks with diﬀerent sizes. 
and every pair of distinct treatments of V appears in exactly λ blocks of B. A pairwise balanced design of order n with block size from K. 3}). denote by |A| the cardinality of a set A. 6}}.5. 3}. 6}. a new concept of pairwise balanced designs is helpful. Consider the case of n = 6 and V = {1. 4}. A uniformly resolvable block design. K. 6}}. λ) such that all of the parallel classes are uniform. R). 2.

Algorithm URBD-UD Step 1. 2}. which form a U6 (2 × 33 ) listed in Table 3. TABLE 3. m) to form a U-type design U (n. Ge and Liu (2003). Similarly. · · · . K. {1. Step 2. and B = ∪m Pj . respectively. 5. 1. thus we obtain a 2-level column of the design. given in this example. Then a uniform design Un (q1 ×· · ·×qm ) in the sense of the categorical discrepancy can be constructed with the following procedure proposed by Fang. 3}) as follows: In P1 . n}. 2. K = {n/q1 . Combine these m columns constructed from Pj of B (j = 1. m). Step 3. if treatment i is contained in the u-th block of Pj of B (j = 1. · · · . It can be shown m that λ = ( j=1 n/qj −m)/(n−1). 2. 2. We put “1” in the cells located in rows 1. and 6 of that column. n/qm }. q1 × · · · × qm ). Liu and © 2006 by Taylor & Francis Group. {3. Fang. · · · . 2.7. 1. 6 of that column. For convenience. · · · . The second column of the design is thus generated. For each Pj . 5 and 3.7 U6 (2 × 33 ) No 1 1 1 2 1 3 1 4 2 5 2 6 2 2 1 2 3 1 2 3 3 2 3 1 3 1 2 4 3 1 2 2 3 1 Let λij be the number of blocks in which the pair of treatments i and j appears. R). 2. 2}. 5. 4 columns are then constructed from these 4 parallel classes. LLC . {3. λ. let V = {1. Ge. {1. The fact that λij = ψ for all i = j means that any pair of distinct treatments in V appears in exactly λ blocks. Example 15 (Continued) A UD U6 (2 × 33 ) can be obtained via U RBD(6. 3}). · · · . 2. and 3 of the ﬁrst column of the design and “2” in the cells located in rows 4. · · · . from P2 we put “1” in the cell located in rows 1 and 4 of the second column of the design and “2” and “3” in the cells located in rows 2. suppose that there exists a U RBD(n. there are two blocks {1. In this way. Assign a natural order 1. where j=1 each Pj with qj blocks represents a parallel class of blocks. qj to the qj blocks in each parallel class Pj (j = 1. m).Uniform Experimental Design 101 To illustrate this. 
Four columns of U6 (2 × 33 ) are constructed by the four parallel classes of U RBD(6. 6}. construct a qj -level column xj = (xij ) as follows: Set xij = u. 3} and {4. 2.

6. 10}. . {5. 8. and λ = ( j=1 rj (kj − 1))/(n − 1) is a positive integer. P4 = {{1. λ. {2. {6. 9}. {6. {3. 8. 5. 9}. Fang and Qin (2004). 6. 10}. Ge and Liu (2003). 4}. {3. . {11. 4}. 11}. 11}. Qin (2002). 11}}. {10. {4. {3. {5. {2. . 12}}. {9. . P2 = {{1. {6. TABLE 3. {4. {2. 6. 10}. 7. 6. where l K = {k1 . {2. P8 = {{1. 7.102 Design and Modeling for Computer Experiments n n Qin (2004) pointed out that there exists a Un (( k1 )r1 · · · ( kl )rl ) with λij = λ for all 1 ≤ i = j ≤ n if and only if there exists a U RD(n. rl }. 12}. 9}. R). 6}. 11}. 12}}. 12}. {9. 7}. 4}. {3. P5 = {{3. {4. 8}. 7. {9. . 7}. 10}}. 5}. 11}. {5. {4. {1. 12}}. 8}. {2. . {4. P7 = {{1. 5. 11}}. {8. we can construct a U12 (65 × 43 ) shown in Table 3. 8}. 12}. 8}.8 U12 (65 × 43 ) No 1 1 1 2 1 3 2 4 2 5 3 6 3 7 4 8 4 9 5 10 5 11 6 12 6 2 1 2 1 2 3 4 3 4 5 6 5 6 3 1 2 2 1 3 4 4 3 5 6 6 5 4 1 2 3 4 5 6 1 2 4 5 6 3 5 3 4 1 2 1 2 5 6 6 3 4 5 6 1 2 3 4 1 2 3 4 1 2 3 4 7 1 2 3 4 4 1 2 3 2 3 4 1 8 1 2 3 4 2 3 4 1 3 4 1 2 © 2006 by Taylor & Francis Group. LLC . 12}. 3}. 6}. Example 16 A U RBD(12. {2. The following example gives another UD with mixed levels. 9}}. P6 = {{1. . {10. 10}. 3}. 7}. kl }. 5. . 2}. K. {2. and Chatterjee. 8. {7. {3. From this U RD and Algorithm URBD-UD. 3}) can be found in combinatorial design theory as follows: P1 = {{1. 9}. 12}. {7. {5. R = {r1 . 3}. {2. 10}. 1. P3 = {{1. {5. 11}}. More asymmetrical UDs can be found in Fang.

by deleting a row block with the same level in the ﬁrst column of Ln (q s ). In fact. Resolvable group divisible designs: Fang. The reader can ﬁnd the deﬁnition and related discussion in their paper. It is known that there are three pairwise orthogonal Latin squares of order 4.Uniform Experimental Design 103 3. Ge and Liu (2003) proved that there exists a Ukm (k × mm ) if N (m) > k − 2.3 we discussed construction of UDs via balanced block designs. LLC . The latter starts an orthogonal array Ln (q s ) and then generates a UD Un−q ((q − 1) × q s−1 ).5 and 3.6. Lin and Liu (2003). Orthogonal Latin squares/arrays: The concept of Latin squares is deﬁned in Deﬁnition 7. The following are some approaches: A. A set of Latin squares of order m. Lin and Liu (2003) also proposed deleting more row blocks from Ln (q s ) and obtained many (nearly) UDs such as U6 (2 × 33 ). Fang. which occur at the same location of the two Latin squares. this approach is equivalent to the method proposed in Fang. Fang. are all diﬀerent.9. So we can generate a U12 (3 × 44 ) that is listed in Table 3. Ge and Liu (2002) considered using room squares. © 2006 by Taylor & Francis Group. Room squares: Fang. Let N (m) denote the maximum number of pairwise orthogonal Latin squares of order m. any pair of which are orthogonal. B.6. The concept of the room square is a very useful tool in discrete mathematics. Two Latin squares of order m are called orthogonal to each other if their m2 pairs.9 U12 (3 × 44 ) No 1 1 1 2 1 3 1 4 1 5 2 6 2 7 2 8 2 9 3 10 3 11 3 12 3 2 1 2 3 4 4 2 3 1 1 4 2 3 3 1 2 3 4 1 3 2 4 2 3 1 4 4 1 2 3 4 3 1 4 2 3 2 4 1 5 1 2 3 4 2 4 1 3 4 1 3 2 C. There are many ways to generated resolvable balance incomplete block designs and uniformly resolvable block designs. Ge and Liu (2003) used the so-called resolvable group divisible designs to construct UDs. TABLE 3. is called a set of pairwise orthogonal Latin squares.4 Miscellanea In Sections 3.

U8(2 × 4^4), U12(3 × 4^4), U5k(k × 5^5) for k = 2, 3, 4, U7k(k × 7^7) for 2 ≤ k ≤ 6, U8k(k × 8^8) for 2 ≤ k ≤ 7, and U9k(k × 9^9) for 2 ≤ k ≤ 8.

4 Optimization in Construction of Designs for Computer Experiments

Various optimization techniques have been employed for finding optimal LHDs, uniform designs, and other space-filling designs under various criteria. Park (1994) developed an algorithm for constructing optimal LHDs based on the IMSE criterion and the entropy criterion. Morris and Mitchell (1995) adapted a version of the simulated annealing algorithm for construction of optimal LHDs based on the φp criterion, and Ye (1998) employed the column-pairwise algorithm for constructing optimal symmetrical LHDs based on the φp criterion and the entropy criterion. Jin, Chen and Sudjianto (2005) demonstrated the use of an evolutionary algorithm to efficiently generate optimal LHDs. The threshold accepting heuristic has been used for construction of uniform designs under the star discrepancy (Winker and Fang (1998)), the centered L2-discrepancy (Fang, Ma and Winker (2000), Fang, Lu and Winker (2003), and Fang, Tang and Yin (2005)), and the wrap-around L2-discrepancy (Fang and Ma (2001b), Fang, Lu and Winker (2003), and Fang, Maringer, Tang and Winker (2005)).

In the previous chapter we introduced many useful methods for construction of UDs, each focusing on a specific structure of designs (e.g., Un(n^s), Un(q^s), and UDs with mixed levels). Alternatively, one may choose to employ a more general optimization approach that can be used to deal with any kind of UDs. This generality is an attractive advantage of the optimization approach, though at the expense of computational complexity. In the past decades many powerful optimization algorithms have been developed. In this chapter we shall introduce some useful optimization algorithms and their applications to the construction of space-filling designs.

4.1 Optimization Problem in Construction of Designs

Let D be the space of designs, each of which has n runs and s factors, with each factor having q levels. Note that the cardinality of D is finite, although it is very large in most cases. Let h(D) be the objective function of D ∈ D.

The optimization problem in the construction of space-filling designs is to find a design D* ∈ D such that

  h(D*) = min_{D ∈ D} h(D).    (4.1)

The objective function h can be the IMSE, the entropy, the φp criterion, the CD, or the WD introduced in the previous chapters. In cases where the objective function h(·) is to be maximized, we can use −h(·) instead and minimize the new objective function.

4.1.1 Algorithmic construction

The stochastic optimization algorithms, like column-pairwise exchange (Li and Wu (1997) and Ye (1998)), simulated annealing (Morris and Mitchell (1995)), the threshold accepting heuristic (Winker and Fang (1998), Fang, Ma and Winker (2000), Fang, Lu and Winker (2003)), and stochastic evolutionary (Jin et al. (2005)), have been widely used in searching for space-filling designs. Although these stochastic optimization methods have quite different names and appear in different iteration processes, we can use the following unified way of constructing them.

• Start with a randomly chosen initial design D0, and let Dc = D0.
• Construct the neighborhood of Dc, a set of candidate designs, according to the given definition of the neighborhood. Sample a new design Dnew from the neighborhood.
• Compute the objective function value of the new design Dnew and decide whether to replace the current design Dc with the new one Dnew; in this way the current design will move to a nearby design.
• Go back to the previous step until the stopping rule is reached.

4.1.2 Neighborhood

In the iteration process of an optimization algorithm, the current design will move to a nearby design, one which is in the same "neighborhood." A necessary condition for defining a neighborhood is that the designs in the neighborhood of Dc should be a subset of D; furthermore, we need to define a way in which members of this subset have similar features to Dc. There are many considerations for the definition of a neighborhood in a given context. Let D be the set of U(n, q^s) and let Dc ∈ U(n, q^s). Note that a design obtained by a row-exchange may not necessarily be of U-type; the column-exchange approach has been the more popular choice in the literature because it maintains the structure of the design, so that the resulting design remains a U-type design like Dc. The smallest neighborhood is obtained by exchanging two levels in a column of Dc ∈ D: randomly choose one column of Dc and two entries in this column, and exchange these two entries to create a new design.
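The smallest neighborhood above is easy to implement; the following sketch (function names are illustrative) samples one neighbor of a U-type design by a single within-column exchange:

```python
import random

def neighbor(design, rng):
    """Sample from the smallest neighborhood: pick one column at random and
    exchange two randomly chosen entries in it. A within-column exchange
    keeps the design U-type."""
    n, s = len(design), len(design[0])
    new = [row[:] for row in design]
    k = rng.randrange(s)
    i, j = rng.sample(range(n), 2)
    new[i][k], new[j][k] = new[j][k], new[i][k]
    return new

Dc = [[1, 2, 4, 5],
      [3, 5, 5, 3],
      [2, 1, 3, 4],
      [4, 4, 1, 2],
      [5, 3, 2, 1]]          # a U(5, 5^4), as in Example 17 below

Dnew = neighbor(Dc, random.Random(0))
# Still U-type: every column remains a permutation of {1,...,5} ...
assert all(sorted(r[k] for r in Dnew) == [1, 2, 3, 4, 5] for k in range(4))
# ... and exactly two entries differ from Dc.
assert sum(Dc[i][k] != Dnew[i][k] for i in range(5) for k in range(4)) == 2
```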

All such designs form the (smallest) neighborhood of Dc.

Example 17 The design Dc in (4.2) is a U(5, 5^4). Randomly choose one column of Dc, say column 2, and two entries in this column, say the entries in rows 2 and 4, and exchange these two entries to obtain the new design on the right-hand side:

        [ 1 2 4 5 ]        [ 1 2 4 5 ]
        [ 3 5 5 3 ]        [ 3 4 5 3 ]
  Dc =  [ 2 1 3 4 ]  ==>   [ 2 1 3 4 ]    (4.2)
        [ 4 4 1 2 ]        [ 4 5 1 2 ]
        [ 5 3 2 1 ]        [ 5 3 2 1 ]

The new design is still a U(5, 5^4) and belongs to the neighborhood of Dc. It is easy to see that there are 4 · C(5,2) = 4 · 10 = 40 designs in this neighborhood of Dc.

Let us define another neighborhood of Dc. Randomly choose two columns of Dc, say columns 1 and 4, then randomly choose two entries in each of these columns, and exchange the two entries within each column:

  [ 1 2 4 5 ]        [ 5 2 4 5 ]
  [ 3 5 5 3 ]        [ 3 5 5 4 ]
  [ 2 1 3 4 ]  ==>   [ 2 1 3 3 ]
  [ 4 4 1 2 ]        [ 4 4 1 2 ]
  [ 5 3 2 1 ]        [ 1 3 2 1 ]

Here the entries in rows 1 and 5 of column 1 and the entries in rows 2 and 3 of column 4 have been exchanged. The new design is still a U(5, 5^4). In total, there are C(4,2) · [C(5,2)]² = 6 · 10² = 600 designs in this neighborhood. Many authors recommend choosing a smaller neighborhood so that there may be a higher possibility of reaching the global minimum. Winker and Fang (1997) gave a detailed discussion on the choice of the neighborhood.

4.1.3 Replacement rule

Let Dc be the current design and Dnew be the new design chosen from the neighborhood of Dc, and let ∇h = h(Dnew) − h(Dc). We need to decide whether the new design should replace the current one or not. The local search (LS) algorithm compares h(Dc) and h(Dnew):

  LS Rule:  1, if ∇h ≤ 0;  0, if ∇h > 0.    (4.3)

Here, '1' means 'replace h(Dc) by h(Dnew)' and '0' means 'do not replace.' When the function h(·) has many local minima, this rule will terminate the optimization process at a local minimum point, which might be far from the global minimum.

Therefore, the optimization process needs to be repeated using various starting points. Alternatively, we can incorporate some heuristic rules for jumping out of a local minimum point. A threshold Th (≥ 0) is used for this purpose in the threshold accepting rule:

  TA Rule:  1, if ∇h ≤ Th;  0, if ∇h > Th.    (4.4)

The method used to choose a good value for Th is called the threshold accepting heuristic. The case Th = 0 reduces to the 'LS Rule,' while when Th ≥ h-range ≡ max_{D∈D} h(D) − min_{D∈D} h(D), the process is similar to a random walk. In the TA heuristic the threshold changes during the optimization process: choose a predefined sequence of thresholds h-range > T1 > T2 > ··· > Tm = 0, and change Ti to Ti+1 in the iteration process according to some predefined rule. Often one chooses Ti = (m − i + 1)Th/m, where Th = h-range/10 and m is a positive integer, for example, m = 8. So the 'TA Rule' becomes:

  TA Rule:  1, if ∇h ≤ Ti;  0, if ∇h > Ti.

The simulated annealing (SA) method uses a slightly different replacement rule: otherwise h(Dnew) will be accepted to replace h(Dc) if it satisfies the condition

  exp(−∇h/Th) ≥ u,    (4.5)

where u ∼ U(0, 1) is a random number and Th is a parameter called 'temperature,' by analogy to the physical process of the annealing of solids. The replacement rule for the SA is given by

  SA Rule:  1, if ∇h ≤ 0, or if ∇h > 0 and exp(−∇h/Th) ≥ u;  0, if ∇h > 0 and exp(−∇h/Th) < u.

When ∇h ≤ 0 the new design always replaces the current one, and a slightly worse design is more likely to replace the current design than a significantly worse design. Like the TA heuristic, the parameter Th in the SA will be monotonically reduced: set Th = Th0 at the beginning of the process and let Th := αTh, where α (0 < α < 1) is a constant called the cooling factor. The performance of the SA largely depends on the selection of Th0 and α.

For the stochastic evolutionary (SE) algorithm, the replacement rule combines LS and SA: Dnew replaces Dc if ∇h < 0; otherwise h(Dnew) will be accepted to replace h(Dc) if it satisfies

  ∇h ≤ Th · u,

where u is a random number and Th is a control parameter. Like the SA and TA, Th should be in (0, h-range); unlike them, Th is not monotonically reduced but changes according to the current situation in the optimization process. How to change Th will be discussed in Section 4.2.5.
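The four replacement rules can be condensed into a single decision function; the sketch below is a paraphrase of the rules above (names and the sample values are illustrative), for a criterion that is being minimized:

```python
import math
import random

def accept(delta_h, rule, Th=0.0, u=None):
    """Replacement decision; delta_h = h(D_new) - h(D_c)."""
    if u is None:
        u = random.random()        # u ~ U(0, 1)
    if rule == "LS":
        return delta_h <= 0
    if rule == "TA":
        return delta_h <= Th
    if rule == "SA":
        return delta_h <= 0 or math.exp(-delta_h / Th) >= u
    if rule == "SE":
        return delta_h < 0 or delta_h <= Th * u
    raise ValueError(rule)

assert accept(-0.3, "LS")
assert not accept(0.1, "LS")
assert accept(0.1, "TA", Th=0.2)           # slightly worse designs pass under TA
assert accept(0.5, "SA", Th=1.0, u=0.3)    # exp(-0.5) ~ 0.607 >= 0.3
assert not accept(0.5, "SE", Th=1.0, u=0.2)
```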

Each optimization algorithm has its own stopping rule that is controlled by ∇h and a few controlling parameters; we shall discuss the details in the next section.

4.1.4 Iteration formulae

To reduce computational complexity, this section gives iteration formulae for different criteria, such as the entropy, φp, and the centered L2-discrepancy. Since the search for an optimal design involves a large-scale combinatorial search, the ability to quickly calculate the objective function is crucial for optimizing a design within a reasonable time frame. In this section the neighborhood is chosen as the smallest one discussed in Example 17, where the kth column of Dc is chosen and two elements x_ik and x_jk are exchanged. After such an exchange, only the elements in rows i and j and columns i and j of the relevant matrix are changed; this strategy allows highly efficient calculation of the objective function of the new design. Let us introduce iteration formulae for the following criteria.

Entropy criterion: We maximize log|R|, where R = (r_ij) is the correlation matrix of the design Dc (cf. (2.18) and (2.19)), with

  r_ij = exp{− Σ_{k=1}^{s} θ_k |x_ik − x_jk|^p},  1 ≤ i, j ≤ n, 1 ≤ p ≤ 2.

As R is a positive definite matrix, it has the Cholesky decomposition R = V'V, where V = (v_ij) is an upper triangular matrix with v_ij = 0 if i > j. The Cholesky factorization algorithm is:

  REPEAT FROM i = 1 TO n
    v_ii = (1 − Σ_{k=1}^{i−1} v_ki²)^{1/2}
    REPEAT FROM j = i + 1 TO n
      v_ij = (r_ij − Σ_{k=1}^{i−1} v_ki v_kj) / v_ii
    END LOOP
  END LOOP

After an exchange of x_ik and x_jk, only the elements in rows i and j and columns i and j of R are changed. Denote the new correlation matrix by R* = (r*_ij). For any 1 ≤ t ≤ n with t ≠ i, j, let

  s(i, j, k, t) = exp{θ_k (|x_jk − x_tk|^p − |x_ik − x_tk|^p)}.

It can be easily verified that

  r*_it = r*_ti = r_it / s(i, j, k, t),  r*_jt = r*_tj = r_jt · s(i, j, k, t).
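The correlation matrix, the Cholesky loop displayed above, and the one-exchange update of R can be sketched and verified numerically (the toy design and function names are illustrative, not the book's code):

```python
import math

def corr_matrix(X, theta, p=2):
    """R = (r_ij) with r_ij = exp(-sum_k theta_k |x_ik - x_jk|^p)."""
    n, s = len(X), len(X[0])
    return [[math.exp(-sum(theta[k] * abs(X[i][k] - X[j][k]) ** p
                           for k in range(s)))
             for j in range(n)] for i in range(n)]

def cholesky_upper(R):
    """R = V'V with V upper triangular, following the double loop in the text."""
    n = len(R)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        V[i][i] = math.sqrt(R[i][i] - sum(V[k][i] ** 2 for k in range(i)))
        for j in range(i + 1, n):
            V[i][j] = (R[i][j] - sum(V[k][i] * V[k][j] for k in range(i))) / V[i][i]
    return V

def log_det(R):
    V = cholesky_upper(R)
    return 2.0 * sum(math.log(V[i][i]) for i in range(len(R)))

# A toy 3-run, 2-factor design:
X = [[0.0, 0.5], [0.5, 1.0], [1.0, 0.0]]
theta = [1.0, 1.0]
R = corr_matrix(X, theta)

# Check the update r*_it = r_it / s(i,j,k,t) after exchanging x_ik and x_jk:
i, j, k, t = 0, 1, 0, 2
Xs = [row[:] for row in X]
Xs[i][k], Xs[j][k] = Xs[j][k], Xs[i][k]
Rs = corr_matrix(Xs, theta)
s_factor = math.exp(theta[k] * (abs(X[j][k] - X[t][k]) ** 2
                                - abs(X[i][k] - X[t][k]) ** 2))
assert abs(Rs[i][t] - R[i][t] / s_factor) < 1e-12
assert abs(Rs[j][t] - R[j][t] * s_factor) < 1e-12
# Sanity check of log|R| on a 2x2 correlation matrix: log(1 - r^2).
assert abs(log_det([[1.0, 0.5], [0.5, 1.0]]) - math.log(0.75)) < 1e-12
```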

For calculating the determinant of R*, the Cholesky decomposition of R* will be useful. While the determinant of the new R matrix cannot be directly evaluated from the determinant of the old R matrix, some improvement in efficiency is achievable by modifying the Cholesky algorithm: we can use our knowledge about the above iteration process to reduce the computational load. Let n1 = min(i, j), and let R be re-expressed as

  R = [ R1  R2 ]
      [ R2' R3 ],

where R1 is n1 × n1, R2 is n1 × (n − n1), and R3 is (n − n1) × (n − n1). Let R1 = V1'V1 be the Cholesky decomposition of R1. Note that the elements of V with index 1 ≤ i ≤ j ≤ n1 are kept unchanged, so the Cholesky factor V of R can be computed based on V1:

  V = [ V1  V2 ]
      [ O   V3 ],

where V3 is also an upper triangular matrix. The rest of the elements of V can be calculated by the following modified Cholesky factorization algorithm:

  REPEAT FROM i = 1 TO n1
    REPEAT FROM j = n1 TO n
      v_ij = (r_ij − Σ_{k=1}^{i−1} v_ki v_kj) / v_ii
    END LOOP
  END LOOP
  REPEAT FROM i = n1 TO n
    v_ii = (1 − Σ_{k=1}^{i−1} v_ki²)^{1/2}
    REPEAT FROM j = i + 1 TO n
      v_ij = (r_ij − Σ_{k=1}^{i−1} v_ki v_kj) / v_ii
    END LOOP
  END LOOP

The computational complexity of full Cholesky factorization is O(n³), and the calculation of the elements of R costs O(sn²); therefore the computational complexity of totally re-evaluating the entropy is O(n³) + O(sn²). The computational complexity of the modified Cholesky factorization algorithm depends on both n and n1. For example, if n1 = 1, the computational complexity will still be O(n³); on the other hand, if n1 = n − 1, it will be O(n²). On average, the computational complexity is smaller than O(n³) but larger than O(n²), so the total computational complexity of the new method will be between O(n) + O(n²) and O(n) + O(n³), which is not dramatically better than O(n³) + O(sn²).

φp criterion: The φp criterion was introduced in Section 2.4.3 and defined in (2.22).
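The modified factorization can be sketched as follows, reusing the factor entries with (0-based) indices below n1; the toy matrices are illustrative, and the result is checked against a full refactorization:

```python
import math

def chol_upper(R):
    n = len(R)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        V[i][i] = math.sqrt(R[i][i] - sum(V[k][i] ** 2 for k in range(i)))
        for j in range(i + 1, n):
            V[i][j] = (R[i][j] - sum(V[k][i] * V[k][j] for k in range(i))) / V[i][i]
    return V

def chol_update(V_old, R_new, n1):
    """Refactorize R_new, keeping the entries of V_old with i <= j < n1
    (which are unaffected when only rows/columns >= n1 of R change)."""
    n = len(R_new)
    V = [row[:] for row in V_old]
    for i in range(n1):                      # rows 0..n1-1: only columns >= n1
        for j in range(n1, n):
            V[i][j] = (R_new[i][j] - sum(V[k][i] * V[k][j] for k in range(i))) / V[i][i]
    for i in range(n1, n):                   # remaining rows: full recomputation
        V[i][i] = math.sqrt(R_new[i][i] - sum(V[k][i] ** 2 for k in range(i)))
        for j in range(i + 1, n):
            V[i][j] = (R_new[i][j] - sum(V[k][i] * V[k][j] for k in range(i))) / V[i][i]
    return V

# A toy correlation matrix; the perturbation touches only rows/cols 2 and 3,
# so n1 = 2 (0-based).
R = [[1.0, 0.3, 0.2, 0.1],
     [0.3, 1.0, 0.4, 0.2],
     [0.2, 0.4, 1.0, 0.3],
     [0.1, 0.2, 0.3, 1.0]]
R_new = [row[:] for row in R]
for a, b, v in [(2, 3, 0.25), (0, 2, 0.15), (1, 3, 0.25)]:
    R_new[a][b] = R_new[b][a] = v

V = chol_update(chol_upper(R), R_new, 2)
V_direct = chol_upper(R_new)
assert all(abs(V[i][j] - V_direct[i][j]) < 1e-12
           for i in range(4) for j in range(4))
```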

The calculation of φp includes three parts: the evaluation of all the inter-site distances, the sorting of those inter-site distances to obtain a distance list and an index list, and the evaluation of φp itself. The evaluation of all the inter-site distances takes O(mn²) operations, the sorting takes O(n² log₂(n)) operations (cf. Press et al. (1992)), and the evaluation of φp takes O(s² log₂(p)) operations (since p is an integer, p-powers can be computed by repeated multiplications). Thus, the total computational complexity is O(mn²) + O(n² log₂(n)) + O(s² log₂(p)), and calculating φp this way is very time-consuming; a more efficient computation technique is needed. Such a technique can be constructed by avoiding the sorting of the inter-site distances. To this end, express φp as

  φp = [ Σ_{1≤i<j≤n} d_ij^{−p} ]^{1/p},

where d_ij = (Σ_{k=1}^{s} (x_ik − x_jk)^q)^{1/q} is the inter-site distance between x_i and x_j, and D = (d_ij). After an exchange between x_ik and x_jk, only the elements in rows i and j and columns i and j of the D matrix are changed. Denote the new distance matrix by D* = (d*_ij), and for any 1 ≤ t ≤ n with t ≠ i, j let

  s(i, j, k, t) = |x_jk − x_tk|^q − |x_ik − x_tk|^q.

Then

  d*_it = d*_ti = [d_it^q + s(i, j, k, t)]^{1/q}  and  d*_jt = d*_tj = [d_jt^q − s(i, j, k, t)]^{1/q},

and the new φp-value becomes

  φ*_p = [ φ_p^p + Σ_{1≤t≤n, t≠i,j} ((d*_it)^{−p} − d_it^{−p}) + Σ_{1≤t≤n, t≠i,j} ((d*_jt)^{−p} − d_jt^{−p}) ]^{1/p}.

In total, the computational complexity of the new calculation technique is O(n) + O(n log₂(p)). This results in a significant reduction of computation time compared to the original calculation of φp.

The centered L2-discrepancy: We can use an idea similar to that for φp to improve the computational efficiency of the centered L2-discrepancy defined in Chapter 3, whose direct computation costs O(sn²) operations. Let Z = (z_ij) be the centered design matrix, where z_ik = x_ik − 0.5, and let C = (c_ij) be a symmetric matrix whose elements are

  c_ij = (1/n²) Π_{k=1}^{s} (1/2)(2 + |z_ik| + |z_jk| − |z_ik − z_jk|),  if i ≠ j;
  c_ii = (1/n²) Π_{k=1}^{s} (1 + |z_ik|) − (2/n) Π_{k=1}^{s} (1 + |z_ik|/2 − z_ik²/2).

k. The time ratio.where To and Tn are the times needed to calculate the criterion using the original methods (i. it and c∗ = cjt /γ(i. totally recalculate the value using all elements in matrices) and those of the new methods are summarized in Table 4.1. k.. t) = 2 + |zjk | + |ztk | − |zjk − ztk | . j. j. for the entropy criterion. LLC . For any 1 ≤ t ≤ n and t = i. because of the involvement of matrix determinant calculation.112 It can be veriﬁed that Design and Modeling for Computer Experiments [CD(Dn )] = 2 13 12 2 n n + i=1 j =1 cij . the square CD-value of the new design Dn is given by n ∗ [CD(Dn )]2 = [CD(Dn )]2 + c∗ − cii + c∗ − cjj + 2 ii jj t=1. j. 2 + |zik | + |ztk | − |zik − ztk | ∗ After an exchange of xik and xjk . From the table. The total computational complexity is also O(n). with the new approach.e. t).j (c∗ − cit + c∗ − cjt ).e. To /Tn . we ﬁnd that for the φp criterion and the CL2 criterion.1 Computational Complexity of Criterion Evaluation Metho d Original Prop osed φp O (mn2 ) + O (n2 log2 (n)) +O (s2 log2 (p)) O (n) + O (n log2 (p)) CL2 O (mn2 ) O (n) Entropy O (n3 ) + O (mn2 ) O (n2 ) + O (n) ∼ O (n3 ) + O (n) Table 4. jt The computational complexity of this new approach is O(n). j. let γ(i. k. the eﬃciency is not dramatically improved. The empirical results in Table 4.t=i. it jt where c∗ = γ(i. However. which is much less than the original O(sn2 ).2 provides examples of the performance comparison of computing times of the optimization criteria for various size of LHDs. t)cit . respectively. TABLE 4. The new computational complexity is close to O(n) in both cases. calculation using all matrix elements) and the new more eﬃcient methods. where Dn is the current design. show the improvement of computational eﬃciency.2 match well with the analytical © 2006 by Taylor & Francis Group.. A comparison of the computational complexity of the original technique (i. the eﬃciency can be signiﬁcantly improved.

1 Algorithms For all four optimization methods. LS.5 15. 4.5 12.9 To /Tn 4. This result reinforces the attractiveness of uniform design using the CL2 criterion from the design generation point of view. denoted by N (Dc ). (2005).1 Entropy(θ = 5.0 CL2 Tn 2. The larger the size of the design. SA. Interested readers should consult Jin et al. for 100 × 10 LHDs. Start from a randomly chosen initial design D0 from D. SA. t = 1) LHDs 12 × 4 25 × 4 50 × 5 100 × 10 To 12.2 1. the more the computational savings are. It is also observed that the computing time needed for the φp criterion is up to 3.2 Computing Time (in seconds) of Criterion Values for 500.8 45.0 times as many as that of the CL2 criterion.0 1305. 000 LHDs (To = the time needed by the original calculation methods to evaluate 500. and SE. the calculation of the CL2 criterion only requires about 1/82 of the computation eﬀort of the original method using the whole matrix. their algorithms have the same structure: Algorithm for stochastic optimization: Step 1. introduced in the previous section.0 To /Tn 1. For example. TA.2. Calculate ∇h = h(Dnew ) − h(Dc ) for the given criterion h.4 3. from N (Dc ).2 12.2 53.0 Tn 5.5 10.3 82.Optimization in Construction of Designs for Computer Experiments 113 examinations in Table 4. the entropy criterion is much less eﬃcient.9 2. © 2006 by Taylor & Francis Group.7 41.4 6.0 1012.2 5. t = 2) To 16.2 To /Tn 2. Tn = the time needed to calculate 500.0 2116.2 39. To /Tn is the ratio of To and Tn ) φp (p = 50. denoted by Dnew . TABLE 4.2 Optimization Algorithms This section gives summaries of optimization algorithms for the four optimization methods. and SE.0 Tn 14. set Dc = D0 and the threshold parameter T h to its start value Tc . Choose a design.1 4.8 167.6 75.0 239.5 197. LS.1 2.5 To 10. 000 LHDs.1 19. TA.1 30.0 1378. Step 2.1.3 347. LLC . Find the neighborhood of Dc . 000 LHDs using the new more eﬃcient methods. Compared to the other two criteria.1 30.

LLC . Deliver Dc . The replacement probability for the above four algorithms is given by: Local search p(Tc . Threshold Accepting p(Tc . 1. if ∇h ≤ Tc . Step 3. For details the reader can refer to Section 4. Step 4. Dc . Tc changes according to the current situation. denoted by Dnew . Let us introduce them one by one. Replace Dc with the new design Dnew if ∇h ≤ 0. ⎩ 0 otherwise. It is the simplest algorithm among the four algorithms introduced in this section. stop the process. Replace Dc with the new design Dnew if • ∇h ≤ 0. Dnew ) = 1 − ∇h/Tc . denoted by N (Dc ). It is clear that each stochastic optimization algorithm has its own controlling rule. otherwise. Repeat Step 2 and Step 3 until either fulﬁlling the rule for changing the threshold value or fulﬁlling the stopping rule. if ∇h < Tc . where Tc is chosen from the pre-determined series T1 > T2 > · · · > Tm . Calculate ∇h = h(Dnew ) − h(Dc ). 0. 0. 4. Dc .2 Local search algorithm The local search (LS) algorithm takes T h = 0 in the entire process. Step 1. else.5.2. Dnew ) = 1. Finally. Dc . Here.114 Design and Modeling for Computer Experiments Step 3. otherwise. Step 2. Dnew ) = where Tc = 0. Step 4. Simulated annealing p(Tc . Stochastic evolutionary ⎧ if ∇h < 0. ⎨ 1.2. deliver Dc . Dnew ) = exp{∇h/Tc }. Dc . Tc = αT h (0 < α < 1). from N (Dc ). • otherwise. Dc . if ∇h ≤ Tc . p(Tc . Start with a randomly chosen initial design D0 from D and let D = D0 . c © 2006 by Taylor & Francis Group. Find the neighborhood of Dc . otherwise. replace Dc by Dnew with a probability of p(Tc . Choose a design. Dnew ).

it has been successfully applied to many famous problems. Calculate ∇h = h(Dnew ) − h(Dc ).4 Threshold accepting algorithm The threshold accepting (TA) algorithm is a modiﬁcation of the SA. Step 1. for details the reader can refer to Winker and Fang (1997). Step 2. Initialize T h = T0 . from N (Dc ). α. Choose a design.Optimization in Construction of Designs for Computer Experiments 115 The LS algorithm could quickly ﬁnd a locally optimal design. It is recommended to test a small-size design ﬁrst in order to get experience in the choice of T0 and α. otherwise. and imax .2. It has been shown to be simpler and more eﬃcient than the SA in many applications. 4. the graph coloring problem. Gelett and Vecchi (1983) for a cost function that may possess many local minima.’ the minimum energy conﬁguration is obtained. Fang. Lin. c else i = i + 1. Replace Dc with the new design Dnew and Flag=1 if ∇h < 0.2. denoted by Dnew . Deliver Dc . Repeat Step 2 and Step 3 until Flag=0.3 Simulated annealing algorithm Simulated annealing (SA) algorithm was proposed by Kirkpatrick. Find the neighborhood of Dc . 4. To achieve good results a careful cooling parameter tuning is required. Step 4. As mentioned before. LLC . Winker and Zhang (2000). The column pairwise algorithm is a local search algorithm. Start with a randomly chosen initial design D0 from D and let D = D0 . Repeating the local search algorithm from diﬀerent initial designs can increase the chances of reaching the global minimum. u ∼ U (0. 1). let T h := αT h. Step 3. Although the convergence rate of the SA is slow. If i > imax . such as the traveling problem. Set i = 1 and F lag = 0. if exp{−∇h/T h ≥ u}. There are several versions of the TA algorithm. and the number partitioning problem. α is a cooling parameter. © 2006 by Taylor & Francis Group. A too large T0 or α may lead to slow convergence and tremendous computational cost while a too small T0 or α may lead to an inferior design. 
A given increment in criterion value is more likely to be accepted early in the search when temperature has a relatively high value than later in the search as the temperature is cooling. denoted by N (Dc ). the graph partitioning problem. Fang. and was proposed by Dueck and Scheuer (1991). It works by emulating the physical process whereby a solid is slowly cooled so that when eventually its structure is ‘frozen. Winker and Fang (1998).

Let nacpt be the number of accepted designs in the inner loop and nimp be the better designs found in the inner loop. Jin. from N (Dc ). 0. However. The outer loop controls the entire optimization process by adjusting the threshold T h in the acceptance criterion. let Dc = D0 . Step 5. and Winker (2001). denoted by N (Dc ). denoted by Dnew . otherwise. 4. Find the neighborhood of Dc . Replace Dc with the new design Dnew if ∇h ≤ Tc . Step 2. Step 3.005×criterion value of the initial design. T h is set back to T h0 . Step 1. e. Its value is incremented based on certain ‘warming’ schedules only if it seems that the process is stuck at a local minimum.2. Choose a design. deliver Dc . The inner loop process is similar to that of the TA algorithm. T1 > T2 · · · > Tm . The threshold T h is initially set to a small value T h0 . Winker and Zhang (2000). The threshold T h is maintained on a small value in the improving stage and is allowed to be increased in the exploration stage. T h is set to a small value. The following version was used in Fang. Start from a randomly chosen initial design D0 from D.5 Stochastic evolutionary algorithm The stochastic evolutionary (SE) algorithm. is a stochastic optimization algorithm that decides whether to accept a new design based on a threshold-based acceptance criterion. LLC . The latter gives a comprehensive introduction to the TA and its applications. If the given threshold sequence is not yet exhausted. proposed by Saab and Rao (1991). In the outer loop. Step 4. Repeat Step 2 and Step 3 a ﬁxed number of times. it is often diﬃcult to decide the value of T h0 and the warming schedule for a particular problem. and give a sequence of the threshold parameter T h. whenever a better solution is found in the process of warming up. 
Chen and Sudjianto (2005) developed an enhanced stochastic evolutionary (ESE) algorithm that uses a sophisticated combination of warming schedule and cooling schedule to control T h so that the algorithm can adjust itself to suit diﬀerent experimental design problems. Saab and Rao (1991) show that the SE can converge much faster than the SA. The ESE algorithm consists of an inner loop and an outer loop. T h will be increased or decreased based on the following situations: © 2006 by Taylor & Francis Group. and starting value Tc = T1 . Calculate ∇h = h(Dnew ) − h(Dc ). The inner loop picks up new designs from the neighborhood of the current design and decides whether to accept them based on an acceptance criterion. Therefore. take the next threshold value and repeat Step 2 and Step 4.g.. else leave Dc unchanged.116 Design and Modeling for Computer Experiments Lu and Winker (2003). Winker and Fang (1997) gave a detailed discussion on the choice of the neighborhoods and the two controlling parameters. Lin. at the beginning.

(a) In the improving stage, Th will be maintained at a small value so that only better designs or slightly worse designs will be accepted. Specifically, Th will be decreased if nacpt is larger than a small percentage (e.g., 0.1) of the number of total designs J and nimp is less than nacpt; Th will be increased if nacpt is less than the same percentage of J. The following equations are used in the algorithm to decrease or increase Th, respectively:

  Th := α1 Th,  or  Th := Th/α1,

where 0 < α1 < 1; for example, set α1 = 0.8.

(b) In the exploration stage, Th will fluctuate within a range based on the value of nacpt. If nacpt is less than a small percentage (e.g., 0.1) of J, Th will be rapidly increased until nacpt is larger than a large percentage (e.g., 0.8) of J; if nacpt is larger than this large percentage, Th will be slowly decreased until nacpt is less than the small percentage. The following equations are used to decrease or increase Th, respectively:

  Th := α2 Th,  or  Th := Th/α3,

where 0 < α3 < α2 < 1; for example, set α2 = 0.9 and α3 = 0.7. In the exploration stage, Th is increased rapidly (so that an increasing number of worse designs can be accepted) to help the search process move away from a locally optimal design, and is then decreased slowly to search for better designs after moving away from it. This process is repeated until an improved design is found.

Figure 4.1 presents the flow chart of the ESE algorithm. The ESE algorithm is more complicated than the above three algorithms; interested readers should consult Jin et al. (2005).

4.3 Lower bounds of the discrepancy and related algorithm

Suppose that a discrepancy (for example, the categorical discrepancy, centered discrepancy, or wrap-around discrepancy) is employed in the construction of uniform designs. For a given design space U(n, q^s), many authors have invested much effort in finding strict lower bounds for different discrepancies; the word 'strict' means that the lower bound can be attained in some cases. If one can find a strict lower bound of the discrepancy on the design space, this lower bound can be used as a benchmark for the construction of uniform designs and can speed up the searching process by the use of some optimization algorithm. As soon as the discrepancy of the current design in the optimization process reaches the lower bound, the process can be terminated, since no further improvement is possible.
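Rules (a) and (b) can be condensed into one threshold-update sketch; this is a simplified paraphrase using the example constants quoted above, not Jin, Chen and Sudjianto's implementation:

```python
def update_threshold(Th, n_acpt, n_imp, J, improving,
                     a1=0.8, a2=0.9, a3=0.7, small=0.1, large=0.8):
    """One outer-loop adjustment of Th in the spirit of rules (a) and (b)."""
    if improving:                        # stage (a): keep Th small
        if n_acpt > small * J and n_imp < n_acpt:
            return a1 * Th               # decrease
        if n_acpt <= small * J:
            return Th / a1               # increase
    else:                                # stage (b): Th fluctuates with n_acpt
        if n_acpt < small * J:
            return Th / a3               # rapid increase: escape a local optimum
        if n_acpt > large * J:
            return a2 * Th               # slow decrease while exploring
    return Th

# Stuck at a local optimum (few acceptances): the threshold is raised.
assert update_threshold(0.005, n_acpt=2, n_imp=0, J=100, improving=False) > 0.005
# Accepting almost everything while exploring: the threshold is lowered slowly.
assert update_threshold(0.005, n_acpt=90, n_imp=0, J=100, improving=False) < 0.005
# Improving stage with enough acceptances but few improvements: lowered.
assert update_threshold(0.005, n_acpt=20, n_imp=5, J=100, improving=True) < 0.005
```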

FIGURE 4.1 Flowchart of the enhanced stochastic evolutionary algorithm.

4.3.1 Lower bounds of the categorical discrepancy

Theorem 2 in Section 3.2 gives lower bounds for the categorical discrepancy; these have played an important role in the construction of uniform designs in Sections 3.5 and 3.6.

4.3.2 Lower bounds of the wrap-around L2-discrepancy

Fang, Lu and Winker (2003) obtained strict lower bounds for the centered and wrap-around L2-discrepancies for two- and three-level U-type designs. They gave two kinds of lower bounds: one is related to the distribution of all the level-combinations among the columns of the design, and the other is based on the Hamming distances of any two rows of the design. The Hamming distance between rows xk = (xk1, ..., xks) and xl = (xl1, ..., xls) is the number of positions where they differ, i.e., DH(xk, xl) = #{j : xkj ≠ xlj, j = 1, ..., s}. For two-level U-type designs, their results are summarized in the following theorem.

Theorem 6 Two lower bounds of the wrap-around L2-discrepancy over the design space U(n, 2^s) are given by

  LB^c_WD2(n, 2^s) = −(4/3)^s + (11/8)^s + (5/4)^s (1/n²) Σ_{m=1}^{s} C(s, m)(1/5)^m s_{n,m,2}(1 − s_{n,m,2}/2^m)    (4.6)

and

  LB^r_WD2(n, 2^s) = −(4/3)^s + (1/n)(3/2)^s + ((n−1)/n)(5/4)^s (6/5)^λ,    (4.7)

where s_{n,m,2} is the remainder at division of n by 2^m (n mod 2^m) and λ = s(n − 2)/[2(n − 1)]. Fang, Lu and Winker (2003) pointed out that the lower bound LB^r_WD2(n, 2^s) is more accurate than the bound LB^c_WD2(n, 2^s) when n ≤ s, while the ordering is reversed when n > s. The maximum of these two lower bounds can be used as a benchmark in the searching process. For three-level U-type designs, two lower bounds are given as follows.

Theorem 7 Two lower bounds of the wrap-around L2-discrepancy over the design space U(n, 3^s) are given by

  LB^c_WD2(n, 3^s) = −(4/3)^s + (73/54)^s + (23/18)^s (1/n²) Σ_{m=1}^{s} C(s, m)(4/23)^m s_{n,m,3}(1 − s_{n,m,3}/3^m)    (4.8)

and

  LB^r_WD2(n, 3^s) = −(4/3)^s + (1/n)(3/2)^s + ((n−1)/n)(23/18)^s (27/23)^λ,    (4.9)

j = 1. The following table gives the distribution of α-values over k the set {αij : 1 ≤ i < j ≤ n. xls ). and j α xl = (xl1 . j = 1. s). Fang. these products can only take (q + 1)/2 possible values. for a U-type design U (n. where sn. 1 ≤ k ≤ s}..8).. · · · . q s ). 4(2q − 4)/(4q 2 ). s}. 2(2q − 2)/(4q 2 ).. this distribution is the same for each design in U (n. when q is odd. · · · . Recently. q s ). i. However. its α-values can only be limited into a speciﬁc set. q. q s ) with even q and odd q is given by © 2006 by Taylor & Francis Group. for a U-type design. · · · . respectively.e. 4(2q − 4)/(4q 2 ). 0. 3s ) is more accur s rate than LBWD2 (n.m. . . it is easy to see that the wrap-around L2 -discrepancy is only a k function of products of αij ≡ |xik − xjk |(1 − |xik − xjk |) (i.3 is the residual of n (mod 3m ) and λ = Similar to the two-level case. ms). 3 ). Theorem 8 A lower bound of the wrap-around L2 -discrepancy on U (n. The maximum of these two lower bounds can be used as a benchmark in the search process. otherwise..120 Design and Modeling for Computer Experiments k(n−3) 3(n−1) .e. . α-values can only take q/2+1 possible values. LBWD2 (n. LLC . q s ). denote by Fkl the distribution of their {αkl . q 2 /(4q 2 ). i. when q is even. From the analytical expression of equation (3. q s ) for each positive integer q. n. . q even q odd α − values number α − values number sn(n − q) sn(n − q) 0 0 2q 2q 2 2(2q − 2) sn 2(2q − 2) sn2 4q 2 q 4q 2 q 4(2q − 4) sn2 4(2q − 4) sn2 4q 2 q 4q 2 q ··· ··· ··· ··· ··· ··· ··· ··· (q − 2)(q + 2) sn2 (q − 3)(q + 3) sn2 4q 2 q 4q 2 q q2 sn2 (q − 1)(q + 1) sn2 4q 2 2q 4q 2 q For any two diﬀerent rows of the design U (n. i = j and k = 1. α The Fkl ’s can characterize whether a U-type design is a uniform design or not. · · · . the accuracy is reversed. the two lower bounds have diﬀerent accuracy in c diﬀerent situations. (q − 1)(q + 1)/(4q 2 ). Tang and Yin (2005) proposed a lower bound on the design space U (n. . Usually. 
We shall see that this fact is very useful in ﬁnding the lower bounds and create a new algorithm. 0. · · · . when k ≤ (n − 1)/2. Note that given (n. 2(2q −2)/(4q 2 ). xk = (xk1 . xks ). for both even and odd q. More precisely. .

s respectively.Optimization in Construction of Designs for Computer Experiments 121 LBeven n−1 = Δ+ n 3 2 s(n−q ) q (n−1) 5 4 2sn q (n−1) sn q (n−1) 3 2(2q − 2) − 2 4q 2 2sn q (n−1) ··· 3 (q − 2)(q + 2) − 2 4q 2 LBodd n−1 = Δ+ n 3 2 s(n−q ) q (n−1) and 3 2(2q − 2) − 2 4q 2 2sn q (n−1) 2sn q (n−1) ··· 3 (q − 1)(q + 1) − 2 4q 2 s . we deﬁne m 3 j ln( − αkl ). where Δ = − 4 1 3 + . 2s ) with n = 2s−k for some positive integer k (k ≤ s). 3. 3 ) obtained in Theorems 6 and 7. 4. which is called the balance-pursuit heuristic. if all its Fkl distributions. It will be introduced in Section 4. Theorem 9 Consider the design space U (n. p = q ≤ n the fact α α that Fkl = Fpq implies δkl = δpq . respectively. Therefore.3. δkl = 2 j =1 for any two rows k and l. the above lower bounds are equivalent to LBWD2 (n. In this case.3 Lower bounds of the centered L2 -discrepancy The ﬁrst strict lower bound of the centered L2 -discrepancy was obtained by Fang and Mukerjee (2000). A U-type design U (n. are the same. 2s ) and r s LBWD2 (n. A lower bound of the centered L2 -discrepancy on the design space is given by LBCD2 = 13 12 s −2 35 32 s s−k + r=0 s 1 1 + r 8r n s r=s−k+1 s 1 . the W D-value of this design achieves the above lower bound. k = l. q s ) is a 3 n 2 α uniform design under the wrap-around L2 -discrepancy. Tang and Yin (2005) proposed a more powerful algorithm. but the inverse may not be true. Theorem 8 provides not only the lower bounds that can be used for a α benchmark. r 4r © 2006 by Taylor & Francis Group. Checking that all α Fkl distributions are the same imposes a heavy computational load. but also the importance of balance of {Fkl }. Aiming to adjust those δkl s as equally as possible.3. LLC . r When q = 2.4. Fang. for any 1 ≤ k = l. Obviously.

Later, Fang, Lu and Winker (2003) extended the above result to all two-level U-type designs as follows:

Theorem 10 Consider the design space U(n, 2^s). A lower bound of the centered L2-discrepancy on the design space is given by

  LBCD2 = (13/12)^s − 2(35/32)^s + (1/n)(5/4)^s + ((n − 1)/n)(5/4)^λ,   (4.10)

where λ = s(n − 2)/[2(n − 1)]. Recently, Fang, Tang and Winker (2005) used a different approach to find some lower bounds for the design spaces U(n, 3^s) and U(n, 4^s). The reader can find the details in their paper.

4.3.4 Balance-pursuit heuristic algorithm

The threshold accepting heuristic has been successfully applied to finding uniform designs, for example by Winker and Fang (1998), Fang and Ma (2001b), Fang, Ma and Winker (2002), and Fang, Lu and Winker (2003). Similar to the threshold accepting heuristic, the new algorithm starts with a randomly generated U-type design D0. It then goes into a large number, say τ, of iterations. In each iteration the algorithm tries to replace the current solution Dc with a new one. The new design Dnew is generated in a given neighborhood of the current solution Dc; usually, a neighborhood is a small perturbation of Dc. The difference between the discrepancies of Dnew and Dc is calculated and compared in each iteration.

As stated in Winker and Fang (1998), the aim of accepting a temporary worsening up to a given threshold value is to avoid getting stuck in a local minimum. But how to determine a proper threshold accepting series, which plays an important role in the threshold accepting heuristic, is itself a difficult problem, since it will depend on the structure and properties of the design. Moreover, the lower bounds discussed in this section can be used as a benchmark: the above theory shows that the lower bound can be reached if the Hamming distances between any two rows are the same (when the categorical discrepancy is being used), or if the {F^α_kl} or {δ_kl} are the same (when the wrap-around L2-discrepancy is being used). This gives us the possibility of proposing a more powerful algorithm, which we call the balance-pursuit heuristic (BPH, for simplicity), proposed by Fang, Tang and Yin (2005). One of the advantages of the BPH algorithm is that it does not require a threshold accepting series; it uses a different way to jump out from a local minimum, namely a random warming-up procedure. Compared with the existing threshold accepting heuristic, the BPH algorithm has more chances to generate better designs, in the sense of lower discrepancy, in each iteration, since it gives an approximate direction toward a better status, which can save considerable time in the computational search.
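To make the role of these lower bounds as a benchmark concrete, the following sketch evaluates the wrap-around L2-discrepancy of a small U-type design directly from its α-values and compares it with the Theorem 8 bound. The level-to-point mapping u → (2u − 1)/(2q) and the exact algebraic form of the bound used here are assumptions of this sketch:

```python
def wd2(design, q):
    """Squared wrap-around L2-discrepancy of a U-type design with levels 1..q,
    the levels being mapped to the centered points (2u - 1)/(2q) in [0, 1]."""
    n, s = len(design), len(design[0])
    pts = [[(2 * u - 1) / (2 * q) for u in row] for row in design]
    total = 0.0
    for xi in pts:
        for xj in pts:
            prod = 1.0
            for a, b in zip(xi, xj):
                d = abs(a - b)
                prod *= 1.5 - d * (1 - d)      # 3/2 - alpha
            total += prod
    return -(4.0 / 3.0) ** s + total / n ** 2

def wd2_lower_bound(n, q, s):
    """Theorem 8 benchmark (LB_even / LB_odd), as reconstructed above."""
    delta = -(4.0 / 3.0) ** s + 1.5 ** s / n
    lb = 1.5 ** (s * (n - q) / (q * (n - 1)))  # factor for alpha = 0
    e_mid = 2.0 * s * n / (q * (n - 1))        # exponent for the middle alphas
    if q % 2 == 0:
        for j in range(1, q // 2):
            alpha = 2 * j * (2 * q - 2 * j) / (4.0 * q * q)
            lb *= (1.5 - alpha) ** e_mid
        lb *= 1.25 ** (e_mid / 2)              # alpha = q^2/(4q^2) = 1/4
    else:
        for j in range(1, (q + 1) // 2):
            alpha = 2 * j * (2 * q - 2 * j) / (4.0 * q * q)
            lb *= (1.5 - alpha) ** e_mid
    return delta + (n - 1) / n * lb
```

For the one-column full factorial U(3, 3^1) the two quantities coincide, as the theorem predicts for a uniform design.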

Three key points are emphasized as follows:

(A) Neighborhood: Most authors choose a neighborhood of the current solution Dc so that each design in the neighborhood is still of U-type. This requirement can be easily fulfilled by selecting one column of Dc and exchanging two elements in the selected column. To enhance convergence speed, the BPH algorithm suggests two possible pre-selection methods for determining the neighborhood choice, instead of using random selection of elements within a column for exchanging as done in the literature. Two methods, based on the maximal and minimal distances of row pairs and on the single row with maximal and minimal sum of distances, are suggested by Fang, Tang and Yin (2005). According to Theorem 8, we should reduce the differences among the current δ_kl's; so our two pre-selection methods both aim to distribute the distances δ_kl as evenly as possible.

(B) Iteration: Instead of calculating the two discrepancies WD2(Dnew) and WD2(Dc), the BPH algorithm focuses on the difference between WD2(Dnew) and WD2(Dc). Based on formula (3.8), we know that the wrap-around L2-discrepancy can be expressed in terms of the sum of the e^{δ_ij}'s. Suppose the k-th elements in rows x_i and x_j are exchanged; then for any row x_t other than x_i or x_j, the distances of row pair (x_i, x_t) and row pair (x_j, x_t) will be changed. Denote the new distances between row pair (x_i, x_t) and row pair (x_j, x_t) by δ̃_it and δ̃_jt; then

  δ̃_it = δ_it + ln(3/2 − α^k_jt) − ln(3/2 − α^k_it),
  δ̃_jt = δ_jt + ln(3/2 − α^k_it) − ln(3/2 − α^k_jt).

Here α^k_it and α^k_jt are α-values as defined earlier in this section. Thus, for a single exchange of two elements in the selected column, there are altogether 2(n − 2) distances (δ_ij's) updated, and the change in the objective function is

  ∇ = Σ_{t ≠ i,j} (e^{δ̃_it} − e^{δ_it} + e^{δ̃_jt} − e^{δ_jt}).

If the result is not worse, then we replace Dc with Dnew and continue the iteration.

(C) Lower bound: The last key point is the lower bound. As soon as the lower bound is reached, the process is terminated; otherwise the search continues until the iteration budget is exhausted or the design needs to be warmed up.

The following gives a pseudo-code of the BPH algorithm proposed by Fang, Tang and Yin (2005):

The BPH Algorithm for searching uniform designs under WD2

1   Initialize τ
2   Generate starting design Dc ∈ U(n, q^m) and let Dmin := Dc
3   for i = 1 to τ do
4     Generate Dnew ∈ N(Dc) by randomly using the two pre-selection methods
5     if WD2(Dnew) achieves the lower bound then
6       return(Dnew)
7     end if
8     if WD2(Dnew) ≤ WD2(Dc) then
9       Dc := Dnew
10      if WD2(Dnew) < WD2(Dmin) then
11        Dmin := Dnew
12      end if
13    else if rand(1000) < 3 then
14      Dc := Dnew
15    end if
16  end for
17  return(Dmin)
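The pseudo-code above can be sketched in runnable form as follows. This is an illustrative simplification, not the authors' implementation: the discrepancy is recomputed from scratch rather than updated through the 2(n − 2) changed δ's, the two pre-selection methods are replaced by a uniformly random within-column swap, and the warm-up is the rand(1000) < 3 acceptance of line 13:

```python
import random

def wd2(design, q):
    """Squared wrap-around L2-discrepancy (levels 1..q mapped to (2u-1)/(2q))."""
    n, s = len(design), len(design[0])
    pts = [[(2 * u - 1) / (2 * q) for u in row] for row in design]
    total = 0.0
    for xi in pts:
        for xj in pts:
            prod = 1.0
            for a, b in zip(xi, xj):
                d = abs(a - b)
                prod *= 1.5 - d * (1 - d)
            total += prod
    return -(4.0 / 3.0) ** s + total / n ** 2

def bph_search(n, q, s, tau=1000, lower_bound=None, seed=0):
    """Simplified BPH loop; n must be a multiple of q so columns stay U-type."""
    rng = random.Random(seed)
    levels = [u for u in range(1, q + 1) for _ in range(n // q)]
    cols = []
    for _ in range(s):                          # random U-type starting design
        c = levels[:]
        rng.shuffle(c)
        cols.append(c)
    dc = [[cols[k][i] for k in range(s)] for i in range(n)]   # line 2: D_c
    d_min, f_min = [row[:] for row in dc], wd2(dc, q)          # and D_min
    f_c = f_min
    for _ in range(tau):                                       # line 3
        k = rng.randrange(s)                                   # neighbor: swap two
        i, j = rng.sample(range(n), 2)                         # entries in column k
        dc[i][k], dc[j][k] = dc[j][k], dc[i][k]
        f_new = wd2(dc, q)
        if lower_bound is not None and f_new <= lower_bound + 1e-12:
            return dc, f_new                                   # lines 5-7
        if f_new <= f_c or rng.randrange(1000) < 3:            # lines 8 and 13
            f_c = f_new
            if f_new < f_min:                                  # lines 10-12
                d_min = [row[:] for row in dc]
                f_min = f_new
        else:
            dc[i][k], dc[j][k] = dc[j][k], dc[i][k]            # undo the swap
    return d_min, f_min                                        # line 17
```

A call such as `bph_search(6, 3, 2, tau=300)` returns the best design found together with its WD2 value.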

Part III

Modeling for Computer Experiments

In this part, we introduce diverse modeling techniques for computer experiments. Chapter 5 begins with fundamental concepts of modeling computer experiments and describes the commonly used metamodeling approaches, including polynomial regression, spline regression, the Kriging method, the neural network approach, and local polynomial regression. Various methods for sensitivity analysis are presented in Chapter 6; these methods are very useful tools for interpreting the resulting metamodels built using the methods introduced in Chapter 5. The last chapter treats computer experiments with functional responses. Analysis of functional responses in the context of design of experiments is a relatively new topic. We introduce several possible approaches to deal with such kinds of data. Real-life case studies will be used to demonstrate the proposed methodology.

5 Metamodeling

In the previous chapters, various types of designs for computer experiments were introduced. Once the data have been collected from an experiment, we wish to find a metamodel which describes the empirical relationship between the inputs and the outputs. Some concepts and methods of modeling were briefly reviewed in Section 1.6; in this chapter, we provide a more detailed discussion of the methodologies. Modeling a computer experiment can be viewed as regression on data without random errors. This view suggests that many fundamental ideas and concepts in statistical modeling may be applied and extended to modeling computer experiments; thereafter, we aim to extend regression techniques existing in the statistical literature to model computer experiments. For some useful techniques in regression, the reader can refer to Appendix A. Let us begin with some fundamental concepts in statistical modeling.

5.1 Basic Concepts

5.1.1 Mean square error and prediction error

Let y denote the output variable and let x = (x1, · · · , xs) consist of the input variables. We describe the relationship between the input variables and the output variable by the model

  output variable = f(input variables),   (5.1)

(cf. model (1.1)), where f is an unspecified smooth function to be approximated. Note that, compared to nonparametric regression models in the statistical literature, model (5.1) does not have a random error term on the right-hand side, because the outputs of computer experiments are deterministic (i.e., there are no random errors). The primary goal of metamodeling is to predict the true model f(x) at an untried site x by g(x) in the metamodel (1.9), built on a computer experiment sample {(xi, yi), i = 1, · · · , n}. Thus, it is a natural question

how to assess the performance of g(x). Intuitively, we want the residual, or approximation error, defined as f(x) − g(x), to be as small as possible over the whole experimental region T. Define the mean square error (MSE) as

  MSE(g) = ∫_T {f(x) − g(x)}^2 dx.

More generally, we may define a weighted mean square error (WMSE)

  WMSE(g) = ∫_T {f(x) − g(x)}^2 w(x) dx,

where w(x) ≥ 0 is a weight function with ∫_T w(x) dx = 1. The weight function w(x) allows one to easily incorporate prior information about the distribution of x over the experimental domain. When such prior information is not available, it is common to assume that x is uniformly distributed over T; then WMSE becomes equal to MSE. Because there is no random error in the outputs of computer experiments, when x is uniformly scattered over the experimental region T, the MSE equals the prediction error (PE) in the statistical literature. We now give a simple illustrative example.

Example 18 Let {(xk, yk), k = 1, · · · , 11} be a computer experiment sample from the true model f(x) = 2x cos(4πx) at xk = 0, 0.1, · · · , 1. We fit the data by polynomial models with degree p equal to 0, 1, · · · , 10. Figure 5.1(a) depicts the true curve and the scatter plot of the collected sample; the fitted curve and the fitted values at the observed x's are also depicted in Figure 5.1(a). Figure 5.1(b) displays the plot of prediction error versus the degree of the polynomial model fitted to the data. The weight function w(x) here is set equal to 1 for computing the PE. The prediction error reaches its minimum at p = 9, where the approximating polynomial is

  g(x) = 0.9931 + 1.7937(x − 0.5) − 76.9332(x − 0.5)^2 − 152.1698(x − 0.5)^3 + 943.8838(x − 0.5)^4 + 1857.3571(x − 0.5)^5 − 3983.8565(x − 0.5)^6 − 7780.0006(x − 0.5)^7 + 5756.1427(x − 0.5)^8 + 11147.9600(x − 0.5)^9.

Note that when the degree of the polynomial equals 10 (i.e., n = p), it yields an interpolation of the collected sample.

From the above definition of the prediction error, we need the value of f(x) over unsampled sites to calculate the prediction error. When computer experiments are neither time-consuming nor computationally intensive, we may collect a large number of new samples and then calculate the prediction error. In many situations, however, computer experiments are computationally intensive; for instance, each run in Example 2 required 24 hours of simulation. In such
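The computation behind Figure 5.1(b) can be sketched as follows; this is a sketch using NumPy's least squares polynomial fit, with a dense-grid average standing in for the integral that defines the PE:

```python
import numpy as np

def f(x):
    return 2 * x * np.cos(4 * np.pi * x)

xk = np.arange(11) / 10.0                 # the 11 design points 0, 0.1, ..., 1
yk = f(xk)
grid = np.linspace(0.0, 1.0, 2001)        # dense grid approximating T = [0, 1]

pe = {}
for p in range(11):
    coef = np.polyfit(xk, yk, p)          # degree-p least squares fit
    pe[p] = float(np.mean((f(grid) - np.polyval(coef, grid)) ** 2))
```

High-degree fits reduce the error dramatically relative to low-degree ones, and the degree-10 fit interpolates the 11 sample points.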

situations, direct evaluation of the prediction error is computationally impractical. To alleviate this problem, a general strategy is to estimate the prediction error of a metamodel g using the cross validation (CV) procedure. For i = 1, · · · , n, let g−i denote the metamodel built on the sample excluding (xi, yi). The cross validation score is defined as

  CVn = (1/n) Σ_{i=1}^{n} {f(xi) − g−i(xi)}^2.   (5.2)

This procedure is referred to as leave-one-out cross validation in the statistical literature, and the cross validation score can give us a good estimate of the prediction error of g.

FIGURE 5.1
Plots for Example 18. (a) The true model, the scatter plot of the collected sample, the fitted curve, and the fitted values using the polynomial regression model; (b) the plot of prediction error against the degree of the polynomial fitted to the data. The prediction error reaches its minimum when p = 9.

When the sample size n is large and the process of building a nonlinear metamodel is time-consuming (e.g., for the Kriging and neural network models that will be introduced in Sections 5.4 and 5.6, respectively), using CVn scores becomes computationally too demanding because we need to build n metamodels. To reduce the computational burden, we may modify the cross validation procedure in the following way. For a pre-specified K, divide the sample into K groups with approximately equal sample sizes among the groups. Let g(−k) be the metamodel built on the sample excluding the observations in the k-th group, and let y(k) and g(k) be the vectors consisting of the observed values and of the predicted values (using g(−k)) for the k-th group. Then the K-fold cross validation score is defined as

  CVK = (1/n) Σ_{k=1}^{K} (y(k) − g(k))′ (y(k) − g(k)).

Thus, the cross validation score in (5.2) coincides with the n-fold cross validation score.

In most modeling procedures, the metamodel g depends on some regularization parameter (see the section below), say λ, so the cross validation score depends on the regularization parameter and is denoted by CV(λ); here, CV can be either CVn or CVK. Minimizing the cross validation score with respect to λ yields a data-driven method to select the regularization parameter:

  λ̂ = arg min_λ CV(λ).

In practice, the minimization is carried out by searching over a set of grid values for λ. This data-driven approach to selecting the regularization parameter may be implemented in many modeling procedures for computer experiments. The theoretical properties of cross validation have been studied extensively in the statistical literature; see, for example, Li (1987).

5.1.2 Regularization

Most metamodels in the literature for modeling computer experiments have the following form:

  g(x) = Σ_{j=0}^{L} Bj(x) βj,   (5.3)

where {B0(x), · · · , BL(x)} is a set of basis functions defined on the experimental domain. For instance, the basis functions may be polynomial basis functions, spline basis functions, radial basis functions, or basis functions constructed from the covariance function in the Kriging model (see Section 1.6). We shall discuss these basis functions in detail later on. Since the outputs of computer experiments are deterministic, the construction of a metamodel is in fact, mathematically, an interpolation problem.

Let us begin with some matrix notation. Denote

  y = (y1, · · · , yn)′,  β = (β0, · · · , βL)′,  b(x) = (B0(x), · · · , BL(x))′,   (5.4)
This occurs when L ≥ n or when the columns of B are not linearly independent. The solution of (5.6). Consider (5.7) where || · || is the Euclidean norm and λ0 is a Lagrange multiplier.b) L showed that minimizing the 1 -norm β 1 = j=1 |βj | under the constraint of (5.6) There are many diﬀerent ways to construct the basis functions. (5. Instead of dealing with the constraint minimization problem directly. · · · . Donoho (2004a. B0 (xn ) · · · BL (xn ) ⎛ 131 (5.Metamodeling and ⎞ B0 (x1 ) · · · BL (x1 ) ⎜ B (x ) · · · BL (x2 ) ⎟ ⎟ B=⎜ 0 2 ⎝ ··· ··· ··· ⎠. and λ corresponds to a ridge parameter. This method is referred to as the normalized method of frame in the statistical literature (Rao (1973) and Antoniadis and Fan (2001)). the resulting estimate of β has many zero elements.6) yields a very sparse solution. there are many diﬀerent solutions for β that interpolate the sampled data y. Li (2002) established a connection between the normalized method of frame and penalized least squares for modeling computer experiments. we take L large enough such that following equation has a solution: y = Bβ. i.5) To interpolate the observed outputs y1 . · · · . We might face a situation where Equation (5. we consider the following minimization formulation: β 2 + λ0 y − Bβ 2 .9) where IL is the identity matrix of order L. and therefore. xn } using the basis B0 (x). (5.e. · · · . LLC . B1 (x). To deal with this issue.9) is known as ridge regression in the statistical literature (Hoerl and Kennard (1970)).7) as a penalized sum of least squares 1 y − Bβ 2 L 2 +λ l=1 2 βj . (5.6) becomes an undetermined system of equations. © 2006 by Taylor & Francis Group. yn over the observed inputs {x1 . We can further obtain the following metamodel ˆ gλ (x) = b(x) β λ . (5.8) where λ is referred to as a regularization parameter or tuning parameter.8) can be expressed as ˆ β λ = (B B + 2λIL )−1 B y.. The solution of (5. 
statisticians choose a parameter set β that minimizes β 2 under the constraint of (5.

This motivates us to pursue a less complicated metamodel (i.. which can provide the solution of (5. In particular.e. Fan and Li (2001) demonstrated that with proper choices of penalty function. a model with fewer terms (cf.3). Many traditional variable selection criteria in the literature can be derived from the penalized least squares with the L0 penalty: pλ (|β|) = 1 2 λ I(|β| = 0).10) with the L1 penalty: pλ (|β|) = λ|β|. In many applications. q > 0. Tibshirani (1996) proposed the LASSO algorithm. 2 where I(·) is the indicator function. The connection between the normalized method of frame and penalized least squares allows us to extend the concepts of variable selection and model selection in statistics to modeling for computer experiments. as demonstrated in Example 18. In this situation. which may depend on n and can be chosen by a data-driven criterion.) Frank and Friedman (1993) considered o the Lq penalty: pλ (|β|) = λ|β|q . Considering other penalty functions.132 Design and Modeling for Computer Experiments The regularization parameter λ can be selected to trade oﬀ the balance between bias and variance of the resulting metamodel using the cross validation procedure introduced in the previous section. and λ is a regularization parameter. for example. the resulting estimate can automatically reduce model complexity. The L1 penalty is popular in the literature of support vector machines (see.. such as the cross validation procedure or a modiﬁcation thereof. deﬁne a penalized least squares as Q(β) = 1 y − Bβ 2 L 2 +n j=0 pλ (|βj |). a larger value of λ yields a simpler model. Table 5. variable selection may play an important role in modeling for computer experiments. LLC . we may want to perform optimization using the computer model to ﬁnd a point with maximum/minimum response over the experimental domain. For this reason. which may not be the best predictor in terms of prediction error. (5. Smola and Sch¨lkopf (1998). 
it is desirable to build a metamodel g which has good prediction capability over the experimental domain yet which also has a simple form. i. taking λ to be zero yields an interpolation. more parsimonious model) by applying the variable selection approach. © 2006 by Taylor & Francis Group. because variable selection can be viewed as a type of regularization.e. Minimizing the penalized least squares yields a penalized least squares estimator.10) where pλ (·) is a pre-speciﬁed nonnegative penalty function. In principle.

The penalized least squares with the Lq penalty yields a "bridge regression." Fan and Li (2001) advocated the use of the smoothly clipped absolute deviation (SCAD) penalty, whose derivative is defined by

  p′λ(β) = λ{ I(β ≤ λ) + [(aλ − β)₊ / ((a − 1)λ)] I(β > λ) }  for some a > 2 and β > 0,

with pλ(0) = 0 and a = 3.7. See Section A.4 for more discussion of variable selection criteria and procedures for linear regression models. Throughout this chapter, we shall apply the above regularization methods to various metamodeling techniques.

5.2 Polynomial Models

Polynomial models have been widely used by engineers for modeling computer experiments. In polynomial models, the basis functions are taken as (see (1.23)):

  B0(x) = 1,
  B1(x) = x1, · · · , Bs(x) = xs,
  B_{s+1}(x) = x1^2, · · · , B_{2s}(x) = xs^2,
  B_{2s+1}(x) = x1 x2, · · · , B_{s(s+3)/2}(x) = x_{s−1} x_s,
  · · · .   (5.11)

Let {x1, · · · , xn} consist of a design, and let y1, · · · , yn be their associated outputs. Using the notation defined in (5.4) and (5.5), the least squares approach is to minimize

  Σ_{i=1}^{n} {yi − Σ_{j=0}^{L} Bj(xi) βj}^2 = ||y − Bβ||^2,

which, if B has full rank, results in the least squares estimator

  β̂ = (B′B)^{-1} B′y.

The polynomial basis functions in (5.11) are adapted from traditional polynomial regression in statistics. Note that the number of polynomial basis functions increases dramatically with the number of input variables and the degree of the polynomial. To simplify the modeling process, lower-order polynomials, such as the quadratic polynomial model or the centered quadratic model (1.24) with all first-order interactions, have been the common choice in practice. The basis functions in (5.11) are not an orthogonal basis and may suffer from the drawback of collinearity. To alleviate the problem, orthogonal polynomial bases have been introduced to model computer experiments in the literature.
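The SCAD penalty just defined admits, when the basis matrix has orthonormal columns, an explicit three-piece thresholding rule (Fan and Li (2001)); the sketch below implements the derivative p′λ and that orthonormal-case rule with the default a = 3.7, rather than a general-purpose optimizer:

```python
def scad_penalty_deriv(beta, lam, a=3.7):
    """p'_lambda(beta) for the SCAD penalty, beta > 0."""
    if beta <= lam:
        return lam
    return max(a * lam - beta, 0.0) / (a - 1)

def scad_threshold(z, lam, a=3.7):
    """Closed-form SCAD estimate of one coefficient with least squares
    value z, valid when the design is orthonormal (Fan and Li, 2001)."""
    az = abs(z)
    s = 1.0 if z >= 0 else -1.0
    if az <= 2 * lam:
        return s * max(az - lam, 0.0)          # soft-thresholding near zero
    if az <= a * lam:
        return ((a - 1) * z - s * a * lam) / (a - 2)
    return z                                    # large coefficients unpenalized
```

Small coefficients are set exactly to zero while large ones are left nearly unbiased, which is the property that distinguishes SCAD from the L1 penalty.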

rs (x) can serve to deﬁne the basis function Bl (x) in (5.11). 0 0 φ2 (u) du = 1. and any ﬁnite set of functions Φr1 . Let φ0 (u) = 1 for all u ∈ [0. In practice.0 (x) = 1. AIC.12) can be easily used to construct an orthogonal polynomial basis. · · · . 1]s . BIC. or φ-criterion (see Section A. the second stage consists of reducing the number of terms in the model following a selection procedure. people usually limit the model to only linear or up to lower-order models or models with ﬁxed terms. 44 176 14784 These orthogonal polynomials together with their tensor products (5. (2000) suggested a new © 2006 by Taylor & Francis Group. For integers j ≥ 1. the number of possible terms in the polynomial basis grows rapidly. The selected model usually has better prediction power. 1] can be constructed using Legendre polynomials over [−1/2. such as a stepwise selection based on Cp . Giglio et al.134 Design and Modeling for Computer Experiments literature. the number of possible candidate polynomial interpolators grows dramatically. xs ) ∈ [0. The ﬁrst few Legendre polynomials are: φ0 (u) = 1 √ φ1 (u) = 12(u − 1/2) √ 1 φ2 (u) = 180{(u − 1/2)2 − } 12 √ 3 φ3 (u) = 2800{(u − 1/2)3 − (u − 1/2)} 20 3 3 φ4 (u) = 210{(u − 1/2)4 − (u − 1/2)2 + } 14 560 √ 5 5 φ5 (u) = 252 11{(u − 1/2)5 − (u − 1/2)3 + (u − 1/2)} 18 336 √ 15 5 5 φ6 (u) = 924 13{(u − 1/2)6 − (u − 1/2)4 + (u − 1/2)2 − }.··· . LLC . once a polynomial interpolator is selected. as the number of variables and the order of polynomials increase. and j 0 φj (u)φk (u) du = 0. the required number of data samples also increases dramatically. Hence. (5. They further constructed the basis function over the unit cube by taking tensor products of univariate basis function. 1].12) Let B0 (x) ≡ Φ0. The univariate orthogonal polynomials over [0. Due to the structure of the polynomial basis.··· . An s dimensional tensor product basis function over x = (x1 .rs (x) = j=1 φrj (xj ). 
which can be prohibitive for computationally expensive simulation models. 1/2] by a location transformation.4). although it may not exactly interpolate the observed data. As a result.··· . Therefore. let φj (u)s satisfy 1 1 1 φj (u) du = 0. 1]s is then as follows s Φr1 . for j = k. An and Owen (2001) introduced quasi-regression for approximation of functions on the unit cube [0.
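The shifted Legendre system can be generated for any order via the standard three-term recurrence rather than from the tabulated formulas; the following sketch does so and checks the orthonormality conditions numerically (the midpoint rule for the integrals is an assumption of this sketch):

```python
import math

def phi(j, u):
    """Orthonormal shifted Legendre polynomial on [0, 1]:
    phi_j(u) = sqrt(2j + 1) * P_j(2u - 1), via the three-term recurrence."""
    if j == 0:
        return 1.0
    x = 2.0 * u - 1.0
    p_prev, p = 1.0, x                    # P_0 and P_1
    for k in range(1, j):
        # (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return math.sqrt(2 * j + 1) * p

def inner(j, k, m=20000):
    """Midpoint-rule approximation of the integral of phi_j * phi_k on [0, 1]."""
    h = 1.0 / m
    return h * sum(phi(j, (i + 0.5) * h) * phi(k, (i + 0.5) * h) for i in range(m))
```

For instance, phi(2, ·) agrees with the tabulated φ2(u) = √180 {(u − 1/2)^2 − 1/12}, and inner(j, k) is close to 1 when j = k and close to 0 otherwise.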

Bates et al. (2003) proposed a global selection procedure for polynomial interpolators based upon the concept of curvature of fitted functions. We may also directly employ the penalized least squares (5.10) to select a parsimonious model. There are many case studies using a polynomial basis in the literature; see, for instance, Fang, Lin, Winker and Zhang (2000), An and Owen (2001), and Bates et al. (2003). We conclude this section by illustrating an application of polynomial basis approximation.

Example 2 (Continued) The computer model of interest simulates engine cylinder head and block joint sealing, including the assembly process (e.g., head bolt rundown) as well as engine operating conditions (e.g., thermal and cylinder pressure cyclical loads due to the combustion process). A computer experiment employing a uniform design with 27 runs and 8 factors was conducted to optimize the design of the head gasket for its sealing function. The small number of runs is necessary due to the simulation setup complexity and excessive computing requirements. The 8 factors (see Table 5.1) are given below:

x1: gasket thickness
x2: number of contour zones
x3: zone-to-zone transition
x4: bead profile
x5: coining depth
x6: deck face surface flatness
x7: load/deflection variation
x8: head bolt force variation

The response y, the gap lift, is given in the last column of Table 5.1. The objective of the design is to optimize the head gasket design factors (x1, · · · , x5) so as to minimize the "gap lift" of the assembly as well as its sensitivity to the manufacturing variations (x6, x7, x8). One particular interest in this study is to set the gasket design factors so that the effect of manufacturing variations (e.g., surface flatness) is minimized.

Very often, the main effect plots can provide us with very rich information. The main effect plots shown in Figure 5.2 indicate that gasket thickness (x1) and surface flatness (x6) are the most important factors affecting gap lift. The interaction plots are shown in Figure 5.3, from which it can be seen that the interaction between gasket thickness and surface flatness is strong; this suggests that one can select a gasket thickness such that the gap lift is insensitive to the surface flatness variation. The main effect plots in Figure 5.2 are used only for reference in this case.

TABLE 5.1
Engine Block and Head Joint Sealing Assembly Data. The table lists the 27 runs of the uniform design: columns x1 through x8 give the factor levels (1, 2, or 3) for each run, and the last column y gives the observed gap lift, ranging from 1.00 to 5.21.

FIGURE 5.2
Main effect plots.

From the selected model by the φ-criterion, it can be concluded that gasket thickness and deck face surface flatness are important factors, and the interaction between x1 and x6 has a significant impact. We also noticed that the coefficient for the interaction between x1 and x6 is negative in all selected models. This implies that the effect of surface flatness, a manufacturing variation variable, can be minimized by adjusting the gasket thickness.

We next illustrate how to rank the importance of variables using penalized least squares with the SCAD penalty (see Section 2.3). Consider the following quadratic polynomial model:

  f(x) = β0 + β1 x1 + · · · + β8 x8 + β9 x1^2 + · · · + β16 x8^2 + β17 x1 x2 + · · · + β44 x7 x8.

The above quadratic polynomial model has 45 parameters to estimate; note that we have only 27 runs, so fitting the full model will result in an over-fitting model. To select significant terms in the model, we first standardize all x-variables and then apply traditional variable selection procedures to select significant terms. Table 5.2 lists the variables in the selected models. From Table 5.2, x8 and x8^2 may also have some effects.

We then apply the penalized least squares with the SCAD penalty to the underlying polynomial model, taking the values of λ from a very large one to a very small one. The results are depicted in Table 5.3.

FIGURE 5.3
Interaction effect plots.

TABLE 5.2
Variables in the models chosen by several selection criteria.

  Criterion      Variables in the Selected Model
  AIC            x1^2, x6^2, x8^2, x1x2, x1x6, x2x6
  BIC            x1^2, x6^2, x1x2, x1x6
  RIC            x1^2, x6^2, x1x6
  φ-criterion    x1^2, x6^2, x1x6

TABLE 5.3
Ranking the importance of variables with the SCAD penalty: the selected model along the path of λ values, from large to small.

  λ          Selected Model
  0.4212     x6
  0.4012     x6, x1x6
  0.1930     x6, x1x6   (the value chosen by GCV)
  · · ·      · · ·

For smaller values of λ (0.1750, 0.1512, 0.1185, 0.1023, 0.0975, 0.0928, 0.0884), the selected models grow successively, adding further terms such as x6^2, x8, x8^2, x4x5, x3x7, x6x7, and x7x8.

From Table 5.3, when λ = 0.4212, only x6 is selected; this implies that surface flatness is the most important variable. When λ = 0.4012, x6 and the interaction x1x6 are selected; therefore, the gasket thickness is also an important variable, next in importance to the surface flatness. Overall, the rank of importance of the variables seems to be x6, x1, and then the remaining variables.

Following Fan and Li (2001), the generalized cross validation (GCV, Craven and Wahba (1979)) method (see also the Appendix) was employed to select the tuning parameter λ. For a certain class of estimators, the GCV is equivalent to the cross validation introduced in Section 5.1, but the GCV method requires much less computation. The GCV chooses λ = 0.1930; the corresponding row is highlighted in bold-face font in Table 5.3. With the chosen value of λ = 0.1930, both x6 and x1x6 are selected, and the selected model is

  ŷ(x) = 1.1524 + 1.0678 x6 − 0.2736 x1x6.

The negative interaction between gasket thickness and surface flatness again implies that the effect of surface flatness, a manufacturing variation variable, can be minimized by adjusting the gasket thickness. We will further use this example to demonstrate how to rank important variables by using the decomposition of sums of squares in Chapter 6.

5.3 Spline Method

Splines are frequently used in nonparametric regression in the statistical literature. The spline method mainly includes smoothing splines (Wahba (1990)),
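GCV can be written down directly for any linear smoother. The book applies it to the SCAD fit; to keep the sketch short, the version below evaluates it for the ridge-type estimator of Section 5.1.2, for which the effective degrees of freedom are the trace of the hat matrix (this substitution, and the test function, are assumptions of the sketch):

```python
import numpy as np

def gcv_ridge(B, y, lam):
    """GCV(lam) = (RSS/n) / (1 - df/n)^2 for the ridge fit (5.9),
    with df = trace of the hat matrix B (B'B + 2*lam*I)^{-1} B'."""
    n, L = B.shape
    H = B @ np.linalg.solve(B.T @ B + 2.0 * lam * np.eye(L), B.T)
    resid = y - H @ y
    df = float(np.trace(H))
    return float(resid @ resid / n) / (1.0 - df / n) ** 2

x = np.linspace(0.0, 1.0, 30)
B = np.vander(x, N=8, increasing=True)
y = np.sin(2 * np.pi * x) + 0.1 * np.random.default_rng(0).standard_normal(30)
lams = [10.0 ** e for e in range(-8, 1)]
gcv = {lam: gcv_ridge(B, y, lam) for lam in lams}
best = min(gcv, key=gcv.get)             # data-driven choice of lambda
```

Only one pass over the grid is needed, with no refitting on held-out folds, which is why GCV is much cheaper than the cross validation of Section 5.1.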

5.3 Spline Method

Splines are frequently used in nonparametric regression in the statistical literature. The spline method mainly includes smoothing splines (Wahba (1990)), regression splines (Stone et al. (1997)), and penalized splines (Eilers and Marx (1996) and Ruppert and Carroll (2000)). Here we focus on regression splines, as an extension to polynomial models. We will introduce penalized splines in Section 7.

Example 18 (Continued) For the purpose of illustration, we refit the data in Example 18 using a regression spline. Since the polynomial of order 9 is the best polynomial fit, in order to make the spline fit comparable to the polynomial fit, we consider a quadratic spline model with 10 degrees of freedom (i.e., terms):

f(x) = β0 + β1 x + β2 x² + β3 (x − κ1)²₊ + β4 (x − κ2)²₊ + β5 (x − κ3)²₊ + β6 (x − κ4)²₊ + β7 (x − κ5)²₊ + β8 (x − κ6)²₊ + β9 (x − κ7)²₊,   (5.13)

where the κi = i/8 are termed knots, a₊ = a I(a ≥ 0), and I(·) is the indicator function. The resulting metamodel is given by

f̂(x) = 0.00001756 + 3.8806x − 32.6717x² + 37.5367(x − 0.125)²₊ + 52.4452(x − 0.25)²₊ − 85.1643(x − 0.375)²₊ − 67.9065(x − 0.50)²₊ + 166.3511(x − 0.625)²₊ + 39.4558(x − 0.75)²₊ − 181.3704(x − 0.875)²₊.

The resulting fit is depicted in Figure 5.4, from which we can see that the spline has a slightly better fit to the data than the best polynomial fit. One may further pursue a better spline fit by using a higher order spline and/or adding more knots.

5.3.1 Construction of spline basis

The above example shows that splines can be used to approximate univariate functions as easily as polynomial regression. The mechanism of constructing a multi-dimensional spline basis is also similar to that of polynomial regression. Let us begin with the construction of a one-dimensional spline basis. For given knots κ1, · · · , κK, the p-th order spline is defined as

s(x) = β0 + β1 x + β2 x² + · · · + βp xᵖ + Σ_{k=1}^{K} βp+k (x − κk)ᵖ₊,   (5.14)

where p ≥ 1 is an integer. It is not difficult to see that s(x) given by (5.14) is a pth degree polynomial on each interval between two consecutive knots and has p − 1 continuous derivatives everywhere. The pth derivative of s(·) takes a jump of size βp+l at the l-th knot.
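A minimal sketch of fitting a quadratic truncated power spline such as (5.13) by least squares, assuming NumPy and using a synthetic sine curve in place of the example's data:

```python
import numpy as np

def quad_spline_design(x, knots):
    """Design matrix of the quadratic truncated power basis:
    columns 1, x, x^2, and (x - kappa_k)_+^2 for each knot."""
    cols = [np.ones_like(x), x, x**2]
    cols += [np.clip(x - k, 0.0, None)**2 for k in knots]
    return np.column_stack(cols)

# Knots kappa_i = i/8, i = 1..7, as in the example (10 columns in total).
knots = np.arange(1, 8) / 8.0
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)                     # stand-in for the example's output
B = quad_spline_design(x, knots)
beta, *_ = np.linalg.lstsq(B, y, rcond=None)  # least squares coefficients
fitted = B @ beta
```

The truncated power terms (x − κk)₊² guarantee one continuous derivative at each knot, matching the p − 1 smoothness property stated for (5.14).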

The spline basis used in (5.14) is a power basis and can be replaced by another spline basis, such as a B-spline (De Boor (1978)). The linear spline basis is depicted in Figure 5.4, from which we can see that the linear spline basis is a piecewise linear function.

FIGURE 5.4 (a) Quadratic spline basis; (b) plot of spline fit.

As discussed in Friedman (1991), the tensor product method presents us with a straightforward approach for extending one-dimensional regression spline fitting to multi-dimensional model fitting. It is common to standardize the x-variables first such that the ranges for all x-variables are the same, and then to take the same knots for each x-variable. For given fixed knots κ1, · · · , κK, denote

S0(u) = 1, S1(u) = u, · · · , Sp(u) = uᵖ, Sp+1(u) = (u − κ1)ᵖ₊, · · · , Sp+K(u) = (u − κK)ᵖ₊.

Then an s-dimensional tensor product basis function over x = (x1, · · · , xs)' is

B_{r1,··· ,rs}(x) = Π_{j=1}^{s} S_{rj}(xj),  r1 = 0, · · · , K + p; · · · ; rs = 0, · · · , K + p.   (5.15)

It is easy to see that B_{0,··· ,0}(x) = 1, and any finite set of the functions B_{r1,··· ,rs}(x) can form the basis of a function space over x. However, the implementation is difficult in practice because the number of coefficients to be estimated from the data increases exponentially as the dimension s increases. Using a recursive partitioning approach, Friedman (1991) proposed an approach called multivariate adaptive regression splines (MARS). In MARS, the number of basis functions and the knot locations are adaptively determined by the data. Friedman (1991) presented a thorough discussion of MARS and gave an algorithm to create spline basis functions for MARS. Readers are referred to his paper for details.

Let B0(x), · · · , BL(x) be a finite subset of {B_{r1,··· ,rs}(x) : r1 = 0, · · · , K + p; · · · ; rs = 0, · · · , K + p}. Consider a regression spline model:

f(x) = Σ_{j=0}^{L} βj Bj(x).

When L + 1 ≤ n, the least squares estimator for (β0, · · · , βL)' is β̂ = (B'B)⁻¹B'y, where B is the associated design matrix (cf. Section 1.2). However, only when L + 1 ≥ n does the resulting metamodel exactly interpolate the observed sample. How to choose a good subset from the set of spline basis functions is an important issue. In a series of works by C.J. Stone and his collaborators (see, for example, Stone et al. (1997)), the traditional variable selection approaches were modified to select useful sub-bases. The regularization method introduced in Section 5.1.2 can be applied to the regression spline model directly. This is referred to as penalized splines in the statistical literature. Ruppert (2002) has explored the usefulness of penalized splines with the L2 penalty.
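The tensor product construction (5.15) can be sketched as follows; the exponential growth with s is visible in the (K + p + 1)^s column count. This is illustrative code with hypothetical helper names, not the book's implementation:

```python
import itertools
import numpy as np

def spline_basis_1d(u, knots, p=1):
    """S_0, ..., S_{p+K} of one coordinate: 1, u, ..., u^p, (u - kappa_k)_+^p."""
    cols = [u**r for r in range(p + 1)]
    cols += [np.clip(u - k, 0.0, None)**p for k in knots]
    return np.column_stack(cols)

def tensor_product_basis(X, knots, p=1):
    """All products prod_j S_{r_j}(x_j) over r_j = 0..K+p, as in (5.15).
    The number of columns is (K + p + 1)^s, exponential in the dimension s."""
    n, s = X.shape
    S = [spline_basis_1d(X[:, j], knots, p) for j in range(s)]
    m = S[0].shape[1]
    cols = []
    for r in itertools.product(range(m), repeat=s):
        col = np.ones(n)
        for j in range(s):
            col = col * S[j][:, r[j]]
        cols.append(col)
    return np.column_stack(cols)

X = np.random.default_rng(1).uniform(size=(40, 2))
B = tensor_product_basis(X, knots=[0.5], p=1)   # (1 + 1 + 1)^2 = 9 columns
```

Approaches such as MARS or the variable selection methods cited above amount to keeping only a small, data-driven subset of these columns.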

5.3.2 An illustration

The borehole example has been analyzed using the polynomial regression model in Section 1.8. Here, the penalized spline approach is used to analyze the borehole data, as demonstrated by Li (2002). We first employ a U30(30⁸) to generate a design with 30 experiments; the design points and their corresponding outcomes are depicted in Table 5.4. Similar to the analysis in Section 1.8, we take x1 = log(rw), x2 = log(r), x3 = Tu, x4 = Tl, x5 = Hu, x6 = Hl, x7 = L, and x8 = Kw. Let zi be the standardized variable of xi. We construct quadratic splines (zi − κij)²₊ with knots at the median of zi, denoted by κi2, and at the lower and upper 20th percentiles of zi, denoted by κi1 and κi3. The values of the knots of the zi's are depicted in Table 5.5. Here a₊ means the positive part of a, i.e., a₊ = aI(a > 0), and it is taken before squaring in the construction of the quadratic spline basis. As in the analysis in Section 1.8, we take the natural logarithm of the outputs as the response variable y. For each variable zi, i = 1, · · · , 8, the terms

zi, zi², (zi − κi1)²₊, (zi − κi2)²₊, (zi − κi3)²₊

are introduced to the initial model. We also include an intercept and all interactions of zi ’s in the initial model. Thus, there is a total of 68 x-variables in the


TABLE 5.4 Designs and Outputs (the 30 runs, listed column by column)

rw: 0.0617 0.1283 0.0950 0.0883 0.0583 0.0683 0.0817 0.1350 0.0717 0.1050 0.0750 0.1383 0.1183 0.1417 0.1083 0.1117 0.1017 0.1150 0.0783 0.1217 0.0850 0.0983 0.0650 0.0550 0.0517 0.1450 0.0917 0.1250 0.1483 0.1317
r: 45842 35862 10912 22555 14238 4258 27545 19228 40852 39188 20892 17565 32535 44178 12575 5922 2595 49168 24218 7585 47505 37525 9248 30872 34198 932 15902 29208 25882 42515
Tu: 84957 93712 81456 114725 95464 88460 105970 76203 104218 72701 98965 79704 74451 83207 112974 69199 86708 97215 63945 107721 67447 100717 70949 109472 77954 102468 91961 65697 90211 111222
Tl: 106.3017 81.6150 111.5917 67.5083 71.0350 88.6683 109.8283 69.2717 74.5617 101.0117 99.2483 93.9583 108.0650 72.7983 86.9050 63.9817 83.3783 90.4317 76.3250 78.0883 85.1417 65.7450 102.7750 95.7217 79.8517 104.5383 115.1183 97.4850 92.1950 113.3550
Hu: 1108 1092 1016 1008 1068 1080 1088 1032 1000 1064 1036 1100 1076 1060 1104 1056 992 1020 1096 1040 1044 1084 1012 1052 1024 1072 1048 996 1004 1028
Hl: 762 790 794 778 810 766 802 738 746 814 754 782 722 706 730 774 714 726 718 806 798 742 734 710 786 750 702 758 818 770
L: 1335 1241 1223 1260 1559 1129 1428 1148 1372 1167 1671 1652 1540 1447 1484 1409 1633 1204 1279 1353 1615 1596 1521 1185 1465 1297 1391 1316 1503 1577
Kw: 9990 10136 10063 10048 9983 10005 10034 10151 10012 10085 10019 10158 10114 10165 10092 10100 10078 10107 10027 10121 10041 10070 9997 9975 9968 10173 10056 10129 10180 10143
output: 30.8841 126.2840 51.6046 44.7063 17.6309 40.7011 41.9919 146.8108 29.8083 74.3997 29.8223 116.6914 101.7336 154.9332 93.2778 78.5678 55.4821 101.7270 56.9115 80.7530 34.6025 65.1636 24.2095 27.3042 13.5570 165.6246 65.8352 89.2366 86.2577 89.7999

TABLE 5.5 Knots for Regression Spline

Variable   κ1        κ2      κ3
z1        −1.042     0.1439  1.0072
z2        −0.7144    0.3054  0.8152
z3        −1.0564    0       1.0564
z4        −1.0564    0       1.0564
z5        −1.0564    0       1.0564
z6        −1.0564    0       1.0564
z7        −1.0564    0       1.0564
z8        −1.0564    0       1.0564


initial model. Since we only conduct 30 experiments, the initial model is over-parameterized. Penalized least squares with the SCAD penalty (Section A.4) is applied for this example. The estimated λ is 1.1923 × 10⁻⁵. The resulting coefficients are depicted in Table 5.6. We further estimate the MSE via randomly generating 10,000 new samples; the estimated MSE is 0.1112, which is much smaller than the one obtained by polynomial regression in Section 1.8.
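The SCAD penalty used here (defined in Section A.4; Fan and Li (2001)) can be sketched as follows, with the conventional choice a = 3.7 recommended by Fan and Li:

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lambda(|t|): linear near zero (like the L1 penalty),
    quadratic in between, and constant for |t| > a*lambda, so that large
    coefficients are not over-shrunk."""
    t = np.abs(t)
    linear = lam * t
    quad = -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))
    const = (a + 1) * lam**2 / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))
```

The three pieces join continuously at |t| = λ and |t| = aλ, which is what makes the penalized least squares estimate continuous in the data while still shrinking small coefficients exactly to zero.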

TABLE 5.6 Penalized Least Squares Estimates

Variable        Est. Coeff.
intercept        4.0864
z1               0.6289
z3              −0.0041
z4               0.0010
z5               0.1231
z6              −0.1238
z7              −0.1179
z5²             −0.0095
z6²             −0.0072
z7²              0.0067
z1z2             0.0005
z1z4             0.0024
z2z3            −0.0005
z4z6             0.0007
z5z6             0.0165
z5z7             0.000035
z6z7            −0.0011
(z2 − k1)²₊     −0.0007
(z3 − k1)²₊      0.0021
(z4 − k1)²₊     −0.0004
(z2 − k2)²₊     −0.0002
(z5 − k2)²₊     −0.0127
(z7 − k2)²₊      0.0014
(z8 − k2)²₊      0.0275
(z1 − k3)²₊     −0.0174
(z2 − k3)²₊     −0.0014

5.3.3 Other bases of global approximation

Polynomial basis and spline basis are the most popular bases in modeling computer experiments. Other bases for global approximation of an unknown function have also appeared in the literature. For example, the Fourier basis can be used to approximate periodic functions. It is well known that

1, cos(2πx), sin(2πx), · · · , cos(2kπx), sin(2kπx), · · ·

forms an orthogonal basis for a functional space over [0, 1]. Using the method of tensor product, we may construct an s-dimensional basis as follows:

B_{r1,··· ,r2s}(x) = Π_{j=1}^{s} cos(2r_{2j−1}πxj) sin(2r_{2j}πxj),

for r1, · · · , r2s = 0, 1, · · · . In practice, the following Fourier regression model is recommended:

β0 + Σ_{i=1}^{s} Σ_{j=1}^{m} {αij cos(2jπxi) + βij sin(2jπxi)}.


The Fourier basis has been utilized for modeling computer experiments in Bates et al. (1996). Motivated by Fang and Wang (1994), Bates et al. (1996) further showed that the good lattice point method (see Section 3.3.3) forms a D-optimal design for the Fourier basis.
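A sketch of the recommended Fourier regression model in Python; the design matrix and test function below are synthetic illustrations, not the construction used by Bates et al. (1996):

```python
import numpy as np

def fourier_design(X, m):
    """Columns 1, cos(2*j*pi*x_i), sin(2*j*pi*x_i) for i = 1..s, j = 1..m,
    matching the recommended Fourier regression model."""
    n, s = X.shape
    cols = [np.ones(n)]
    for i in range(s):
        for j in range(1, m + 1):
            cols.append(np.cos(2 * j * np.pi * X[:, i]))
            cols.append(np.sin(2 * j * np.pi * X[:, i]))
    return np.column_stack(cols)

rng = np.random.default_rng(2)
X = rng.uniform(size=(60, 2))
# A periodic test function that lies exactly in the span of the basis:
y = np.cos(2 * np.pi * X[:, 0]) + 0.5 * np.sin(4 * np.pi * X[:, 1])
F = fourier_design(X, m=2)
coef, *_ = np.linalg.lstsq(F, y, rcond=None)
```

With s inputs and frequency cutoff m the model has 1 + 2sm coefficients, so the basis stays manageable as long as m is small.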

5.4 Gaussian Kriging Models

The Kriging method was proposed by a South African geologist, D.G. Krige, in his master's thesis (Krige (1951)) on analyzing mining data, and his work was further developed by many other authors. The Gaussian Kriging method was proposed by Matheron (1963) for modeling spatial data in geostatistics. See Cressie (1993) for a comprehensive review of the Kriging method. The Kriging approach was systematically introduced to model computer experiments by Sacks, Welch, Mitchell and Wynn (1989). Since then, Kriging models have become popular for modeling computer experiment data sets.

Suppose that xi, i = 1, · · · , n are design points over an s-dimensional experimental domain T, and yi = y(xi) is the associated output at xi. The Gaussian Kriging model is defined as

y(x) = Σ_{j=0}^{L} βj Bj(x) + z(x),   (5.16)

where {Bj(x), j = 0, · · · , L} is a chosen basis over the experimental domain and z(x) is a random error. Instead of assuming that the random errors are independent and identically distributed, it is assumed that z(x) is a Gaussian process (cf. Appendix A.2.2 for a definition and Section 5.5.1 for basic properties) with zero mean, variance σ², and correlation function

r(θ; s, t) = Corr(z(s), z(t)).   (5.17)

Here r(θ; s, t) is a pre-specified positive definite bivariate function of z(s) and z(t). Model (5.16) is referred to as the universal Kriging model. In the literature, y(x) = μ + z(x) is referred to as the ordinary Kriging model, which is the most commonly used Kriging model in practice. A natural class for the correlation function of z, and hence of y, is the stationary family, where r(θ; s, t) ≡ r(θ; |s − t|). For instance, the function

r(θ; s, t) = exp{− Σ_{k=1}^{s} θk |sk − tk|^q},   for 0 < q ≤ 2,


or

r(θ; s, t) = exp{−θ Σ_{k=1}^{s} |sk − tk|^q},   for 0 < q ≤ 2,

is popular in the literature. The case q = 1 gives the product of Ornstein–Uhlenbeck processes. The case q = 2 gives a process with infinitely differentiable paths and is useful when the response is analytic. We will further introduce some other correlation functions in Section 5.5.
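The stationary correlation function with separate θk and q = 2 can be evaluated over a design as follows (an illustrative sketch, not library code):

```python
import numpy as np

def corr_matrix(X, theta, q=2.0):
    """r(theta; s, t) = exp(-sum_k theta_k |s_k - t_k|^q) for all row pairs
    of the n x s design matrix X."""
    theta = np.asarray(theta, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :])**q      # n x n x s distances
    return np.exp(-np.tensordot(diff, theta, axes=([2], [0])))

X = np.linspace(0.0, 1.0, 5)[:, None]   # five one-dimensional design points
R = corr_matrix(X, theta=[2.0])
```

The resulting matrix R(θ) has unit diagonal and decays with distance; larger θk values make the process rougher in coordinate k, which is why θ acts as a smoothing parameter vector later in the chapter.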

5.4.1 Prediction via Kriging

It is common to expect that a predictor g(x) of f(x), or ŷ(x) of y(x), satisfies some properties:

DEFINITION 5.1
(a) A predictor ŷ(x) of y(x) is a linear predictor if it has the form ŷ(x) = Σ_{i=1}^{n} ci(x)yi.
(b) A predictor ŷ(x) is an unbiased predictor if E{ŷ(x)} = E{y(x)}.
(c) A predictor ŷ(x) is the best linear unbiased predictor (BLUP) if it has the minimal mean squared error (MSE), E{ŷ(x) − y(x)}², among all linear unbiased predictors.

The Gaussian Kriging approach essentially is a kind of linear interpolation built on the following property of the multivariate normal distribution. Let z ∼ N_{n+1}(0, Σ), and partition z into (z1, z2')', where z1 is univariate and z2 is n-dimensional. Then, given z2, the conditional expectation of z1 is

E(z1|z2) = Σ12 Σ22⁻¹ z2,   (5.18)

where Σ12 = Cov(z1, z2) and Σ22 = Cov(z2). Equation (5.18) implies that for a given observed value of z2, a prediction of z1 is ẑ1 = Σ12 Σ22⁻¹ z2. This is a linear combination of the components of z2. It can be shown that ẑ1 is a linear unbiased predictor. Under the normality assumption, it can be further shown that ẑ1 is the BLUP. To facilitate the normality assumption, the Gaussian Kriging approach views z(xi) = y(xi) − Σ_{j=0}^{L} βj Bj(xi), i = 1, · · · , n, the residuals of the deterministic output, as a realization of a Gaussian process z(x). Thus, for any new x* ∈ T (that is, x* different from x1, · · · , xn), (z(x*), z(x1), · · · , z(xn)) follows an (n + 1)-dimensional normal distribution. Therefore, a linear predictor for z(x*) can easily be obtained by using (5.18), denoted by ẑ(x*). Furthermore, the predictor for y(x*) can be obtained by using (5.19) below.
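Equation (5.18) can be turned into a toy predictor directly. The helper below is hypothetical and assumes the Gaussian correlation with q = 2:

```python
import numpy as np

def blup_weights(Sigma12, Sigma22):
    """Weights of the prediction z1_hat = Sigma12 Sigma22^{-1} z2 from (5.18),
    computed via a linear solve rather than an explicit inverse."""
    return np.linalg.solve(Sigma22.T, Sigma12.T).T

# A 1-D process with Gaussian correlation observed at four sites.
x = np.array([0.0, 1.0, 2.0, 3.0])
xstar = 1.5
theta = 0.5
Sigma22 = np.exp(-theta * (x[:, None] - x[None, :])**2)
Sigma12 = np.exp(-theta * (xstar - x)**2)[None, :]   # 1 x n cross-covariances
z2 = np.sin(x)                                       # observed residuals
z1_hat = float(blup_weights(Sigma12, Sigma22) @ z2)
```

When x* coincides with a design point, Σ12 is a row of Σ22, the weights reduce to a unit vector, and the predictor reproduces the observed value exactly; this is the interpolation property of Kriging.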


Let R(θ) be an n × n matrix whose (i, j)-element is r(θ; xi, xj), with the other notation as given in (5.4) and (5.5). Under the normality assumption on z(x), the BLUP of y(x) can be written as

ŷ(x) = b(x)'β̂ + r'(x)R⁻¹(θ)(y − Bβ̂),   (5.19)

where r(x) = (r(θ; x, x1), · · · , r(θ; x, xn))'. Furthermore, the MSE of the BLUP is, in block-matrix notation,

σ² {1 − (b'(x), r'(x)) [0 B'; B R(θ)]⁻¹ [b(x); r(x)]}.

Equation (5.18) is a critical assumption of the Gaussian Kriging approach to model computer experiments. This assumption can be satisfied not only by normal distributions but also by elliptical distributions, such as the multivariate t-distribution and scale mixtures of normal distributions. See Fang, Kotz and Ng (1990) for properties of elliptical distributions.

5.4.2 Estimation of parameters

From the normality assumption of the Gaussian Kriging model, the density of y is

(2πσ²)^{−n/2} |R(θ)|^{−1/2} exp{− (1/(2σ²)) (y − Bβ)'R⁻¹(θ)(y − Bβ)}.

Then, after dropping a constant, the log-likelihood function of the collected data equals

ℓ(β, σ², θ) = −(n/2) log(σ²) − (1/2) log|R(θ)| − (1/(2σ²)) (y − Bβ)'R⁻¹(θ)(y − Bβ).   (5.20)

Maximizing the log-likelihood function yields the maximum likelihood estimate of (β, σ², θ). In practice, simultaneous maximization over (β, σ², θ) is unstable, because R(θ) may be nearly singular and σ² could be very small. Moreover, the parameters β and θ play different roles: β is used to model the overall trend, while θ is a smoothing parameter vector. It is desirable to estimate them separately. Furthermore,

∂²ℓ(β, σ², θ)/∂β∂σ² = −(1/σ⁴) B'R⁻¹(θ)(y − Bβ),
∂²ℓ(β, σ², θ)/∂β∂θ = (1/σ²) B' [∂R⁻¹(θ)/∂θ] (y − Bβ),

and, since E(y) = Bβ,

E[∂²ℓ(β, σ², θ)/∂β∂σ²] = 0   and   E[∂²ℓ(β, σ², θ)/∂β∂θ] = 0.

This implies that the maximum likelihood estimator of β is asymptotically independent of that of (σ², θ), because the Fisher information matrix E[∂²ℓ(β, σ², θ)/∂β∂(σ², θ)] is block-diagonal. This allows one to estimate β and (σ², θ) separately and iteratively, using the Fisher scoring algorithm or using the Newton–Raphson algorithm and ignoring the off-block-diagonal matrix. It has been empirically observed that the prediction based on the simultaneous maximization over (β, σ², θ) performs almost the same as that which relies on the separate estimation of β and (σ², θ). This is consistent with the above theoretical analysis. In practice, we may estimate β and (σ², θ) iteratively in the following way.

The maximum likelihood estimate of β: For a given θ, the maximum likelihood estimate of β is

β̂mle = (B'R⁻¹(θ)B)⁻¹ B'R⁻¹(θ)y,   (5.21)

which can be derived via minimizing (y − Bβ)'R⁻¹(θ)(y − Bβ) with respect to β. This formulation is referred to as the generalized least squares criterion in the literature, since E(y) = Bβ and Cov(y) = σ²R(θ).

Estimation of σ²: The maximum likelihood estimator for σ² also has an expressive form. By setting ∂ℓ(β, σ², θ)/∂σ² = 0, it follows that for a given θ,

σ̂² = n⁻¹ (y − Bβ̂)'R⁻¹(θ)(y − Bβ̂),   (5.22)

which is a biased estimator for σ², although it is close to unbiased as n increases. An unbiased estimator for σ² is

σ̂² = (n − L − 1)⁻¹ (y − Bβ̂)'R⁻¹(θ)(y − Bβ̂).

Estimation of θ: The maximum likelihood estimator for θ does not have a closed form. The Newton–Raphson algorithm or the Fisher scoring algorithm may be used to search for the solution.

Initial value of θ: As studied by Bickel (1975) in other statistical settings, with a good initial value, the one-step estimator may be as efficient as the fully iterated one. It is clear that a good initial value for β is the least squares estimator, which is equivalent to setting the initial value of θ to 0. In practice, one may stop at any step during the course of the iterations.
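The estimation scheme above can be sketched for the ordinary Kriging model y(x) = μ + z(x), profiling out μ and σ² in closed form for each θ. This is illustrative code; a simple grid search stands in for the Newton–Raphson step:

```python
import numpy as np

def profile_loglik(theta, x, y):
    """Concentrated log-likelihood of the ordinary Kriging model:
    for a given theta, mu_hat is the GLS estimate and sigma2_hat follows
    (5.22), so only theta needs numerical optimization."""
    n = len(y)
    R = np.exp(-theta * (x[:, None] - x[None, :])**2)   # Gaussian correlation
    one = np.ones(n)
    mu = (one @ np.linalg.solve(R, y)) / (one @ np.linalg.solve(R, one))
    resid = y - mu
    sigma2 = (resid @ np.linalg.solve(R, resid)) / n     # biased MLE (5.22)
    sign, logdet = np.linalg.slogdet(R)
    return -0.5 * (n * np.log(sigma2) + logdet)

x = np.arange(0.0, 11.0, 2.0)   # six design points, as in Example 19 below
y = np.sin(x)
grid = np.linspace(0.1, 3.0, 30)
theta_hat = max(grid, key=lambda t: profile_loglik(t, x, y))
```

Plotting `profile_loglik` over the grid reproduces the kind of profile likelihood curve, flat near its optimum, that motivates the penalized approach discussed next.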

Algorithm for Estimation of the Gaussian Kriging Model:

Step 1: Set the initial value of β to be (B'B)⁻¹B'y, the least squares estimator of β.

Step 2: For a given β, update (σ², θ) by using (5.22) and solving the equation ∂ℓ(β, σ², θ)/∂θ = 0. This step requires a numerical iteration algorithm, such as the Newton–Raphson algorithm or the Fisher scoring algorithm.

Step 3: For a given θ, update β using (5.21).

Step 4: Iterate Step 2 and Step 3 until convergence.

Fortunately, these algorithms are rather simple to implement.

Example 18 (Continued) As a demonstration, we refit the data in Example 18 using the following Gaussian Kriging model: y(x) = μ + z(x), where z(x) is a Gaussian process with mean zero and covariance Cov{z(s), z(t)} = σ² exp{−θ|s − t|²}. The maximum likelihood estimates for (μ, θ, σ²) were computed with the above algorithm, together with the corresponding coefficient vector b̂ = R⁻¹(θ̂)(y − μ̂1n). The resulting prediction is depicted in Figure 5.5, from which we can see that the Gaussian Kriging model exactly interpolates the observed sample. The predicted curve and the true curve are almost identical in this example. Compared with the polynomial regression model and the regression spline, the Gaussian Kriging model provides us with a better prediction in this case.

Remark 5.1 As a natural alternative to the maximum likelihood estimate, one may use the restricted maximum likelihood (REML, Patterson and Thompson (1971)) method to estimate the parameters involving the covariance matrix. The REML is also called the residual or modified maximum likelihood in the literature. After dropping a constant, the logarithm of the REML for model (5.16) is given by

ℓr(β, σ², θ) = −((n − L − 1)/2) log σ² − (1/2) log|R(θ)| − (1/2) log|B'R⁻¹(θ)B| − (1/(2σ²)) (y − Bβ)'R⁻¹(θ)(y − Bβ).

Maximizing ℓr(β, σ², θ) yields a REML estimate for the unknown parameters. The algorithm to optimize ℓr(β, σ², θ) is parallel to that for ℓ(β, σ², θ).

In practice, one of the serious problems with the maximum likelihood estimate of θ is that the resulting estimate may have a very large variance, because the likelihood function near the optimum is flat. Li and Sudjianto (2005) proposed a penalized likelihood approach to deal with this problem. Following Li and Sudjianto (2005), we refer to the Gaussian Kriging approach with penalized likelihood as a penalized Kriging approach. We illustrate the motivation of the penalized Kriging approach via the following simple example.

Example 19 Consider the following one-dimensional function: y = sin(x). Let the sample data be {xi = i : i = 0, 2, · · · , 10}. Consider the following Gaussian Kriging model to fit the data: y(x) = μ + z(x), where z(x) is a Gaussian process with mean zero and covariance Cov(z(s), z(t)) = σ² exp{−θ|s − t|²}. For a given θ, the maximum likelihood estimates for μ and σ² can be easily computed. We can further compute the profile likelihood function ℓ(θ), which equals the maximum of the likelihood function over μ and σ² for any given θ.

The logarithm of the profile likelihood (log-likelihood, for short) function ℓ(θ) versus θ is depicted in Figure 5.6(a), from which we can see that the likelihood function achieves its maximum at θ̂ = 3 and becomes almost flat for θ ≥ 1. The prediction based on the Gaussian Kriging model is displayed in Figure 5.6(b), which shows that the prediction becomes very erratic when x is not equal to the sampled data. We now consider the REML method. The corresponding logarithm of the profile restricted likelihood function versus θ is depicted in Figure 5.6(c). From the figure we can see that the shape of the profile restricted likelihood function in this example is the same as that of Figure 5.6(a): it becomes almost flat for θ ≥ 1. The prediction based on the REML is displayed in Figure 5.6(d); it is the same as that in Figure 5.6(b), because θ̂ = 3 is the same as that obtained by the ordinary likelihood approach, and it again becomes very erratic when x is not equal to the sample data.

To avoid this erratic behavior, Li and Sudjianto (2005) considered a penalized likelihood approach as follows. Define a penalized likelihood as

Q(β, σ², θ) = ℓ(β, σ², θ) − n Σ_{j=1}^{s} pλ(|θj|),

where ℓ(β, σ², θ) is defined in (5.20), and pλ(·) is a penalty function with a regularization parameter λ. Model selection techniques can be employed to estimate the parameter θ, because it can be viewed as a smoothing parameter vector (see Section 5.7 for more discussion). In the literature, the cross validation procedure described in Section 5.1 has been utilized to select the smoothing parameter vector. However, it can be very computationally expensive, because the cross validation procedure has to search over s-dimensional grid points.

A penalized log-likelihood function with the SCAD penalty is depicted in Figure 5.6(e). Comparing Figure 5.6(e) with Figure 5.6(a), the locations of the maximum likelihood estimate and the penalized maximum likelihood estimate are very close, but Figure 5.6(e) clearly shows that the penalized likelihood function is not flat around the optimum. The curvature around the optimum implies that the resulting estimate for θ possesses a smaller standard error. The corresponding prediction is displayed in Figure 5.6(f); the prediction and the true curve are almost identical. Readers are referred to Li and Sudjianto (2005) for theoretical analysis of the penalized likelihood approach.

To confirm the model selected by penalized Kriging, and to investigate the performance of the penalized Kriging method, we take a much larger sample {xi = i/2 : i = 0, 1, · · · , 20}. The corresponding log-likelihood function is depicted in Figure 5.6(g), and the corresponding prediction is displayed in Figure 5.6(h). Figure 5.6(h) confirms that the predictions yielded by the penalized Kriging method are quite accurate.
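A sketch of the penalized profile likelihood for a single correlation parameter, using the SCAD penalty and the Q = ℓ − n Σ pλ(|θj|) form above. The grid and the value of λ are illustrative, not the values used by Li and Sudjianto (2005):

```python
import numpy as np

def scad(t, lam, a=3.7):
    # SCAD penalty of Fan and Li (2001); see Section A.4.
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))
    return (a + 1) * lam**2 / 2

def penalized_profile_loglik(theta, x, y, lam):
    """Q(theta) = l(theta) - n * p_lambda(|theta|) for the ordinary Kriging
    model with one correlation parameter, profiling out mu and sigma^2."""
    n = len(y)
    R = np.exp(-theta * (x[:, None] - x[None, :])**2)
    one = np.ones(n)
    mu = (one @ np.linalg.solve(R, y)) / (one @ np.linalg.solve(R, one))
    r = y - mu
    sigma2 = (r @ np.linalg.solve(R, r)) / n
    sign, logdet = np.linalg.slogdet(R)
    loglik = -0.5 * (n * np.log(sigma2) + logdet)
    return loglik - n * scad(theta, lam)

x = np.arange(0.0, 11.0, 2.0)   # the n = 6 sample of Example 19
y = np.sin(x)
grid = np.linspace(0.1, 3.0, 30)
theta_pen = max(grid, key=lambda t: penalized_profile_loglik(t, x, y, lam=0.2))
```

The penalty term pulls the flat right tail of the profile likelihood downward, which is why the penalized curve in Figure 5.6(e) has visible curvature at its maximum.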

FIGURE 5.6 Kriging and penalized Kriging. (a) and (g) are the log-likelihood functions for Kriging with sample sizes n = 6 and 21, respectively. (b) and (h) are the predictions via Kriging with sample sizes n = 6 and 21, respectively. (c) is the log-restricted-likelihood function for Kriging with n = 6, and (d) is the prediction via Kriging with n = 6 using the REML method. (e) is the penalized log-likelihood function for Kriging with n = 6, and (f) is the prediction via penalized Kriging with n = 6. In (b), (d), (f), and (h), the solid line stands for the prediction, the dashed line stands for the true curve, and dots stand for predictions at the sample data points.

A useful way to reduce the computational burden is to set θj = σjθ, where σj is the sample standard deviation of the j-th component of x. This indeed reduces the Kriging method to a radial basis function method. See Section 5.6 for a thorough discussion on how to implement the radial basis function method to model computer experiments.

5.4.3 A case study

Total vehicle customer satisfaction depends greatly on the customer's level of satisfaction with the vehicle's engine. The noise, vibration, and harshness (NVH) characteristics of the vehicle and engine are critical elements of customer dissatisfaction. Piston slap is an unwanted engine noise caused by piston secondary motion, that is, the departure of the piston from the nominal motion prescribed by the slider-crank mechanism. The secondary motion is caused by a combination of transient forces and moments acting on the piston during engine operation and the presence of clearances between the piston and the cylinder liner. This combination results in both a lateral movement of the piston within the cylinder and a rotation of the piston about the piston pin, and it causes the piston to impact the cylinder wall at regular intervals. These impacts may result in objectionable engine noise. de Luca and Gerges (1996) gave a comprehensive review of the piston slap mechanism and experimental piston slap analysis, including noise source analysis and parameters influencing piston slap, such as piston skirt length, profile, and ovality. Since then, with the advent of faster, more powerful computers, much of the piston slap study has shifted from experimental analysis to analytical analysis, for both the power cylinder design phase and piston noise troubleshooting. A detailed and thorough explanation of this study can be found in Hoffman et al. (2003). We first give a brief description of this study.

For this study, the power cylinder system is modeled using the multi-body dynamics code ADAMS/Flex, which includes a finite element model, where flexibility is introduced via a modal superposition. The piston, wrist pin, and connecting rod were modeled as flexible bodies; the crankshaft is modeled as a rigid body rotating with a constant angular velocity. Boundary conditions for the flexible bodies are included via a Craig–Bampton component mode synthesis. In addition, variation in clearance due to cylinder bore distortion and piston skirt profile and ovality is included in the analysis. We take the piston slap noise as the output variable and have the following input variables: x1 = clearance between the piston and the cylinder liner, x2 = location of peak pressure, x3 = skirt length,

x4 = skirt profile, x5 = skirt ovality, and x6 = pin offset. The ultimate goal of the study is to perform robust design optimization (Hoffman et al. (2003)) to desensitize the piston slap noise from the source of variability (e.g., clearance variation). Interested readers should consult Hoffman et al. (2003) and Du et al. (2004) for the probabilistic design optimization study. In this discussion, we only focus on the development of the metamodel.

Since each computer experiment requires intensive computational resources, a uniform design (cf. Chapter 3) was employed to plan a computer experiment with 12 runs. The collected data are displayed in Table 5.7.

TABLE 5.7 Piston Slap Noise: the 12-run uniform design in the inputs x1–x6 together with the corresponding piston slap noise output y.

To accomplish this goal, the availability of a good metamodel is a necessity. A Gaussian Kriging model is employed to construct a metamodel as an approximation to the computationally intensive analytical model. In this example, we consider the following Gaussian Kriging model: y(xi) = μ + z(xi), where z(x) is a Gaussian process with zero mean, variance σ², and correlation function between xi and xj:

r(xi, xj) = exp{− Σ_{k=1}^{s} θk |xik − xjk|²}.

We first conduct some preliminary analysis using correlation functions with a single parameter, i.e., θk = θ/σk, in order to quickly gain a rough picture

FIGURE 5.7 Log-likelihood and penalized log-likelihood of the case study example with n = 12. (a) is the log-likelihood function; (b), (c), and (d) are the penalized log-likelihood functions with the SCAD, L1, and L2 penalties, respectively.

In our implementation, we intuitively set θk = θ/σk, where σk stands for the sample standard deviation of the k-th component of xi, i = 1, · · · , n. This choice of θk allows us to plot the logarithm of the profile likelihood function (log-likelihood, for short) l(θ) against θ. Plots of the log-likelihood function and of the penalized log-likelihood functions with the SCAD, the L1, and the L2 penalties are depicted in Figure 5.7, where λ = 0.06 for the SCAD, from which we can see that the log-likelihood function near its optimum (θ̂ = 3) is flat. This creates the same problem as that in Example 19 when the sample size equals 6. The three penalized log-likelihood functions are not flat near their optima; from their shape, the resulting penalized likelihood estimate with the SCAD penalty may be more efficient than the other two. This preliminary analysis gives us a rough picture of the log-likelihood function and the penalized likelihood functions.

The leave-one-out cross validation procedure (cf. Section 5.2) was used to estimate the tuning parameter λ. The resulting estimates of λ equal 0.2275 (= 0.5 log(n)/n), 0.1300, and 0.1100 for the SCAD, the L1, and the L2 penalties, respectively. The resulting estimates of μ, σ², and the θj's are depicted in Table 5.8. Although the three penalty functions are quite different, their resulting penalized maximum likelihood estimates for θ under the constraint θk = θ/σk and θj ≥ 0 are very close. The four estimates for μ are very close, but the four penalized likelihood estimates for σ² and the θj's are quite different. Comparisons below recommend the resulting estimate of the penalized likelihood with the SCAD penalty.

TABLE 5.8 Penalized Maximum Likelihood Estimates (MLE, SCAD, L1, and L2 estimates of μ̂, σ̂², and θ̂1–θ̂6, together with λ and the MSE and MAR criteria).
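The leave-one-out recipe used here is generic: hold out one run, fit at a candidate λ on the remaining runs, and score the held-out prediction. As a minimal, hedged illustration (not the book's penalized-Kriging code), the sketch below applies the same selection loop to a quadratically penalized least squares fit, whose solution is closed-form; `loo_select_lambda` and the data are hypothetical names.

```python
import numpy as np

def loo_cv_score(X, y, lam):
    """Leave-one-out CV score of a quadratically penalized (ridge-type)
    least squares fit, standing in for the penalized likelihood fit."""
    n = len(y)
    errors = []
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        # penalized fit on the n-1 retained runs
        beta = np.linalg.solve(Xi.T @ Xi + lam * np.eye(X.shape[1]), Xi.T @ yi)
        errors.append((y[i] - X[i] @ beta) ** 2)  # held-out squared error
    return float(np.mean(errors))

def loo_select_lambda(X, y, grid):
    """Pick the tuning parameter minimizing the leave-one-out score."""
    scores = [loo_cv_score(X, y, lam) for lam in grid]
    return grid[int(np.argmin(scores))]
```

The same loop applies to any fitter with a tuning parameter; only the inner fit changes.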

Denote by b̂ = R⁻¹(θ̂)(y − 1n μ̂) the coefficient vector of the best linear unbiased predictor; the prediction of the response at input variable x is

ŷ(x) = μ̂ + r(x)'b̂,

where r(x) was defined in Section 5.4.1. To assess the performance of the penalized Gaussian Kriging approach, we conduct another computer experiment with 100 runs at new sites, and further compute the median of absolute residuals (MAR), defined as

MAR = median{|y(xi) − ŷ(xi)| : i = 1, · · · , 100}.

The MAR for the ordinary Kriging method is 1.0588, and the MARs equal 0.4638, 1.3375, and 1.3114 for the penalized Kriging method with the SCAD, the L1, and the L2 penalties, respectively. The metamodel obtained by the penalized Kriging with the SCAD penalty outperforms the other three metamodels. We now plot the sorted absolute residuals from penalized Kriging versus the absolute residuals from Kriging in Figure 5.8, from which we can see that the penalized Kriging with the SCAD uniformly improves the ordinary Kriging model.

FIGURE 5.8 Plots of absolute residuals: (a) SCAD, (b) L1, and (c) L2 absolute residuals against the absolute residuals of ordinary Kriging.
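The MAR criterion is straightforward to compute; a small helper (hypothetical name) under the definition above:

```python
import numpy as np

def median_absolute_residual(y_true, y_pred):
    """MAR = median{ |y(x_i) - yhat(x_i)| : i = 1, ..., n }."""
    resid = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    return float(np.median(resid))
```

A smaller MAR means the metamodel tracks the confirmation runs more closely, and, unlike a mean squared error, it is not dominated by a few large residuals.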

The penalized Kriging with the L2 penalty has almost the same performance as that of the ordinary Kriging model, while the penalized Kriging with the L1 penalty does not perform well in this case.

To understand the behavior of the penalized Kriging method when the sample size is moderate, we apply the penalized Kriging method to the new sample with 100 runs. Again, let θj = θ/σj, and plot the log-likelihood against θ in Figure 5.9, from which we can see that the shape of the log-likelihood function is the same as that of the penalized log-likelihood function with the SCAD penalty. The selected λ values are 0.105, 0.18, and 0.18 for the SCAD, L1, and L2 penalties, respectively. We further compute the maximum likelihood estimates for all of the parameters θj, from which we found that all of these estimates are close, as expected. The resulting estimates are listed in Table 5.9.

FIGURE 5.9 Log-likelihood and penalized log-likelihood when n = 100. (a) is the log-likelihood function; (b), (c), and (d) are the penalized log-likelihood functions with the SCAD, L1, and L2 penalties, respectively.

TABLE 5.9 Penalized Maximum Likelihood Estimates When n = 100 (MLE, SCAD, L1, and L2 estimates of θ̂1–θ̂6).

5.5 Bayesian Approach

Bayesian interpolation has a long history (see Diaconis (1988) for an interesting account of a method published by Poincaré in 1896). Kimeldorf and Wahba (1970) and Wahba (1978) established the connection between Bayesian interpolation and smoothing splines. Bayesian interpolation was introduced to model computer experiments by Currin et al. (1991) and Morris et al. (1993). These authors provided a Bayesian formulation for Gaussian Kriging models. Morris et al. (1993) focus on computer models that can provide not only the response but also its first partial derivatives. MacKay (1998) described how inferences for a Gaussian process work and how they can be implemented with finite computational resources.

5.5.1 Gaussian processes

Gaussian processes have played an important role in Bayesian interpolation. In this section, we briefly introduce some basic properties of Gaussian processes. Let {Y(x), x ∈ T} be a Gaussian process with mean, variance, and covariance functions

μ(x) = E{Y(x)}, σ²(x) = Var{Y(x)}, and σ(s, t) = Cov(Y(s), Y(t)),

respectively, for any s, t ∈ T. When σ(s, t) is entirely a function of s − t, the process {Y(x), x ∈ T} is called stationary. For the sake of further discussion in Section 5.5.3, denote by Yj(x) = ∂Y(x)/∂xj

the derivative of Y(x) with respect to xj. The general derivative process is defined as

Y^(a1,···,as)(x) = ∂^(a1+···+as) Y(x) / (∂x1^a1 · · · ∂xs^as)

for aj ≥ 0, j = 1, · · · , s. We summarize the properties of the derivative process in the following theorem.

Theorem 11 If Y(x) is a Gaussian process with mean function μ(x) and covariance function σ(s, t), then Y^(a1,···,as)(x) is a Gaussian process with mean function and covariance function given by

E{Y^(a1,···,as)(x)} = μ^(a1,···,as)(x)    (5.23)

and

Cov{Y^(a1,···,as)(s), Y^(b1,···,bs)(t)} = σ^(a1,···,as,b1,···,bs)(s, t).    (5.24)

Moreover, (Y(x), Y1(x), · · · , Ys(x)) is an (s + 1)-dimensional Gaussian process with mean function and covariance function given in (5.23) and (5.24), respectively (cf. Section 1 of Morris et al. (1993)).

5.5.2 Bayesian prediction of deterministic functions

Suppose that {xi, i = 1, 2, · · · , n} are design points over an s-dimensional experimental domain T, and yi = y(xi) is the associated output to xi. In the context of Bayesian interpolation, the prior "knowledge" about the unknown function y(x) is taken to be the Gaussian process Y = {Y(x), x ∈ T}, such that for every finite set of sites U = {u1, · · · , uN} ⊂ T, the random vector YU = (Yu1, · · · , YuN) is multivariate normal with mean vector E(YU) = (μ(u1), · · · , μ(uN)) ≡ μU and covariance matrix Cov(YU) ≡ ΣUU, an N × N positive definite matrix with (i, j)-element σ(ui, uj). Furthermore, it is well known that the posterior process, given the vector of observed responses YO ≡ (y1, · · · , yn)', is also a Gaussian process. So the posterior mean and covariance at any finite set of sites U ⊂ T have the following closed forms:

μU|YO = E(YU |YO) = μU + Cov(YU, YO) Cov(YO)⁻¹ {YO − E(YO)}    (5.25)

and

ΣUU|YO = Cov(YU, YU |YO) = ΣUU − Cov(YU, YO) Cov(YO)⁻¹ Cov(YO, YU).    (5.26)

Therefore, a prediction ŷ(x) for Y(x) is the posterior mean, which is also the optimal prediction under squared loss in the context of decision theory (Currin et al. (1991)):

ŷ(x) = μx|YO = μ(x) + Cov(Y(x), YO) Cov(YO)⁻¹ {YO − E(YO)},    (5.27)

which is the most popular choice of prediction function derived from Gaussian prior processes. The elements of Cov(Y(x), YO) can be viewed as functions of x, and they form a set of n basis functions of x. The basis functions follow automatically from the choice of prior process Y and design D and do not need to be chosen by the data analyst. From an approximation-theoretic point of view, (5.27) is the unique interpolating function in the space spanned by the n basis functions. Moreover, the prediction ŷ(x) can also be viewed as a minimal norm interpolant; see, for example, Micchelli and Wahba (1981), Sacks and Ylvisaker (1985), and Currin et al. (1991). In practice, it is usual to consider only stationary prior processes. That is, the prior mean and variance are assumed to be constant for all sites x ∈ T: μ(x) = μ and σ²(x) = σ², and the prior correlation is assumed to depend only on the distance between two sites: Corr(Ys, Yt) = r(θ; ‖s − t‖), where r(θ; 0) = 1, and θ is an unknown parameter vector to be estimated from the collected data. The correlation function can be easily derived using the product correlation rule, defined by

r(θ; ‖s − t‖) = ∏_{k=1}^{s} rk(θk; |sk − tk|),

where rk(θk; ·) is a univariate correlation function. The correlation function

r(θ; ‖s − t‖) = exp{− ∑_{k=1}^{s} θk |sk − tk|^q}
can be derived by taking rk(θk; |sk − tk|) = exp{−θk |sk − tk|^q}, which is referred to as the exponential correlation function. Denote by lk the range of the k-th coordinate of the input vector over T, k = 1, · · · , s. The linear correlation function is defined as

rk(θk; |sk − tk|) = 1 − |sk − tk| / (θk lk),    for 1/2 < θk < ∞.

The non-negative linear correlation function is defined as

rk(θk; |sk − tk|) = 1 − |sk − tk| / (θk lk),  when |sk − tk| < θk lk,
rk(θk; |sk − tk|) = 0,                       when |sk − tk| ≥ θk lk.
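The three univariate families above, and the product rule that combines them across coordinates, translate directly into code. The sketch below uses hypothetical function names and assumes the exponent q and the ranges lk are supplied by the user:

```python
import numpy as np

def exp_corr(theta, d, q=2.0):
    """Exponential correlation: exp(-theta * |d|^q), 0 < q <= 2."""
    return float(np.exp(-theta * abs(d) ** q))

def linear_corr(theta, d, l):
    """Linear correlation: 1 - |d| / (theta * l)."""
    return 1.0 - abs(d) / (theta * l)

def nonneg_linear_corr(theta, d, l):
    """Non-negative linear correlation: truncated at 0 once |d| >= theta*l."""
    return max(0.0, 1.0 - abs(d) / (theta * l))

def product_corr(corr1d, theta, s, t, **kwargs):
    """Product correlation rule: r = prod_k r_k(theta_k; |s_k - t_k|)."""
    return float(np.prod([corr1d(th, sk - tk, **kwargs)
                          for th, sk, tk in zip(theta, s, t)]))
```

Any valid univariate family can be plugged into `product_corr`, which is exactly what the product correlation rule asserts.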

Under the assumption of stationarity, equation (5.27) becomes

ŷ(x) = μx|YO = μ + Cov(Y(x), YO) Cov(YO)⁻¹ {YO − μ1n},    (5.28)


which is the same as the predictor derived from the ordinary Gaussian Kriging model (5.16) and (5.17). A natural way to eliminate μ and σ² from the prior process would be to assign them standard noninformative prior distributions. However, Currin et al. (1991) suggested estimating μ and σ² by maximum likelihood, since the maximum likelihood estimate was the most reliable among the various kinds of cross validation they tried (see Currin et al. (1991)). The estimation procedure described in Section 5.4.2 can be directly used to search for the maximum likelihood estimate of (μ, σ², θ) under the assumption of stationarity. The resulting metamodel based on the Bayesian method is exactly the same as the ordinary Gaussian Kriging model (5.16) and (5.17). The Bayesian formulation thus provides insight into why the ordinary Gaussian Kriging model works well in practice. Under the Bayesian formulation, we can easily incorporate other information into the analysis of computer experiments. Specifically, we next introduce how to utilize derivatives of the response.
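A compact numerical sketch of the stationary predictor (5.28), assuming a Gaussian product correlation and treating μ and θ as already estimated (the function name is illustrative, not from the book):

```python
import numpy as np

def krige_predict(x, X, y, theta, mu):
    """Posterior-mean prediction (5.28) under a stationary Gaussian prior
    with correlation r(h) = exp(-sum_k theta_k h_k^2)."""
    X = np.asarray(X, float); y = np.asarray(y, float)
    theta = np.asarray(theta, float); x = np.asarray(x, float)
    def corr(a, b):
        return np.exp(-np.sum(theta * (a - b) ** 2))
    n = len(y)
    R = np.array([[corr(X[i], X[j]) for j in range(n)] for i in range(n)])
    r = np.array([corr(x, X[i]) for i in range(n)])
    # yhat(x) = mu + Cov(Y(x), Y_O) Cov(Y_O)^{-1} (Y_O - mu 1_n)
    return float(mu + r @ np.linalg.solve(R, y - mu))
```

Because the same correlation generates both r and R, the predictor interpolates the observed runs exactly, as required of a metamodel for a deterministic computer experiment.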

5.5.3 Use of derivatives in surface prediction

In some situations, computer models can provide not only the response y(x) but also ﬁrst partial derivatives yj (x) = ∂y(x)/∂xj , j = 1, · · · , s. Inspired by several applications and strong scientiﬁc interests, Morris et al. (1993) proposed a Bayesian modeling procedure which can simultaneously model the response and its derivatives. In what follows, we introduce their modeling procedure in detail. Suppose that x1 , · · · , xn are design points over the s-dimensional experimental domain T , and at each xi , yi ≡ (y(xi ), y1 (xi ), · · · , ys (xi )) is the associated output of the true model, where yj (xi ) is the partial derivative of y(xi ) with respect to xj . Denote the vector consisting of all outputs by y = (y1 , · · · , yn ) . Denote further Yi ≡ (Y (xi ), Y1 (xi ), · · · , Ys (xi )) . As suggested by Morris et al. (1993), the prior uncertainty about the vector y is represented by treating y as a realization of the random normal n(s + 1)-vector Y = (Y1 , · · · , Yn ) (5.29)

with mean μ and covariance matrix Σ given by (5.23) and (5.24), respectively. It is well known that the posterior process, denoted by Y*(x), is also a Gaussian process with mean function

μ*(x) = E{Y*(x)|y} = μ(x) + Cov(Y(x), Y) Σ⁻¹ (y − μ)    (5.30)

and covariance function

K*(s, t) = Cov{Y*(s), Y*(t)|y} = σ(s, t) − Cov{Y(s), Y} Σ⁻¹ Cov{Y, Y(t)}.    (5.31)


Equations (5.30) and (5.31) are the posterior mean and covariance function for a general Gaussian process. The advantage to the use of stochastic processes as priors for y(x) is that the variability of the posterior processes Y ∗ (x), as expressed by the posterior covariance function K ∗ (s, t) in (5.31), can be used to provide measures of uncertainty, and design can be sought to minimize the expected uncertainty in some sense. See Morris et al. (1993) for details. Under the assumption of stationarity, the above formulas can be simpliﬁed. Notice that μ(x) ≡ μ and σ(s, t) = σ 2 r(|s − t|), where σ 2 is a constant, r(·) is a correlation function that depends only on the distance between s and t, and |a| stands for a vector with elements |aj |. In practice, r(·) is often taken to have the following product correlation form:

r(|s − t|) = ∏_{j=1}^{s} rj(sj − tj),

where the rj's usually are chosen from a parametric family of suitably differentiable correlation functions on the real line. Due to stationarity, μk(x) = E{Yk(x)} = 0, and we further have

Cov(Y^(a1,···,as)(s), Y^(b1,···,bs)(t)) = σ² (−1)^(∑_j aj) ∏_{k=1}^{s} rk^((ak+bk))(sk − tk).    (5.32)

Note that here we consider only first derivatives, so each of aj and bj is either 0 or 1. The correlation functions rj(·) must then be twice differentiable. However, notice that the exponential correlation function

rk(θk, |sk − tk|) = exp{−θk |sk − tk|^q}, 0 < q ≤ 2,

is twice differentiable only in the case q = 2, and both the linear correlation function and the non-negative linear correlation function fail to be differentiable at some points. One may, however, use a cubic spline to construct a correlation function that is differentiable everywhere. Define a cubic correlation function as

rk(θk, |sk − tk|) = 1 − (θk1 / (2 lk²)) |sk − tk|² + (θk2 / (6 lk³)) |sk − tk|³,

where θk1 and θk2 are positive parameters that satisfy θk2 ≤ 2θk1 and θk2² − 6θk1θk2 + 12θk1 ≤ 24θk2. Define further the non-negative cubic correlation


function

rk(θk, |sk − tk|) = 1 − 6(|sk − tk| / (θk lk))² + 6(|sk − tk| / (θk lk))³,  if |sk − tk| / lk < θk / 2,
rk(θk, |sk − tk|) = 2(1 − |sk − tk| / (θk lk))³,                            if θk / 2 ≤ |sk − tk| / lk < θk,
rk(θk, |sk − tk|) = 0,                                                      if |sk − tk| / lk ≥ θk,

where θk > 0.

We now discuss how to estimate the unknown parameters of the model. As usual, the correlation function is fitted by a family of parametric models indexed by the parameter vector θ. Morris et al. (1993) suggest estimating (μ, σ², θ) by maximum likelihood. Let R(θ) be the correlation matrix of Y in (5.29), consisting of all outputs, and denote

r(θ) = Corr(y, Y(x)) = (r(θ; |x − x1|), · · · , r(θ; |x − xn|))'.

After dropping a constant, the log-likelihood of the collected data is

ℓ(μ, σ², θ) = −(1/2) n(s + 1) log(σ²) − (1/2) log |R(θ)| − (1/(2σ²)) (y − μe)' R⁻¹(θ) (y − μe),    (5.33)

where e is an n(s + 1) binary vector with 1 in position (i − 1)(s + 1) + 1, i = 1, · · · , n, i.e., in each position corresponding to the mean of some Y(xi), and 0 everywhere else. The log-likelihood function has the same form as that in (5.20), and the estimation procedure described in Section 5.4.2 can also be utilized here. In particular, for fixed θ, the maximum likelihood estimate for (μ, σ²) is

μ̂(θ) = e' R⁻¹(θ) y / (e' R⁻¹(θ) e)

and

σ̂²(θ) = (1/(n(s + 1))) (y − μ̂(θ)e)' R⁻¹(θ) (y − μ̂(θ)e).

One may employ the cross validation procedure to select an optimal θ, but since θ is multi-dimensional, the computation could be expensive. A popular way to determine θ is to maximize ℓ(μ̂(θ), σ̂²(θ), θ), which is called a profile likelihood; the resulting estimator for θ is therefore referred to as the profile likelihood estimator. Of course, the penalized likelihood approach described in Section 5.4.2 is also applicable to the current setting. As usual, one may directly use the posterior mean to predict the response at a new site x. Under the assumption of stationarity, the prediction of Y(x) is

ŷ(x) = μ̂ + r(θ̂)' R⁻¹(θ̂) (y − μ̂(θ̂)e).    (5.34)

This will be implemented in the next section.
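For fixed θ, the two closed-form estimates and the resulting profile log-likelihood take only a few lines. The sketch below (illustrative names) covers the no-derivative case, where e is simply a vector of ones and N = n:

```python
import numpy as np

def profile_mle(R, y):
    """Closed-form mu_hat(theta) and sigma2_hat(theta) for fixed theta."""
    y = np.asarray(y, float)
    N = len(y)
    e = np.ones(N)                      # every observation carries the mean here
    Rinv_y = np.linalg.solve(R, y)
    Rinv_e = np.linalg.solve(R, e)
    mu = (e @ Rinv_y) / (e @ Rinv_e)
    resid = y - mu * e
    sigma2 = (resid @ np.linalg.solve(R, resid)) / N
    return float(mu), float(sigma2)

def profile_loglik(R, y):
    """Profile log-likelihood: (5.33) at (mu_hat, sigma2_hat).

    The quadratic form equals N * sigma2_hat, so the last term is -N/2.
    """
    N = len(y)
    mu, sigma2 = profile_mle(R, y)
    sign, logdet = np.linalg.slogdet(R)
    return -0.5 * N * np.log(sigma2) - 0.5 * logdet - 0.5 * N
```

Maximizing `profile_loglik` over θ, which enters through R(θ), gives the profile likelihood estimator described above.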


5.5.4 An example: borehole model

The borehole model (see Example 8) is a classical example in the literature of modeling computer experiments. This example was ﬁrst studied by Worley (1987). Then it was used to illustrate how to use derivatives in surface prediction in Morris et al. (1993). Recently, this example has been revisited by An and Owen (2001) and Fang and Lin (2003). Section 1.8 presents a detailed description of this example. Here we summarize the demonstration of Morris et al. (1993). For simplicity of presentation, let x1 , · · · , x8 be the scaled version of rw , r, Tu , Hu , Tl , Hl , L and Kw , respectively. That is, for example, x1 = (rw − 0.05)/(0.15 − 0.05). Therefore, the ranges of xi s are the interval [0, 1]. As illustrated in Morris et al. (1993), in order to produce a somewhat more nonlinear, nonadditive function, the range of Kw is extended to [1500, 15000]. To demonstrate the concepts deﬁned in Section 5.5.1, ﬁx x2 , · · · , x7 to be zero, and view y(x) as a response of x1 and x8 . The 3-D plots and contour plots of the response over x1 and x8 are depicted in Figure 5.10(a) and (b), which were reconstructed from Morris et al. (1993). The data, (y(xi ), y1 (xi ), y8 (xi )), i = 1, 2, 3, are depicted in Table 5.10. Represent the output as a 9 × 1 vector y = (3.0489, 12.1970, · · · , 244.4854) .

TABLE 5.10
Data for Borehole Example
x     x1       x8       y(x)      y1(x)      y8(x)
x1    0.0000   0.0000    3.0489   12.1970    27.4428
x2    0.2680   1.0000   71.6374  185.7917    64.1853
x3    1.0000   0.2680   93.1663  123.6169   244.4854

In this example, let us take the correlation function to be the exponential correlation function. Let rj(θj; |sj − tj|) = exp{−θj |sj − tj|²} and

r(θ; ‖s − t‖) = ∏_{j=1}^{s} rj(θj; |sj − tj|).

To calculate the correlation function of the derivative processes of Y (x) by


Theorem 11, we first calculate the derivatives of r(θ, |s − t|). Denote

r00(s, t) = r^(00000000,00000000)(θ, |s − t|) = r(θ, |s − t|),
r10(s, t) = r^(10000000,00000000)(θ, |s − t|) = −2θ1(s1 − t1) r(θ, |s − t|),
r01(s, t) = r^(00000000,10000000)(θ, |s − t|) = −2θ1(t1 − s1) r(θ, |s − t|),
r80(s, t) = r^(00000001,00000000)(θ, |s − t|) = −2θ8(s8 − t8) r(θ, |s − t|),
r08(s, t) = r^(00000000,00000001)(θ, |s − t|) = −2θ8(t8 − s8) r(θ, |s − t|),
r11(s, t) = r^(10000000,10000000)(θ, |s − t|) = (2θ1 − 4θ1²(s1 − t1)²) r(θ, |s − t|),
r88(s, t) = r^(00000001,00000001)(θ, |s − t|) = (2θ8 − 4θ8²(s8 − t8)²) r(θ, |s − t|),
r18(s, t) = r^(10000000,00000001)(θ, |s − t|) = −4θ1θ8(s1 − t1)(s8 − t8) r(θ, |s − t|),
r81(s, t) = r^(00000001,10000000)(θ, |s − t|) = −4θ1θ8(s1 − t1)(s8 − t8) r(θ, |s − t|).

One may further compute Σ, the covariance matrix of Y, as Σ = σ² R(θ).

FIGURE 5.10 The 3-D plot and contour plot for the borehole example: (a) 3D plot of response; (b) contour plot of response; (c) 3D plot of estimated response; (d) contour plot of response.
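The expressions above translate directly into code. The sketch below (hypothetical helper) assembles the 3 × 3 block of correlations among (Y, Y1, Y8) at a pair of sites, with σ² omitted and only the two active inputs x1 and x8 retained, as in this example:

```python
import numpy as np

def deriv_corr_block(s, t, th1, th8):
    """Correlations among (Y, Y1, Y8) at sites s and t under
    r(theta, |s - t|) = exp(-th1*(s1-t1)^2 - th8*(s8-t8)^2)."""
    h1, h8 = s[0] - t[0], s[1] - t[1]
    r = np.exp(-th1 * h1 ** 2 - th8 * h8 ** 2)
    r10 = -2.0 * th1 * h1 * r                      # Cov(Y1(s), Y(t))
    r80 = -2.0 * th8 * h8 * r                      # Cov(Y8(s), Y(t))
    r11 = (2.0 * th1 - 4.0 * th1 ** 2 * h1 ** 2) * r
    r88 = (2.0 * th8 - 4.0 * th8 ** 2 * h8 ** 2) * r
    r18 = -4.0 * th1 * th8 * h1 * h8 * r           # = r81
    # r01 = -r10 and r08 = -r80 (differentiating w.r.t. the second site)
    return np.array([[r,   -r10, -r80],
                     [r10,  r11,  r18],
                     [r80,  r18,  r88]])
```

Stacking these blocks over all design-point pairs produces R(θ); multiplying by σ² gives Σ.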

The elements of R(θ) can be directly computed from the above expressions. The log-likelihood function (5.33) in this example achieves its maximum at μ̂ = 69.15 and σ̂² = 135.47, with θ̂1 = 0.418 and θ̂8 = 0.317. Directly implementing (5.34), the prediction at a new site x is given by

ŷ(x) = 69.15 + r(θ̂)' b̂,

where b̂ = R⁻¹(θ̂)(y − 69.15e). The 3-D plot and the contour plot of the resulting metamodel are displayed in Figure 5.10(c) and (d), from which we can see that the resulting metamodel provides a good prediction over the unobserved sites, although there are only three runs. Morris et al. (1993) further tried four different experimental designs: Latin hypercube design, maximin design, maximum Latin hypercube design, and modified maximin design, each having 10 runs, and provided a detailed analysis based on the 10-run data. They reported that the maximization over θ ∈ [0, 1]^8 is very challenging. Interested readers are referred to Section 4 of their paper.

5.6 Neural Network

The term neural network has evolved to encompass a large class of models and "learning" (i.e., parameter estimation) methods (Hassoun (1995), Bishop (1995), Haykin (1998), and Hagan et al. (1996)). The field of neural networks has a history of five decades and has gained widespread applications in the past fifteen years; Rumelhart et al. (1986) gave a systematic study with references therein. Neural networks are composed of simple elements operating in parallel, based on a network function that is determined largely by the connections between elements. We can train a neural network to perform a particular function by adjusting the values of the connection weights (i.e., parameters) between elements; training in neural networks is thus synonymous with model building and parameter estimation. The neuron model and the architecture of a neural network describe how a network transforms its input into an output. This mapping of inputs to outputs can be viewed as a non-parametric regression computation. In this section, we will consider two popular types of neural network models for performing regression tasks, known as multi-layer perceptron (MLP) and radial basis function (RBF) networks.

Example 18 (Continued) For the purpose of illustration, we train MLP and RBF networks using the data set in Example 18. The predictions using MLP with 7 hidden units and RBF with 7 basis functions are depicted in Figure 5.11. We can see that both MLP and RBF networks have good performance. The selection of an appropriate network size (i.e., the number of hidden units in MLP and the number of basis functions in RBF) will be discussed further in this section.
2.5

2

1.5

1

True Sample RBF MLP

0.5

0

-0.5

-1

-1.5

-2

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

FIGURE 5.11 Plot of neural network model fit (true function, sample points, RBF, and MLP predictions).

5.6.1 Multi-layer perceptron networks

A single neuron or perceptron, consisting of inputs, weights, and an output (see Figure 5.12), performs a series of linear and nonlinear mappings as follows:

v = ∑_{i=1}^{s} wi xi + w0,

and

b(v) = 1 / (1 + e^(−λv))   or   b(v) = tanh(λv),


where xi are the inputs, and wi are the corresponding weights or parameters of the model, and b(v) is the activation function or transfer function usually chosen to have a logistic-type of function. The steepness of the logistic function, λ, is typically set equal to 1.


FIGURE 5.12 A neuron model with xi as inputs, wi as weights or parameters, and b(v) as the activation function output.

A multi-layer perceptron (MLP) network (see Figure 5.13), which consists of input, hidden, and output layers with nonlinear and linear activation functions in the hidden and output layers, respectively, approximates inputs and outputs as follows:

ŷ = ∑_{j=1}^{d} βj bj(vj) + β0,

where d is a pre-specified integer, βj is the weight connection between the output and the jth component in the hidden layer, and bj(vj) is the output of the jth unit in the hidden layer,

bj(vj) = 1 / (1 + e^(−λvj))   or   bj(vj) = tanh(λvj),

and

vj = ∑_{i=1}^{s} wji xi + wj0,

where wji is the weight connection between the jth component in the hidden layer and the ith component of the input.
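The mapping above is a handful of array operations; the sketch below (hypothetical function name, logistic activations) evaluates ŷ for a single input vector:

```python
import numpy as np

def mlp_forward(x, W, w0, beta, beta0, lam=1.0):
    """Three-layer MLP: v_j = sum_i w_ji x_i + w_j0,
    b_j = 1/(1 + exp(-lam*v_j)), yhat = sum_j beta_j b_j + beta_0."""
    v = W @ np.asarray(x, float) + w0       # hidden pre-activations
    b = 1.0 / (1.0 + np.exp(-lam * v))      # logistic hidden outputs
    return float(beta @ b + beta0)
```

Swapping the logistic line for `np.tanh(lam * v)` gives the tanh variant.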



FIGURE 5.13 A three-layer MLP network.

Cybenko (1989) showed that a single hidden layer network employing a sigmoid-type (as well as other types of) activation function is a universal function approximator.

Theorem 12 Let b(·) be a continuous sigmoid-type function. Then for any continuous real-valued function f on [0, 1]^s (or any other compact subset of R^s) and ε > 0, there exist vectors w1, w2, · · · , wd, β and a parameterized function g(x, W, β) : [0, 1]^s → R such that

|g(x, W, β) − f(x)| < ε,  for all x ∈ [0, 1]^s,

where

g(x, W, β) = ∑_{j=1}^{d} βj bj(wj' x + wj0),

and wj ∈ Rs , w0 ∈ Rd , W = (w1 , w2 , · · · , wd ), w0 = (w10 , w20 , · · · , wd0 ), and β = (β1 , β2 , · · · , βd ). Network training to estimate the parameters is typically done using the least squares criterion to minimize

E = ∑_{k=1}^{n} (yk − ŷk)².


An instantaneous version of the above criterion, E = (1/2)(yk − ŷk)², is used in a stochastic gradient algorithm and leads to the parameters being updated as follows:

Δβj = βj^(t+1) − βj^t = −ρ ∂E/∂βj = ρ(yk − ŷk) bj,
Δwji = wji^(t+1) − wji^t = −ρ ∂E/∂wji = −ρ (∂E/∂bj)(∂bj/∂vj)(∂vj/∂wji),

where βj^t and wji^t are the parameter values at the tth iteration, ρ is a learning rate, and

∂vj/∂wji = xi,
∂E/∂bj = ∂/∂bj {(1/2)(yk − ŷk)²} = −(yk − ŷk) ∂ŷk/∂bj = −(yk − ŷk) βj,
∂bj/∂vj = bj(vj)[1 − bj(vj)]  for bj(vj) = 1/(1 + e^(−λvj)),
∂bj/∂vj = 1 − bj²(vj)         for bj(vj) = tanh(vj).

Therefore, weights between the input and hidden layers are updated as follows:

Δwji = ρ(yk − ŷk) βj (∂bj/∂vj) xi.
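One full stochastic-gradient step, using the gradients derived above with logistic hidden units (λ = 1) and a linear output unit, can be sketched as follows (names are illustrative):

```python
import numpy as np

def backprop_step(x, y, W, w0, beta, beta0, rho=0.1):
    """Single back-propagation update for the instantaneous criterion
    E = (y - yhat)^2 / 2 with logistic hidden units."""
    x = np.asarray(x, float)
    v = W @ x + w0
    b = 1.0 / (1.0 + np.exp(-v))
    err = y - (beta @ b + beta0)
    dbdv = b * (1.0 - b)                         # logistic derivative
    # hidden-layer deltas use the current beta, computed before beta moves
    dW = rho * err * np.outer(beta * dbdv, x)
    dw0 = rho * err * beta * dbdv
    dbeta = rho * err * b                        # output-layer delta
    return W + dW, w0 + dw0, beta + dbeta, beta0 + rho * err
```

Iterating this step over randomly chosen training pairs implements the online back-propagation scheme above.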

The above parameter updating is known as back propagation learning (Rumelhart et al. (1986)). There are numerous other training algorithms based on gradient optimization that are more eﬃcient (in the sense of faster convergence) than the above algorithm. Interested readers should consult the considerable literature on neural networks (Haykin (1998), Hassoun (1995), and Bishop (1995)). The steps for training MLP networks can be summarized as follows.

1. Normalization: Normalize the data set such that ui = (xi − x̄i)/si, where x̄i and si are the sample mean and standard deviation of xi, respectively. This normalization helps the training process to ensure that the inputs to the hidden units are comparable to one another and to avoid saturation (i.e., the sum products of the weights and the inputs resulting in values which are too large or too small for a logistic function) of the activation functions.

2. Select network architecture: • Select the number of hidden layers (a single hidden layer is suﬃcient; see Theorem 12).

• Select the proper number of units in the hidden layer. Too few or too many units will create under- or over-fitting, respectively. Over-fitting causes poor capability of predicting untried points.
• Select activation functions. Sigmoid or tanh activation functions are the most popular choices for the units in the hidden layer, while the linear activation function for the output unit is appropriate for regression problems.

3. Select learning rate ρ: Too large a learning rate may cause the learning process to fail to converge, while too small a learning rate may cause slow convergence. Many variations of learning algorithms employ different learning-rate strategies to speed up convergence (e.g., Darken and Moody (1992), Le Cun, Simard and Pearlmutter (1993)).

4. Initialization: Initialize the weights, W and β, randomly within small ranges. Because the network training employs gradient search, the training process may become trapped in a local minimum. Therefore, several initial starting values of W and β may be tried to get the best result.

5. Training: Train the network with the algorithm of choice (e.g., back propagation) until sufficient fitting error is achieved for the training data set. Many practices in neural networks suggest splitting the data into training and testing sets. The former is used for network training, while the latter is used to stop the training when prediction error on the testing data set achieves a minimum. When the size of the data set is small, however, this approach may be unjustified. Another practice is early stopping of training: the training iteration process is stopped after a small number of iterations, before the fitting errors become too small, to avoid overfit. This heuristic, however, is very ad hoc, as it is usually difficult to determine when to stop. Some authors have suggested using a penalty function in addition to the least squares criterion to eliminate some of the units in the hidden layer or to prune some of the weights. In the example below we shall employ a post-training penalty approach to eliminate some units in the hidden layer to improve network prediction capability.
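The steps above can be strung together into a minimal training sketch. This is illustrative only: a fixed seed stands in for the multiple random restarts of step 4, and a fixed iteration budget stands in for a principled stopping rule; all names are hypothetical.

```python
import numpy as np

def train_mlp(X, y, d=3, rho=0.05, iters=4000, seed=0):
    """Steps 1-5: normalize, one hidden layer of d logistic units,
    small random initialization, back-propagation sweeps."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, float); y = np.asarray(y, float)
    mean, sd = X.mean(axis=0), X.std(axis=0)
    U = (X - mean) / sd                          # step 1: normalization
    n, s = U.shape
    W = 0.1 * rng.standard_normal((d, s))        # step 4: small random init
    w0 = np.zeros(d)
    beta = 0.1 * rng.standard_normal(d)
    beta0 = float(y.mean())
    for t in range(iters):                       # step 5: training sweeps
        k = t % n                                # cycle through the runs
        v = W @ U[k] + w0
        b = 1.0 / (1.0 + np.exp(-v))
        err = y[k] - (beta @ b + beta0)
        dbdv = b * (1.0 - b)
        W += rho * err * np.outer(beta * dbdv, U[k])
        w0 += rho * err * beta * dbdv
        beta += rho * err * b
        beta0 += rho * err
    def predict(Xnew):
        Un = (np.asarray(Xnew, float) - mean) / sd
        B = 1.0 / (1.0 + np.exp(-(Un @ W.T + w0)))
        return B @ beta + beta0
    return predict
```

In practice one would wrap this in several restarts and keep the fit with the best held-out error, as discussed above.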

5.6.2 A case study

An exhaust manifold is one of the engine components that is expected to endure harsh and rapid thermal cycling conditions, ranging from sub-zero to, in some cases, more than 1000°C. The increasingly reduced product development cycle has forced engineers to rely on computer-aided engineering tools to predict design performance before any actual physical testing. Hazime, Dropps, Anderson and Ali (2003) presented a transient nonlinear finite element method to simulate the inelastic deformation used to predict the thermo-mechanical fatigue life of cast exhaust manifolds. The method incorporates elastic-plastic and creep material models and transient heat transfer analysis to simulate the manifold behavior under transient thermal loads. The transient effects of temperature and the effect of creep change the behavior of the manifold, in turn affecting the sealing integrity from cycle to cycle. In addition to predicting fatigue life, the model also predicts dynamic sealing pressure on the gasket to identify potential exhaust gas leaks. The exhaust manifold assembly includes the exhaust manifold component, gasket, fastener, and a portion of the cylinder head (see Figure 5.14).

FIGURE 5.14 Exhaust manifold finite element model.

To optimize the gasket design to prevent leaks, in terms of its internal stress distribution and contact pressure at the gasket, a symmetric Latin hypercube (SLH) design with 17 runs and 5 factors is applied to the simulation model (see Section 2.3 for the definition of SLH). The SLH design matrix is optimized with respect to maximin distance (see Section 2.4.3), as shown in Table 5.11. Three-layer MLP networks are used as the metamodel. Sigmoidal activation functions are employed for the hidden layer, while the linear activation function is used for the output unit. Several networks with different numbers of hidden units and training processes are considered:

1. A network with 2 hidden units, which is expected to under-fit the training data.

2. A network with 15 hidden units, which is expected to over-fit the training data. For the latter network, several training strategies are evaluated:

• Perform training until the fitting errors are very small (i.e., 10,000 training iterations).

• Perform early stopping with only 200 iterations in the training process.

• Perform a post-training process by applying the penalized least squares method to the connections between the hidden and output layers.

The decision for the early stopping strategy of 200 iterations is based on the best prediction (i.e., minimum RMSE) for the testing data set. In this respect, the good performance of this strategy is achieved unfairly, because the testing data set is used during the training to make the early stopping decision, while the other approaches did not use the testing data at all during training. To evaluate the performance of the networks, the trained networks are used to predict the testing data set in Table 5.12. The results for these networks, in terms of the root mean square errors for the testing data set, are summarized in Table 5.13. Note that the network with two hidden units does not provide a sufficient fit to the training data and also provides poor prediction for the testing data set. The RMSE for the MLP with 15 hidden units without early stopping exhibits over-fit problems, as indicated by a very small fitting RMSE but a larger prediction RMSE.

TABLE 5.11 Training Data Set for the Exhaust Manifold Sealing Performance (17 runs of the symmetric Latin hypercube design in the factors x1–x5, with the response y).
The decision for the early stopping strategy of 200 iterations is based on the best prediction (i.e., minimum RMSE) for the testing data set.
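The early-stopping logic can be sketched as follows. This is a minimal illustration with synthetic data, not the authors' MATLAB code; the network size, learning rate, and patience rule are our assumptions. Training is monitored against the testing set and halted once the prediction RMSE stops improving:

```python
import numpy as np

# Hedged sketch: a tiny MLP (sigmoidal hidden layer, linear output unit) trained
# by gradient descent; training stops early once the testing RMSE has not
# improved for `patience` consecutive iterations.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, (17, 1)); y_train = np.sin(2 * np.pi * x_train).ravel()
x_test = rng.uniform(0, 1, (20, 1)); y_test = np.sin(2 * np.pi * x_test).ravel()

n_hidden, lr, patience = 15, 0.1, 100
W = rng.normal(0, 1, (1, n_hidden)); b = np.zeros(n_hidden)   # input -> hidden
v = rng.normal(0, 1, n_hidden); c = 0.0                       # hidden -> output

def forward(x):
    h = 1.0 / (1.0 + np.exp(-(x @ W + b)))   # sigmoidal hidden activations
    return h, h @ v + c                       # linear output unit

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

best_rmse, best_iter, bad = np.inf, None, 0
for it in range(10000):
    h, yhat = forward(x_train)
    err = yhat - y_train
    # gradients of the mean squared fitting error for all parameter blocks
    gv = h.T @ err / len(err); gc = err.mean()
    dh = np.outer(err, v) * h * (1 - h)
    gW = x_train.T @ dh / len(err); gb = dh.mean(axis=0)
    W -= lr * gW; b -= lr * gb; v -= lr * gv; c -= lr * gc
    test_rmse = rmse(y_test, forward(x_test)[1])
    if test_rmse < best_rmse:
        best_rmse, best_iter, bad = test_rmse, it, 0
    else:
        bad += 1
        if bad > patience:   # testing RMSE stopped improving: stop early
            break
```

The pair (best_iter, best_rmse) records the stopping point chosen by the testing data; as noted above, using the testing set this way gives the strategy an unfair advantage.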

TABLE 5.12 Testing Data Set for the Exhaust Manifold Sealing Performance (columns Run, x1, x2, x3, x4, x5, y; 20 runs)

TABLE 5.13 Performance Comparison in Terms of RMSE (fitting the training data and predicting the testing data) of Different MLP Networks. The networks compared are: the 2 unit hidden layer; the 15 unit hidden layer with early stopping (200 iterations); the 15 unit hidden layer without early stopping (10,000 iterations); and the 15 unit hidden layer without early stopping with penalized least squares post-training (3 out of 15 hidden units are maintained). Columns: RMSE (fit) and RMSE (prediction).

To alleviate the problem of the unknown early stopping point of training and to achieve better prediction capability (i.e., to minimize bias with respect to the training data and to minimize prediction variance), a simple post-training hidden unit selection is performed. Note that once the network is trained so that the weights W between the input and hidden layers are fixed, the network mapping can be considered a linear model in terms of the weight connections between the hidden and output units, with the hidden units acting as sigmoidal basis functions. In this respect, similar to polynomial basis function selection, the penalized least squares method in (5.10) can be applied to select the sigmoidal basis functions,

½ ‖y − β₀ − Bβ‖² + Σ_{j=1}^{s} p_λ(|β_j|),

where B = (b₁, b₂, ..., b_d). Here the SCAD penalty function can be used. For fixed W, the hidden unit selection process can be performed quickly and easily by using the standard linear least squares technique to estimate β. Note that the selection is conducted after a large network (i.e., with 15 hidden units) is fully trained. For this example, only 3 out of 15 hidden units are selected by the SCAD penalty function. The comparison between the predictions of the MLP with 3 out of 15 hidden units and the true values is shown in Figure 5.15.

FIGURE 5.15 Predicted values vs. true values for the MLP network with 3 hidden units selected by SCAD penalized least squares from the 15 hidden unit network.
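A minimal numerical sketch of this selection step (ours, not the book's MATLAB code): with the hidden-to-output map treated as a linear model in β, the SCAD-penalized least squares problem can be solved by coordinate descent using the univariate SCAD thresholding rule of Fan and Li (2001). The helper names and the synthetic basis matrix below are illustrative assumptions:

```python
import numpy as np

def scad_threshold(z, lam, a=3.7):
    # univariate SCAD minimizer of 0.5*(b - z)^2 + p_lam(|b|) (Fan and Li, 2001)
    az = abs(z)
    if az <= 2 * lam:
        return np.sign(z) * max(az - lam, 0.0)
    if az <= a * lam:
        return ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)
    return z

def scad_select(B, y, lam, n_sweeps=100):
    # coordinate descent for (1/(2n))||y - B beta||^2 + sum_j p_lam(|beta_j|)
    n, d = B.shape
    scale = np.sqrt((B ** 2).mean(axis=0))
    Bs = B / scale                     # columns scaled so (1/n)||B_j||^2 = 1
    beta = np.zeros(d)
    r = y - Bs @ beta
    for _ in range(n_sweeps):
        for j in range(d):
            z = Bs[:, j] @ r / n + beta[j]
            bj = scad_threshold(z, lam)
            r += Bs[:, j] * (beta[j] - bj)
            beta[j] = bj
    return beta / scale                # back on the original column scale

# synthetic stand-in for hidden unit outputs: 15 columns, only 3 drive y
rng = np.random.default_rng(0)
B = rng.normal(size=(100, 15))
y = 2.0 * B[:, 0] - 3.0 * B[:, 4] + 1.5 * B[:, 9] + 0.05 * rng.normal(size=100)
beta = scad_select(B, y, lam=0.4)
kept = np.flatnonzero(np.abs(beta) > 1e-8)   # indices of retained hidden units
```

On this toy problem the SCAD rule zeroes out the spurious columns and leaves the three active ones essentially unshrunk, mirroring the 3-out-of-15 selection reported for the exhaust manifold example.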

5.6.3 Radial basis functions

Radial basis function (RBF) methods are techniques for exact interpolation of data in multi-dimensional space (Powell (1987)). The RBF maps the inputs to the outputs using a linear combination of the basis functions:

y(x) = μ + Σ_{j=1}^{n} β_j b_j(‖x − x_j‖),

where μ is the mean of y. There are several popular choices of basis functions:

Linear: b_j(‖x − x_j‖) = ‖x − x_j‖
Cubic: b_j(‖x − x_j‖) = ‖x − x_j‖³
Thin-plate spline: b_j(‖x − x_j‖) = ‖x − x_j‖² log ‖x − x_j‖
Gaussian: b_j(‖x − x_j‖) = exp(−θ_j ‖x − x_j‖²)

In matrix form, the interpolation for the data set can be written as y = μ + Bβ, where y = (y₁, ..., y_n)ᵀ and B is the square matrix B = {b_ij(‖x_i − x_j‖)}. Once the basis functions are set, the parameter β can be easily calculated using the least squares method,

β̂ = (BᵀB)⁻¹Bᵀ(y − μ).

A number of modifications to the exact interpolation have been proposed (Moody and Darken (1989), Bishop (1995)), such as:

• The number of basis functions need not be equal to the number of data points, and typically is less than n.
• The basis functions are not constrained to be centered at the data points; they can be selected as part of the network training process (i.e., parameter estimation).
• The width parameters θ_j can be varied for each basis function and determined during the training process.

The parameters θ_j in RBF are often heuristically chosen, such as by setting all θ_j equal (θ_j = θ) to some multiple of the average distance between the basis function centers, to ensure some overlap so that the interpolation is relatively smooth. Note that θ_j is the reciprocal of the variance in a normal distribution; thus, it can be interpreted as the (inverse) width of the Gaussian kernel. Too large or too small a θ produces under- or over-smoothing, respectively. When Gaussian basis functions are chosen, the RBF model is closely related to the Gaussian Kriging model (see Li and Sudjianto (2005)).
Here, we will use the Gaussian basis function.
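For concreteness, the Gaussian RBF interpolation above can be sketched as follows (our illustration with synthetic data; with B square and the basis centered at the data points, the weights follow from a single linear solve):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (17, 5))        # e.g. 17 runs of a 5-factor design
y = np.sin(X.sum(axis=1))             # stand-in for the simulator output

def gaussian_basis(X1, X2, theta):
    # B[i, j] = exp(-theta * ||x1_i - x2_j||^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-theta * d2)

theta = 1.0
mu = y.mean()
B = gaussian_basis(X, X, theta)
beta = np.linalg.solve(B, y - mu)     # exact-interpolation weights

def predict(Xnew):
    return mu + gaussian_basis(Xnew, X, theta) @ beta
```

predict(X) reproduces y at the design points, illustrating exact interpolation. Note how θ matters: a very small θ drives B toward an all-ones (nearly singular) matrix, while a very large θ drives B toward the identity, the over- and under-smoothing extremes mentioned above.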

As mentioned earlier, maximum likelihood estimation, or its penalized version as discussed in Section 5.4, can be applied to determine the parameter θ, as follows:

Q(μ, θ) = −½ log|B(θ)| − ½ (y − 1_N μ)ᵀ B(θ)⁻¹ (y − 1_N μ) − Σ_{j=1}^{s} p_λ(θ_j),

where p_λ(·) is a given non-negative penalty function with a regularization parameter λ, and B(θ) is an n × n matrix with (i, j)-element b(x_i, x_j). The SCAD penalty function can be employed as the penalty term to regularize the estimation of θ; its first derivative is

p′_λ(θ) = λ{ I(θ ≤ λ) + [(aλ − θ)₊ / ((a − 1)λ)] I(θ > λ) }.

Maximizing the penalized likelihood yields penalized likelihood estimates μ̂ and θ̂ for μ and θ, respectively. The parameter λ can be set equal to α{ln(n)/n}^{1/2}. From our experience (Li and Sudjianto (2005)), the optimal value for θ is insensitive to the change of λ for α = 0.5 ∼ 1.25.

FIGURE 5.16 SCAD penalized likelihood function for the exhaust manifold data.

The performance of the networks in predicting the testing data set (Table 5.12), in terms of the root mean squares of prediction errors for different sets of parameters, is summarized in Table 5.14. To illustrate the process, the data set in Table 5.11 is used to illustrate the selection of parameters for the network. A one-dimensional line search (see, for example, Press, Teukolsky, Vetterling and Flannery (1992)) can be employed to find the optimal θ that maximizes the penalized likelihood criterion; the penalized likelihood function using a SCAD penalty function for the data set from Table 5.11 is shown in Figure 5.16, and the maximum is attained for θ = 0.8744. Once the parameter θ is determined, the remaining parameters can be estimated as described below. The RMSE of an arbitrarily selected θ = 2 is compared to that of θ selected using penalized likelihood, as well as βs estimated by least squares and by (SCAD) penalized least squares.
When μ̂ is set equal to the mean of y, the second stage of parameter estimation for β can be easily done using the least squares method. If desired, penalized least squares for a linear model,

½ ‖y − μ − Bβ‖² + Σ_{j=1}^{s} p_λ(|β_j|),

as presented earlier, may be employed to reduce the number of basis functions for the purpose of improving the prediction capability. The steps to build RBF networks are as follows:

1. Normalize the data set such that u_i = (x_i − x̄_i)/s_i, where x̄_i and s_i are the mean and standard deviation of x_i, respectively. This normalization helps the training process by making the input scales comparable to each other. If desired, an equal θ for all inputs can be applied.
2. Apply a line search algorithm to find θ that maximizes the penalized likelihood criterion.
3. Apply least squares (or penalized least squares, as discussed above) to estimate β.

This procedure is applied to the exhaust manifold sealing example described earlier in this chapter.

TABLE 5.14 Root Mean Squares of Prediction Errors for RBFs with Different Parameter Estimation Techniques (estimation techniques compared: θ = 2 with least squares; θ = 2 with penalized least squares; penalized likelihood θ = 0.8744 with least squares; penalized likelihood θ = 0.8744 with penalized least squares)
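The two-stage procedure above can be sketched as follows (our illustration with synthetic data; a coarse grid stands in for the one-dimensional line search, μ̂ is fixed at the mean of y, and the n-fold multiplier on the SCAD penalty is our assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (17, 5))
y = np.sin(X.sum(axis=1))
n = len(y)

def corr(theta):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-theta * d2) + 1e-8 * np.eye(n)   # small jitter for stability

def scad_penalty(theta, lam, a=3.7):
    # SCAD penalty p_lam evaluated at a scalar theta >= 0
    if theta <= lam:
        return lam * theta
    if theta <= a * lam:
        return -(theta ** 2 - 2 * a * lam * theta + lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2

lam = 0.5 * np.sqrt(np.log(n) / n)    # lambda = alpha * sqrt(ln(n)/n), alpha = 0.5

def Q(theta):
    # penalized log-likelihood criterion with mu fixed at mean(y)
    B = corr(theta)
    r = y - y.mean()
    _, logdet = np.linalg.slogdet(B)
    return -0.5 * logdet - 0.5 * r @ np.linalg.solve(B, r) - n * scad_penalty(theta, lam)

grid = np.linspace(0.05, 12.0, 240)   # stand-in for the 1-D line search
theta_hat = float(grid[np.argmax([Q(t) for t in grid])])

# second stage: estimate the weights by a linear solve given theta_hat
beta = np.linalg.solve(corr(theta_hat), y - y.mean())
```

In practice a proper line search (golden section or Brent's method) would replace the grid; the two-stage structure, θ first by penalized likelihood and then β by (penalized) least squares, is the point of the sketch.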

5.7 Local Polynomial Regression

Local polynomial regression is a popular modeling technique for nonparametric regression; Fan and Gijbels (1996) present a systematic account of its theoretical properties, and many applications and illustrations of local polynomial regression can be found in the statistical literature. However, local polynomial regression has not been a popular approach in modeling computer experiments, as it is more powerful for lower-dimensional data sets. Høst (1999) discussed how to use local polynomial regression techniques with Kriging for modeling computer experiments. Tu (2003) and Tu and Jones (2003) gave some case studies on how to use local polynomial regression to fit computer experiment data. These works demonstrate the potential of local polynomial regression for modeling computer experiments. In this section, the motivation for local polynomial regression is introduced. We will further establish the connections between local polynomial regression and the other computer experiment modeling techniques introduced in the previous sections.

5.7.1 Motivation of local polynomial regression

Let us begin with a typical one-dimensional nonparametric regression model. Suppose that (x_i, y_i), i = 1, ..., n, is a random sample from the following model:

Y = m(X) + ε,    (5.35)

where ε is random error with E(ε|X = x) = 0 and Var(ε|X = x) = σ²(x). Here m(x) = E(Y|X = x) is called the regression function. In the context of nonparametric regression, it is assumed only that m(x) is a smooth function of x. For a given data set, one may try to fit the data by using a linear regression model. If a nonlinear pattern appears in the scatter plot of Y against X, one may employ polynomial regression to reduce the modeling bias of linear regression. As demonstrated in Example 18, however, polynomial regression fitting may nonetheless have substantial biases, because the degree of a polynomial regression cannot be controlled continuously, and individual observations can have a great influence on remote parts of the curve in polynomial regression models. Local polynomial regression can repair these drawbacks of polynomial fitting. Suppose that the regression function m(·) is smooth. Applying the Taylor expansion for m(·) in a neighborhood of x, we have

m(u) ≈ Σ_{j=0}^{p} (m^{(j)}(x)/j!) (u − x)^j = Σ_{j=0}^{p} β_j (u − x)^j.    (5.36)
We want to estimate m(x) without imposing a parametric form on the regression function. Suppose that the regression function m(·) is smooth. and individual observations can have a great inﬂuence on remote parts of the curve in polynomial regression models. yi ). one may try to ﬁt the data by using a linear regression model. and Var(ε|X = x) = σ 2 (x). (5. Many applications and illustrations of local polynomial regression can be found in the statistical literature. For a given data set. local polynomial regression has not been a popular approach in modeling computer experiments.

Thus, for x_i close enough to x,

m(x_i) ≈ Σ_{j=0}^{p} β_j (x_i − x)^j = x_iᵀβ,

where x_i = (1, (x_i − x), ..., (x_i − x)^p)ᵀ and β = (β₀, β₁, ..., β_p)ᵀ. Intuitively, data points close to x carry more information about m(x). This suggests using a locally weighted polynomial regression

Σ_{i=1}^{n} (y_i − x_iᵀβ)² K_h(x_i − x),    (5.37)

where K_h(x_i − x) is a weight. Let K(x) be a function satisfying ∫K(x) dx = 1, called a kernel function, and let h be a positive number called a bandwidth or a smoothing parameter. In practice, it is common to take the kernel function to be a symmetric probability density function. The most commonly used kernel function is the Gaussian density function, given by

K(x) = (1/√(2π)) exp(−x²/2).

Other popular kernel functions include the uniform kernel K(x) = 0.5, for −1 ≤ x ≤ 1, and the Epanechnikov kernel K(x) = 0.75(1 − x²), for −1 ≤ x ≤ 1. Denote by β̂_j (j = 0, ..., p) the minimizer of (5.37). Then an estimator for the regression function m(x) is

m̂(x) = β̂₀(x),    (5.38)

and, furthermore, an estimator for the ν-th order derivative of m(x) at x is m̂_ν(x) = ν! β̂_ν(x). It is well known that the estimate m̂(x) is not very sensitive to the choice of K, scaled in a canonical form as discussed by Marron and Nolan (1988). The choice of the bandwidth, however, is of crucial importance: the smoothing parameter h controls the smoothness of the estimated regression function. If h is chosen too large, then the resulting estimate misses fine features of the data, while if h is selected too small, then spurious sharp structures become visible. There are many reported results on bandwidth selection in the literature; Jones et al. (1996a,b) gave a systematic review of this topic. In practice, one may use a

data-driven method to choose the bandwidth, or simply select a bandwidth by visually inspecting the resulting estimated regression function. When p = 0, local polynomial fitting becomes kernel regression. The kernel regression estimator is given by

m̂_h(x) = Σ_{i=1}^{n} K_h(x_i − x) y_i / Σ_{i=1}^{n} K_h(x_i − x),    (5.39)

which is the solution of a local constant fit. This estimator is also referred to as the NW-kernel estimator because it was proposed by Nadaraya (1964) and Watson (1963) independently. When p = 1, the local polynomial fitting is referred to as local linear fitting. The local linear fit estimates not only the regression curve, but also its first derivative. This is one advantage of local linear regression over kernel regression. While local polynomial regression was originally proposed for nonparametric regression, in which error is assumed to be present, it can be applied directly to modeling computer experiments, in which there is no random error, or equivalently, Var(Y|X = x) = 0. We now illustrate the local polynomial techniques via application to the data in Example 18.

FIGURE 5.17 Plots of local linear fits for the data in Example 18. (a) Plot of the true model, scatter plot of the collected sample, fitted curve, and fitted values. (b) Plot of local polynomial fits with different bandwidths (h = 0.04, 0.07, 0.10).
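A minimal sketch of these estimators (ours; Gaussian kernel, noise-free toy data): nw implements the kernel regression estimator (5.39), and local_linear solves the locally weighted least squares (5.37) with p = 1, returning estimates of both m(x₀) and its first derivative:

```python
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

def nw(x0, x, y, h):
    # Nadaraya-Watson (local constant, p = 0) estimator, Eq. (5.39)
    w = gauss_kernel((x - x0) / h)
    return (w @ y) / w.sum()

def local_linear(x0, x, y, h):
    # local linear (p = 1) weighted least squares centered at x0
    w = gauss_kernel((x - x0) / h)
    Xd = np.column_stack([np.ones_like(x), x - x0])
    WX = Xd * w[:, None]
    b = np.linalg.solve(Xd.T @ WX, WX.T @ y)
    return b[0], b[1]           # estimates of m(x0) and m'(x0)

x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x)       # deterministic "computer experiment" output
m_hat, d_hat = local_linear(0.3, x, y, h=0.05)
```

Re-running local_linear with larger h makes the fit progressively smoother and more biased, which is exactly the bandwidth effect shown in Figure 5.17(b).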

Example 18 (Continued) We refit the data in Example 18 using local linear regression. The local polynomial fit using the Gaussian kernel with h = 0.04 is depicted in Figure 5.17(a). Figure 5.17(b) presents the local linear fits using the Gaussian kernel with three different bandwidths; it demonstrates the importance of selecting a proper bandwidth and illustrates that a larger bandwidth yields a fit with a larger bias.

Note that local polynomial regression usually does not yield an exact interpolation over the observed sample. Høst (1999) therefore considered the following model:

y(x) = m(x) + z(x),

where m(x) is a smoothing function of x and z(x) is the residual. He further proposed using local polynomial regression to model the overall trend m(x) and then using the Kriging approach to interpolate the residuals z(x).

5.7.2 Metamodeling via local polynomial regression

Due to the need for metamodeling in most computer experiments, the idea of local polynomial regression in the previous section can be generalized to the multivariate case. For simplicity of presentation, we limit ourselves to local linear regression. Applying the Taylor expansion for m(·) in a neighborhood of x, it follows that

m(u) ≈ β₀ + Σ_{j=1}^{s} (∂m(x)/∂u_j)(u_j − x_j) = β₀ + Σ_{j=1}^{s} β_j (u_j − x_j).    (5.40)

Local linear regression minimizes the following weighted least squares criterion:

Σ_{i=1}^{n} { y_i − β₀ − Σ_{j=1}^{s} β_j (x_ij − x_j) }² K_h(x_i − x),

where

K_h(x_i − x) = (1/(h₁ ··· h_s)) K((x_i1 − x₁)/h₁, ..., (x_is − x_s)/h_s),

and K(u₁, ..., u_s) is an s-dimensional kernel function satisfying ∫K(u) du = 1. Denote W = diag{K_h(x₁ − x), ..., K_h(x_n − x)}, β = (β₀, β₁, ..., β_s)ᵀ, and

X = ( 1  x₁₁ − x₁  ···  x₁s − x_s ; ... ; 1  x_n1 − x₁  ···  x_ns − x_s ).

Then we have

β̂ = (XᵀWX)⁻¹XᵀWy

and

m̂(x) = e₁ᵀβ̂ = e₁ᵀ(XᵀWX)⁻¹XᵀWy,    (5.41)

where e₁ is an (s + 1)-dimensional column vector whose first element equals 1 and whose other elements equal 0. From (5.39) and (5.41), we can re-express the kernel regression and local polynomial regression estimators as

m̃(x) = Σ_{i=1}^{n} w_i(x) K_h(x_i − x),    (5.42)

where w_i(x) is viewed as a weight of the i-th observation at location x and is a linear combination of the y_i. Although the estimator m̃(x) is derived from nonparametric regression models, it can be applied directly to modeling computer experiments.

5.8 Some Recommendations

We conclude this chapter by giving some recommendations for metamodeling. Since Kriging models, splines approaches, neural network methods, and local polynomial regression are related to one another, and all are smoothing methods, we first present some connections among these modeling techniques.

5.8.1 Connections

Silverman (1984) has shown that smoothing splines are equivalent to variable kernel regression. Furthermore, the Gaussian Kriging approach linearly interpolates the residual y − Bβ̂ in the following way:

r(x)ᵀR⁻¹(y − Bβ̂),

which can be rewritten as

Σ_{i=1}^{n} a_i(x) r(θ, x, x_i),    (5.43)

where r(x) = (r(θ, x, x₁), ..., r(θ, x, x_n))ᵀ and a(x) = (a₁(x), ..., a_n(x))ᵀ = R⁻¹(y − Bβ̂). In particular, if the correlation function is taken as

r(θ; s, t) = exp{−Σ_{k=1}^{s} θ_k |s_k − t_k|²},

then (5.42) with a Gaussian kernel and (5.43) have the same form, because the normalizing constant in the kernel function does not matter. This is equivalent to using the radial basis function to interpolate the residuals. In the radial basis function approach, one constructs a basis as follows:

{ K(‖x − x_i‖/θ) : i = 1, ..., n }.    (5.44)

It is not difficult to verify that, for certain choices of K(·), such as K(t) = exp(−½t²), this set of functions is linearly independent. Thus, y(x) can be interpolated by Σ_{i=1}^{n} β_i K(‖x − x_i‖/θ). The least squares estimator for the β_i can be obtained and is a linear combination of the y_i; therefore, the predictor based on the radial basis function is given by

ŷ(x) = Σ_{i=1}^{n} β̂_i K(‖x − x_i‖/θ),    (5.45)

which has the same form as that in (5.43), regarding K(‖x − x_i‖/θ) as a kernel function. The radial basis function approach can therefore also be viewed as a version of local modeling. The difference among these methods is in how they assign weights to each observation. The Gaussian Kriging method employs the property of multivariate normal distributions and yields a BLUP, while local polynomial regression assigns the weights using the idea of local modeling, and it results in a minimax estimate in a certain sense (Fan (1992)). From the point of view of local modeling, in the Gaussian Kriging model the θ_k are considered smoothing parameters. To avoid the optimization problem for θ over a high-dimensional domain, it is common to set θ_j = θσ_j, where σ_j is the sample standard deviation of the j-th component of x. Selection of smoothing parameters has been extensively studied in the statistical literature; the cross validation procedure is an easily understood approach for selecting smoothing parameters, although it is computationally intensive.

5.8.2 Recommendations

Simpson, Peplinski, Koch and Allen (2001) presented a survey and conducted some empirical comparisons of metamodeling techniques. They recommended the following:

1. Polynomial models are primarily intended for regression with random error; however, they have been used widely and successfully in many applications and case studies of modeling computer experiments.
the θk s are regarded as unknown parameters which can be estimated by maximum likelihood. the predictor based on the radial basis function is given by n y (x) = ˆ i=1 ˆ βi K{ (x − xi )/θ 2 }. and it results in a minimax estimate in a certain sense (Fan (1992)). The Gaussian Kriging method employs the property of multivariate normal distributions and yields a BLUP. such as K(t) = exp(− 1 t2 ).

the number of parameters in the regression spline model will dramatically increase as the number of factors increase. Since local polynomial regression is relatively new in the literature. Kriging may be the best choice in the situation in which the underlying function to be modeled is deterministic and highly nonlinear in a moderate number of factors (less than 50.edu/˜rli/DMCE.2). This implies that the regression splines method is recommended only in the presence of a small number of well-behaved factors. Kriging models. Polynomial modeling is the best established metamodeling technique. As described in Section 5. they are recommended for exploration in deterministic applications with a few fairly well-behaved factors. Simpson et al. The website is updated frequently. However. The regression splines method introduced in Section 5. and local polynomial regression. All examples in this chapter were conducted using MATLAB code. and is probably the easiest to implement. Although polynomial models have some limitations (see Section 5.186 Design and Modeling for Computer Experiments applications and case studies of modeling computer experiments.stat. 2. (2001) did not include this method. we would expect the performance of all these methods to be similar. 3. more numerical comparisons and further research are needed.3 can be used to improve polynomial models and is quite easy to implement. The code is free and available through the authors and is posted at www. LLC . From the connections among the radial basis function approach.psu. © 2006 by Taylor & Francis Group. say). Multi-layer perceptron networks may be the best choice (despite their tendency to be computationally expensive to create) in the presence of many factors to be modeled in a deterministic application.3.

While it may be easy to interpret low-order polynomial models using traditional ANOVA or by simply inspecting the regression coeﬃcients.1 Intro duction Computer models are often used in science and engineering ﬁelds to describe complicated physical phenomena that are governed by a set of equations. it would be difﬁcult to directly understand metamodels with sophisticated basis functions such as Kriging or neural networks. including linear. a general approach. • which input variables and which interactions contribute most to the output variability. however. focusing on the concept of sensitivity analysis. we want to rank the importance of the input variables and their interactions. ordinary. 6. The Fourier amplitude sensitivity test (FAST) is also discussed. for model interpretation is needed. For example. including Sobol’s indices and their extensions. 187 © 2006 by Taylor & Francis Group. We may. Therefore. are introduced. Section 1.6 Model Interpretation This chapter focuses on how to give reasonable interpretations for models in computer experiments. nonlinear. extending the idea of traditional ANOVA decomposition. LLC .e.. model validation. Global sensitivity analyses. Such complicated models are usually diﬃcult to interpret.7 stated that the investigator may be concerned with • how well the metamodel resembles the system or the true model. The so-called sensitivity analysis (SA) was motivated by this. be able to give a good interpretation using a metamodel built by some techniques introduced in the previous chapter. These questions are particularly important when we are interested in gaining insight into the relationships and eﬀects of input variables to the response variable. Sensitivity analysis of the output aims to quantify how much a model depends on its input variables. and partial diﬀerential equations. • whether we can remove insigniﬁcant input factors and improve the eﬃciency of the metamodel. i.

We introduce global sensitivity analysis in Section 6.1) where εi is random error. (6. Their least squares estimators are denoted by a. This chapter focuses only on global SA. It typically takes a sampling approach to distributions for each input variable. The aim of factor screening is to identify which input factors among the many potentially important factors are really important. yi ). Methods in SA can be grouped into three classes: • Factor screening: Factor screening is used in dealing with models in computer experiments that are computationally expensive to evaluate and have a large number of input factors. i = 1.188 Design and Modeling for Computer Experiments • which level-combination of the input variables can reach the maximum or minimum of the output y. · · · . bs . xis . in Section 6. 6. © 2006 by Taylor & Francis Group.1 Criteria There are many criteria for measuring the importance of factors in the model as well as each input variable. βs are unknown regression coeﬃcients. · · · . (2000). n} using a linear model yi = α + β1 xi1 + · · · + βs xis + εi . For the computation of the derivatives numerically.2 Sensitivity Analysis Based on Regression Analysis Suppose that a sample of the input variables is taken by some sampling strategy and the corresponding sequence of output values is computed using the model under analysis. Let us ﬁt data {(xi . yi ) = (xi1 . The following are some of them.2. Though this technique is borrowed from linear regression. the input parameters are allowed to vary within a small interval. β1 . · · · . 6. • Local SA: The local SA concentrates on the local impact of the input factors on the model and involves computing partial derivatives of the metamodel with respect to the input variables.3 based on functional ANOVA decomposition. 
A comprehensive review and detailed introduction to sensitivity analysis are found in Saltelli et al.2 we review how to implement the traditional ANOVA decomposition techniques to rank importance of marginal eﬀects. To easily understand the SA approach. · · · . • Global SA: Global SA focuses on apportioning the output uncertainty to the uncertainty in the input variables. the approach can be extended and applied to more sophisticated models. and α. b1 . A rich literature exists on sensitivity analysis. Global sensitivity indices can be applied to the metamodels discussed in the previous chapter to rank the importance of the input factors. LLC .

A. Standardized regression coefficients: One approach to assessing the importance of the input variables is to use their regression coefficients. A more reasonable procedure standardizes all the variables:

(y_i − ȳ)/s_y,  (x_ij − x̄_j)/s_j,  j = 1, ..., s,    (6.2)

where ȳ and s_y are the sample mean and sample standard deviation of y, and x̄_j and s_j are the sample mean and sample standard deviation of x_ij, respectively. When model (6.1) fits the standardized data, the corresponding regression coefficients are called standardized regression coefficients (SRCs) and are given by b_j s_j/s_y, j = 1, ..., s. Because all of the variables have been transformed to the same scale, the output is most sensitive to those inputs whose SRCs are largest in absolute value.

B. Coefficient of multiple determination: Denote the total sum of squares, error sum of squares, and regression sum of squares of model (6.1) by

SSTO = Σ_{i=1}^{n} (y_i − ȳ)²,  SSE = Σ_{i=1}^{n} (y_i − ŷ_i)²,  SSR = Σ_{i=1}^{n} (ŷ_i − ȳ)²,

respectively. We have SSTO = SSR + SSE. The coefficient of multiple determination, or model coefficient of determination, is defined as

R² = SSR/SSTO = 1 − SSE/SSTO,

and R² provides a measure of model fit.

C. Partial sum of squares: The partitioning process above can be expressed in the following more general way. Let x(1) be a sub-vector of x, and let SSR(x(1)) and SSE(x(1)) be the regression sum of squares and the error sum of squares, respectively, when only x(1) is included in the regression model. For example, take x(1) = x₁; then SSTO = SSR(x₁) + SSE(x₁), and likewise SSTO = SSR(x₁, x₂) + SSE(x₁, x₂) when both x₁ and x₂ are included. We define an extra sum of squares, written as SSR(x₂|x₁):

SSR(x₂|x₁) = SSR(x₁, x₂) − SSR(x₁).

Thus, SSR(x₂|x₁) = SSE(x₁) − SSE(x₁, x₂), the additional reduction in the error sum of squares achieved when x₂ is added, given that x₁ is already in the model.
x2 )−SSR(x1 )) reﬂects the increase of regression sum of squares associated with adding x2 to the regression model when x1 is already in the model. Thus. x2 ) is the regression sum of squares when only x1 and x2 are included in the regression model.

· · · . at each step. Therefore. The diﬀerence (yk − y−k ) gives predictive error at xk .1) that is constructed on n − 1 data points except the kth data point. · · · . Forward stepwise regression chooses. It is clear that SSR(x(2) |x(1) ) = SSE(x(1) ) − SSE(x(1) . F.4) The partial correlation coeﬃcient between y and xi is deﬁned as the correlation coeﬃcient between yk − yk and xki − xki . x(2) ). For the removed ˆ ˆ yk . this determination can be based on the partial sum of squares. The PCC provides the strength of ˆ ˆ the correlation between y and xi after adjustment for any eﬀect due to the other input variables xl . xki = c0 + ˆ l=i cl xkl . let y−k be estimator of yk by the model. The predictive error of sum of squares (PRESS) is n ˆ deﬁned by PRESS = k=1 (yk − y−k )2 . l = i. xi−1 . the more important the variable xi . · · · . The PRESS based on the regression model with s − 1 variables except xi is denoted by PRESS−i . we have SSR(x(1) . xi+1 . we have the following notation: SSR(x(2) |x(1) ) = SSR(x(1) . In particular. x(2) ) − SSR(x(1) ). The larger the PRESS−i . E. x(2) ) = SSR(x(1) ) + SSR(x(2) |x(1) ).190 Design and Modeling for Computer Experiments This implies that SSR(x2 |x1 ) reﬂects the additional reduction in the error sum of squares associated with x2 . and pi is called the partial sum of squares of variable xi .1). given that x1 is already included in the model. s. k = 1. the most important variable under the current model. i = 1.3). Coeﬃcient of partial determination: From (6. Similarly. partial correlation coeﬃcients (PCCs) are useful. n. D. pi = SSR(xi |x−i ). Partial correlation coeﬃcients: When one wants to estimate correlation between y and input variable xi .3) where x(1) and x(2) are two non-overlaid sub-vectors of x. xs }. LLC . · · · . taking x(2) = xi and x(1) = x−i = {x1 . (6. the extra sum of squares SSR(x(2) |x(1) ) measures the marginal reduction in the error sum of squares when x(2) is added to the regression model. 
This can be viewed as a decomposition of the regression sum of squares:

SSR(x(1), x(2)) = SSR(x(1)) + SSR(x(2)|x(1)),    (6.3)

where x(1) and x(2) are two non-overlapping sub-vectors of x, and SSR(x(2)|x(1)) = SSR(x(1), x(2)) − SSR(x(1)). It is clear that SSR(x(2)|x(1)) = SSE(x(1)) − SSE(x(1), x(2)); the extra sum of squares SSR(x(2)|x(1)) measures the marginal reduction in the error sum of squares when x(2) is added to the regression model, given that x(1) is already in the model. In particular, taking x(2) = x_i and x(1) = x_{−i} = {x₁, ..., x_{i−1}, x_{i+1}, ..., x_s},

p_i = SSR(x_i|x_{−i}),  i = 1, ..., s,    (6.4)

measures the importance of input variable x_i to model (6.1); p_i is called the partial sum of squares of variable x_i. Forward stepwise regression chooses, at each step, the most important variable under the current model, and this determination can be based on the partial sum of squares.

Partial correlation coefficients: When one wants to estimate the correlation between y and an input variable x_i, partial correlation coefficients (PCCs) are useful. Consider the following two regression models:

ŷ_k = b₀ + Σ_{l≠i} b_l x_kl,  x̂_ki = c₀ + Σ_{l≠i} c_l x_kl,  k = 1, ..., n.

The partial correlation coefficient between y and x_i is defined as the correlation coefficient between y_k − ŷ_k and x_ki − x̂_ki. The PCC provides the strength of the correlation between y and x_i after adjustment for any effect due to the other input variables x_l, l ≠ i.

Predictive error sum of squares: Consider model (6.1) constructed on the n − 1 data points excluding the k-th data point. For the removed y_k, let ŷ_{−k} be the estimator of y_k by this model. The difference y_k − ŷ_{−k} gives the predictive error at x_k. The predictive error sum of squares (PRESS) is defined by PRESS = Σ_{k=1}^{n} (y_k − ŷ_{−k})². The PRESS based on the regression model with the s − 1 variables excluding x_i is denoted by PRESS_{−i}. The larger the PRESS_{−i}, the more important the variable x_i.
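PRESS can be computed by refitting the model n times, each time leaving one point out; a direct (if inefficient) sketch of ours, with synthetic data in which x1 is important and x3 is irrelevant:

```python
import numpy as np

def press(X, y):
    # PRESS = sum_k (y_k - yhat_{-k})^2, refitting without the k-th point
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    total = 0.0
    for k in range(n):
        mask = np.arange(n) != k
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        total += float((y[k] - A[k] @ coef) ** 2)
    return total

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 3))
y = 3.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.2, size=120)

p_full = press(X, y)
p_drop_x1 = press(X[:, [1, 2]], y)   # drop the important input x1
p_drop_x3 = press(X[:, [0, 1]], y)   # drop the irrelevant input x3
```

Dropping x1 inflates PRESS far more than dropping x3, so the criterion ranks x1 as the more important input.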

Similar to the coefficient of multiple determination, we define the coefficient of partial determination as

r²_{x(2)|x(1)} = SSR(x(2)|x(1)) / SSE(x(1)),

and call the positive square root of r²_{x(2)|x(1)} the partial correlation coefficient.

G. Criteria based on rank transformation: The rank transformation has been widely used in the construction of test statistics due to its robustness, and rank-transformed statistics provide a useful solution in the presence of long-tailed input and output distributions. The rank transformation replaces the data with their corresponding ranks. More exactly, sort y₁, ..., y_n from the smallest to the largest and denote the sorted data by y₍₁₎ ≤ y₍₂₎ ≤ ··· ≤ y₍ₙ₎; the original output y₍ₖ₎ is replaced by k. A similar procedure applies to each input variable x_1i, ..., x_ni, for i = 1, ..., s. Based on the ranked data, we find standardized regression coefficients and partial correlation coefficients, which are then called the standardized rank regression coefficients (SRRC) and partial rank correlation coefficients (PRCC), respectively.

6.2.2 An example

Example 2 (Continued) Consider a quadratic polynomial metamodel:

g(x) = α x₀ + Σ_{j=1}^{s} β_j x_j + Σ_{j=1}^{s} γ_j x_j² + Σ_{1≤k<l≤s} τ_kl x_k x_l,

where s = 8 and x₀ ≡ 1. Due to the hierarchical structure of the quadratic polynomial model, we begin with the linear effects. Firstly, we have SSR(x₀) = 128.7075. The extra sums of squares for the linear effects, SSR(·|x₀), are listed in the first row of Table 6.1, from which we can see that SSR(x₆|x₀) has the greatest value among all the linear effects. Thus x₆ is the most important linear effect. We next compute SSR(·|x₀, x₆), which is given in the second row of Table 6.1; x₁ is the most important linear effect given that x₆ is already in the model. We now consider the second-order effects. The extra sums of squares SSR(x_k²|x₀, x₁, x₆) are listed in the diagonal positions of Table 6.2, and SSR(x_k x_l|x₀, x₁, x₆) are shown in the off-diagonal positions of Table 6.2. From these values, we can find that the interaction between x₁ and x₆ is the most important among all of the second-order terms (i.e.,

1244 0.0392 0. we consider the following metamodel: g(x) = αx0 + β1 x1 + β6 x6 + τ16 x1 x6 .0533 0.1830 0. Finally.9177 0. This is consistent with the model found in Section 5.8978 TABLE 6. and we can ﬁnd SSR(x|x1 x6 ) = 0. x6 .0512 x3 0.6373 0.3737.0392 x6 4. x1 x6 .0571 0.0026 0.6278 0.9899 We further compute SSR(·|x0 . the marginal eﬀect of the interaction of x1 x6 is more signiﬁcant than that of x1 .2349 0.0785 0. x1 .1486 0. x6 . all of them are small.3657 x5 0.1618 0.6487 — x2 0. 8.4280 0.2200 x4 0.192 TABLE 6.2267 0. Thus. when x0 .2986 0.2200 0.0192 0.3969 0.5450 0. (2000)) applied the criteria introduced in this section to many models. (see Chapter 2 in Saltelli et al.5679 0. when x0 and x6 are already in the model.1250 0. x1 x6 ) for the variables that have not been included in the model. x6 .3805 0.0892 0. © 2006 by Taylor & Francis Group.2417 0.2624 for x = x1 .2 yields a very good model. SSR(x|x0 ) = 4. x6 ) x1 x2 x3 x4 x5 1.1 Design and Modeling for Computer Experiments Extra Sum of Squares SS x1 SSR( · |x0 ) SSR( · |x0 .2 Extra SS x1 x2 x3 x4 x5 x6 x7 x8 Sum of Squares SSR(xi xj |x0 . x6 ) 4.2831 x6 x7 x8 0.7432 — — x7 0.6487. x1 x6 ) = 2. x6 ) = 4.2200 0.1765 0. Their examples are very helpful for understanding the concepts introduced in this section.1230 0. x1 x6 in order.8978 0. x1 .2768.1.3003 0.1280 4. x1 .3576 0.0288 0. From the above analysis. the contribution of x1 to reducing variation is not as signiﬁcant as those of x6 and x1 x6 . Hence. x6 ) SSR( · |x0 .1032 0.0117 0.1280. this model was obtained by starting with x0 and then adding x6 .1658 0. From the ﬁrst row of Table 6.0543 0. LLC .2270 0.8905 0. In this case.6937 0.0075 0.7432 for x = x1 . Thus.8978 0.2485 0. 4. respectively.0848 0.0473 0.0392 0. we have SSR(x1 |x0 . However. we can calculate SSR(x|x0 .2.0887 x8 0. x6 and x1 x6 are already in the model. This implies that penalized least squares with the SCAD penalty discussed in Section 5.0778 5. x1 . Campolongo et al.

6.3 Sensitivity Analysis Based on Variation Decomposition

The decomposition of the sum of squares is the key point in analysis of variance (ANOVA). The latter has played an essential role in experimental design and regression analysis. The method can be extended to the decomposition of a metamodel by functional analysis of variance.

6.3.1 Functional ANOVA representation

Suppose that the metamodel g(x) is an integrable function defined in the experimental domain T ⊂ R^s. For simplicity of presentation, it is assumed in this section that T is the s-dimensional unit cube C^s = [0, 1]^s. It can then be shown (see, for instance, Sobol' (2001, 2003)) that g(x) can be represented in the following functional ANOVA form:

  g(x) = g0 + Σ_i gi(xi) + Σ_{i<j} gij(xi, xj) + · · · + g1···s(x1, . . . , xs),   (6.5)

where

  ∫_0^1 g_{i1···it}(x_{i1}, . . . , x_{it}) dxk = 0,  for k = i1, . . . , it.   (6.6)

Integrating (6.5) over C^s, we have

  ∫ g(x) dx = g0,
  ∫ g(x) Π_{k≠i} dxk = g0 + gi(xi),
  ∫ g(x) Π_{k≠i,j} dxk = g0 + gi(xi) + gj(xj) + gij(xi, xj),

and so on. The term ANOVA is used here because the representation in (6.5) provides us with the same interpretation as that of a classical ANOVA model. For example, gi(xi) can be viewed as a main effect, while gij(xi, xj) may be regarded as a first-order interaction effect, and so on. Assume further that g(x) is square integrable. Then all the g_{i1···il} in the decomposition (6.5) are square integrable, and it follows from (6.6) that all summands in (6.5) are orthogonal, in the sense that

  ∫ g_{i1···iu} g_{j1···jv} dx = 0   (6.7)

whenever (i1, . . . , iu) ≠ (j1, . . . , jv).

Due to the orthogonality of the decomposition in (6.5), we have

  ∫ g² dx = g0² + Σ_i ∫ gi²(xi) dxi + Σ_{i<j} ∫ gij²(xi, xj) dxi dxj + · · · + ∫ g²_{1···s}(x1, . . . , xs) dx1 · · · dxs.

Define the total variance and the partial variances by

  D = ∫ g²(x) dx − g0²  and  D_{i1···ik} = ∫ g²_{i1···ik}(x_{i1}, . . . , x_{ik}) dx_{i1} · · · dx_{ik}.

Note that D = Σ_{k=1}^s Σ_{i1<···<ik} D_{i1···ik}, which is similar to the decomposition in the traditional ANOVA.

Definition 9 The ratios

  S_{i1···ik} = D_{i1···ik} / D   (6.8)

are called global sensitivity indices. All the S_{i1···ik} are non-negative, and their sum is Σ_{k=1}^s Σ_{i1<···<ik} S_{i1···ik} = 1. The integer k is often called the order or the dimension of the index (6.8). The indices S_{i1···ik} are sometimes called Sobol' indices, as they were proposed in Sobol' (2001). For a (piecewise) continuous function g(x), the equality S_{i1···ik} = 0 implies that g_{i1···ik} ≡ 0.

Global sensitivity indices can be used to rank the importance of the input variables. In the context of modeling computer experiments, the functional structure of g(x) can be studied by estimating the indices. Thus, when the interest is in understanding the main effects and the first-order interactions, it may be enough to compute just the Di and Dij, for i = 1, . . . , s and j = i + 1, . . . , s, and to rank the importance of the input variables by sorting the Di for the main effects and the Dij for the first-order interactions.

When one is interested in understanding the total effect of each variable, including its interactions with the other variables, the total sensitivity index proposed by Homma and Saltelli (1996) can be employed by partitioning the input vector x into two complementary sets, x−i and xi, where x−i is the set excluding the ith variable and xi is the variable of interest. Specifically,

  STi = Si + S(i,−i) = 1 − S−i,
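To make the decomposition and the indices concrete, the following minimal sketch (our own illustration, not from the book) works out the ANOVA terms of g(x1, x2) = x1 + x2 + x1x2 on the unit square in closed form and checks g0 and the total variance D by midpoint-rule integration; all function and variable names are ours.

```python
import numpy as np

# Toy metamodel on the unit square (illustrative; not from the book).
def g(x1, x2):
    return x1 + x2 + x1 * x2

# Closed-form ANOVA terms obtained by integrating g over the unit square:
# g0 = 5/4, g1(x1) = 1.5*x1 - 0.75, g2 symmetric, g12 = (x1-.5)*(x2-.5).
g0 = 1.25
D1 = D2 = 1.5**2 / 12.0          # Var of 1.5*x1 with x1 ~ U(0, 1)
D12 = 1.0 / 144.0                # Var of (x1-.5)*(x2-.5)
D = D1 + D2 + D12                # total variance
S1, S2, S12 = D1 / D, D2 / D, D12 / D

# Numerical check of g0 and D on a fine grid (midpoint rule).
n = 400
u = (np.arange(n) + 0.5) / n
X1, X2 = np.meshgrid(u, u)
vals = g(X1, X2)
g0_num = vals.mean()
D_num = (vals**2).mean() - g0_num**2
```

By construction the indices S1, S2, and S12 sum to one, mirroring (6.8).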

where S−i is the sum of all S_{i1···ik} terms which do not include the ith variable, i.e., the total fraction of variance complementary to variable xi. This index is adapted from the "freezing inessential variables" approach proposed by Sobol' (1993).

6.3.2 Computational issues

It is challenging to compute the integrals presented in the definition of D_{i1···ik}. Methods of approximating integrals have been widely studied by mathematicians in the fields of number theory and computational mathematics; see, for example, Hua and Wang (1981) and Niederreiter (1992). The simplest way to estimate D_{i1···ik} is by using a Monte Carlo method: generate random vectors x1, . . . , xN from a uniform distribution over the unit cube in R^s. It is clear that

  ĝ0 = (1/N) Σ_{j=1}^N g(xj)  and  D̂ = (1/N) Σ_{j=1}^N g²(xj) − ĝ0².

Then D_{i1···ik} can be estimated by

  D̂_{i1···ik} = (1/N) Σ_{j=1}^N ĝ²_{i1···ik}(xj).

It can be shown that this estimate is consistent and possesses a root-n convergence rate. Quasi-Monte Carlo methods can be used to improve the rate of convergence of the Monte Carlo approach: in practice, we can generate a set of points {xk, k = 1, . . . , N} which is space filling over the unit cube in R^s, and with properly chosen points the estimate converges to D_{i1···ik} much faster than with randomly generated xj. In fact, the theory of uniform design itself originated from quasi-Monte Carlo methods.

Estimation of Di is a little more complicated. For ease of presentation, let us begin with i = 1. By its definition,

  D1 = ∫ g1²(x1) dx1,  where  g1(x1) = ∫ g(x) Π_{k=2}^s dxk − g0,
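The plain Monte Carlo estimators ĝ0 and D̂ above can be sketched as follows for a hypothetical test function whose total variance is known in closed form (the function, seed, and sample size are our own choices):

```python
import numpy as np

# Plain Monte Carlo estimates of g0 and D for a test function with a known
# total variance (illustrative sketch; the function is not from the book).
rng = np.random.default_rng(0)

def g(x):                      # x has shape (N, 3), inputs uniform on [0,1]^3
    return x[:, 0] + 2.0 * x[:, 1] + x[:, 0] * x[:, 2]

# Exact values: g0 = 0.5 + 1.0 + 0.25 = 1.75, and writing
# g = x1*(1 + x3) + 2*x2 (independent parts):
# Var(x1*(1+x3)) = (1/3)*(7/3) - 0.75^2 and Var(2*x2) = 4/12.
D_exact = (1.0 / 3.0) * (7.0 / 3.0) - 0.75**2 + 4.0 / 12.0

N = 200_000
x = rng.random((N, 3))
vals = g(x)
g0_hat = vals.mean()
D_hat = (vals**2).mean() - g0_hat**2
```

The estimation error shrinks at the root-n rate described above; a quasi-Monte Carlo point set would converge faster.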

which equals

  ∫ {∫ g(x) Π_{k=2}^s dxk}² dx1 − g0² = ∫ g((x1, u')') g((x1, v')') du dv dx1 − g0²,

where u and v are (s − 1)-dimensional vectors. To implement the quasi-Monte Carlo method, we generate a set of points {tk, k = 1, . . . , N} ⊂ C^{2s−1} over the experimental domain. Let xk1 = tk1, uk = (tk2, . . . , tks)', and vk = (tk(s+1), . . . , tk(2s−1))'. Then

  D̂1 = (1/N) Σ_{k=1}^N g((xk1, uk')') g((xk1, vk')') − ĝ0².

Similarly, we can compute D̂i for i = 2, . . . , s. To compute D−1, we generate a set of points {tk, k = 1, . . . , N} ⊂ C^{s+1}. Let xk = (tk1, . . . , tk(s−1))', uk = tks, and vk = tk(s+1). Then

  D̂−1 = (1/N) Σ_{k=1}^N g((uk, xk')') g((vk, xk')') − ĝ0²,

and

  ŜT1 = 1 − D̂−1 / D̂.

Similarly, we may compute D̂−i and ŜTi for i = 2, . . . , s.

Example 20 As an illustration, we consider a simple metamodel

  g(x) = (x1 − 0.5)(x2 − 0.5) + 0.5 e^{x3 − 0.5},

where xi ∈ [0, 1] for i = 1, 2, 3. Therefore, g0 = 0.5211, since ∫_0^1 (x1 − 0.5) dx1 = 0 and 0.5 ∫_0^1 e^{x3 − 1/2} dx3 = 0.5211. Furthermore,

  g1(x1) = g2(x2) = 0,  g3(x3) = 0.5 e^{x3 − 0.5} − g0,
  g12(x1, x2) = (x1 − 0.5)(x2 − 0.5),  and  g13(x1, x3) = g23(x2, x3) = g123(x1, x2, x3) = 0.

We can further calculate the Sobol' indices for the main and total effects; Table 6.3 summarizes these indices. The indices identify that x1 and x2 have no main effects: they influence the output only through their interaction.
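The paired-evaluation estimators of D1 and D−1 can be sketched for the metamodel of Example 20, here with plain random points standing in for the quasi-Monte Carlo set {tk} (an illustrative sketch, not the book's implementation):

```python
import numpy as np

# Monte Carlo version of the D_1 / D_{-1} estimators for the metamodel of
# Example 20 (random points stand in for a quasi-Monte Carlo point set).
rng = np.random.default_rng(1)

def g(x1, x2, x3):
    return (x1 - 0.5) * (x2 - 0.5) + 0.5 * np.exp(x3 - 0.5)

N = 400_000
x = rng.random((N, 3))
vals = g(x[:, 0], x[:, 1], x[:, 2])
g0_hat = vals.mean()
D_hat = (vals**2).mean() - g0_hat**2

# D1: share x1 between the two evaluations, redraw (x2, x3).
t = rng.random((N, 5))                 # points in C^(2s-1) with s = 3
D1_hat = (g(t[:, 0], t[:, 1], t[:, 2]) *
          g(t[:, 0], t[:, 3], t[:, 4])).mean() - g0_hat**2

# D_{-1}: share (x2, x3), redraw x1; then ST1 = 1 - D_{-1}/D.
u = rng.random((N, 4))                 # points in C^(s+1)
Dm1_hat = (g(u[:, 2], u[:, 0], u[:, 1]) *
           g(u[:, 3], u[:, 0], u[:, 1])).mean() - g0_hat**2
ST1_hat = 1.0 - Dm1_hat / D_hat
```

Since g1 ≡ 0 here, D̂1 should be near zero, while ŜT1 should be near the exact value 0.2378 given in Table 6.3.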

TABLE 6.3 Sobol' indices for main and total effects

  Variable   Main Effects (Ŝi)   Total Effects (ŜTi)
  x1         0.0000              0.2378
  x2         0.0000              0.2378
  x3         0.7622              0.7622

Both Monte Carlo methods and quasi-Monte Carlo methods can deal with any square integrable function. However, the metamodels for computer experiments often have a particular form, which allows us to derive a closed expression for the constants D_{i1···ik}. Jin, Chen and Sudjianto (2004) studied this issue in detail; in what follows, we summarize their procedure. Note that the metamodels introduced in the previous chapter have the form

  g(x) = β0 + Σ_j βj Bj(x),   (6.9)

where the βj are constants which may depend on the data but not on x, and Bj(x) can be represented as Bj(x) = Π_{k=1}^s hjk(xk) for some functions hjk(·). For example, the Kriging model with covariance function

  Cov{y(s), y(t)} = σ² exp(−Σ_{k=1}^s θk |sk − tk|²)

can be represented as

  g(x) = μ + Σ_{i=1}^n βi ri(x),   (6.10)

where ri(x) = Corr{y(x), y(xi)} = Π_{k=1}^s exp(−θk |xk − xik|²), x = (x1, . . . , xs)', and {xk, k = 1, . . . , n} is the set of design points at which the responses y(x) were collected. Let us illustrate the procedure using the Kriging model in (6.10), beginning with ∫ g(x) dx. It is clear that

  ∫ g(x) dx = β0 + Σ_j βj ∫ Bj(x) dx.

Note that the square of g(x) in (6.9) has the same form as g(x), and the way to compute the ∫ g²_{i1···ik} dx_{i1} · · · dx_{ik} is similar.

It is easy to see that

  ∫ Bj(x) dx = Π_{k=1}^s ∫ hjk(xk) dxk,

which converts the s-fold integral into one-dimensional integrals, and the lower-dimensional integrals can be computed in the same way. Here we only illustrate how to compute ∫ g²(x) dx; it suffices to obtain closed forms for ∫ ri(x) dx and ∫ ri(x) rj(x) dx. We have

  ai =def ∫_{[0,1]^s} ri(x) dx = Π_{k=1}^s ∫_0^1 exp{−θk (xk − xik)²} dxk
     = Π_{k=1}^s √(π/θk) [Φ{√(2θk)(1 − xik)} − Φ{−√(2θk) xik}],

where Φ(·) is the cumulative distribution function of the standard normal distribution N(0, 1). Similarly,

  aij =def ∫_{[0,1]^s} ri(x) rj(x) dx = Π_{k=1}^s ∫_0^1 exp{−θk [(xk − xik)² + (xk − xjk)²]} dxk
      = Π_{k=1}^s exp{−θk (xik − xjk)²/2} ∫_0^1 exp(−2θk [xk − (xik + xjk)/2]²) dxk
      = exp{−Σ_{k=1}^s θk (xik − xjk)²/2} Π_{k=1}^s √(π/(2θk)) [Φ(2√θk {1 − (xik + xjk)/2}) − Φ(−√θk (xik + xjk))].

Therefore,

  ∫_{[0,1]^s} g²(x) dx = β0² + 2β'a + β'Aβ,

where β = (β1, . . . , βn)', a = (a1, . . . , an)', and A is the n × n matrix with (i, j)-element aij, all of which can be easily computed by many software packages.

6.3.3 Example of Sobol' global sensitivity

We now conduct a global sensitivity analysis for the case study introduced in Chapter 5.
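The one-dimensional building block of the closed form, ∫_0^1 exp{−θ(x − a)²} dx written in terms of Φ, can be checked numerically; the sketch below (our own) compares this closed form with a midpoint-rule quadrature:

```python
import math

# Closed form for the integral of the Gaussian correlation kernel
# exp(-theta*(x - a)^2) over [0, 1], written with the standard normal CDF
# (a sketch of the kind of one-dimensional identity used above).
def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def a_closed(theta, a):
    s = math.sqrt(2.0 * theta)
    return math.sqrt(math.pi / theta) * (Phi(s * (1.0 - a)) - Phi(-s * a))

# Simple midpoint-rule quadrature for comparison.
def a_quad(theta, a, n=20000):
    h = 1.0 / n
    return h * sum(math.exp(-theta * ((k + 0.5) * h - a)**2) for k in range(n))

err = max(abs(a_closed(th, c) - a_quad(th, c))
          for th in (0.5, 2.0, 10.0) for c in (0.1, 0.5, 0.9))
```

The product over the s coordinates then gives ai, so the s-fold integral never has to be evaluated directly.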

The main effects gi(xi) are depicted in Figure 6.1, from which we can see that clearance and pin offset have the strongest main effects. We computed D̂ = 2.8328. The indices Si and Sij are tabulated in Table 6.4, where the diagonal elements correspond to the Si and the off-diagonal elements to the Sij:

TABLE 6.4 Indices Si and Sij

  Si:  Clearance 0.4628, Press 0.0000, Length 0.0276, Profile 0.0000, Ovality 0.0000, Pin Offset 0.2588.
  Sij: Clearance × Length 0.0391, Clearance × Pin Offset 0.2102; all remaining interactions are at most 0.0012.

Results in Table 6.4 tell us that the clearance, the pin offset, and their interaction are the most important variables, as the sum of their global sensitivity indices (0.4628 + 0.2588 + 0.2102) is about 93% of the total variation. Table 6.4 also indicates that skirt length may be an important variable as well, as the global sensitivity index for its main effect is about 3% of the total variation and that for its interaction with the clearance is about 4% of the total variation.

Note that, in this application, clearance variation due to manufacturing variability is a variable that engineers choose not to control (in robust design terminology, it is a "noise" variable). The strong interaction between clearance and pin offset implies that we should adjust the pin offset in such a way that it minimizes the effect of clearance on the output variable.

6.3.4 Correlation ratios and extension of Sobol' indices

Many authors regard the input variables x as random variables, and hence outputs of computer experiments are viewed as realizations of a random variable (see, for example, Chan, Saltelli and Tarantola (1997)). Denote by Y the response variable and by Xi the ith component of x. Consider the following variance decomposition from McKay (1995):

  Var(Y) = Var{E(Y |Xi)} + E{Var(Y |Xi)},

where Var{E(Y |Xi)} is called the variance of the conditional expectation (VCE). Motivated by this decomposition, we define

  CRi = Var{E(Y |Xi)} / Var(Y)

FIGURE 6.1 Main effects plots for piston noise, where g0 = 56.6844. The solid line stands for g0 + gi(xi), and the dots stand for the individual observations. (The six panels plot Noise against Clearance, Press, Skirt Length, Skirt Profile, Skirt Ovality, and Pin Offset.)

for i = 1, . . . , s. The CRi is called the correlation ratio in McKay (1995). It is clear that CRi reflects the proportion of the variation of the output explained by Xi. Note that when the joint distribution of the input variables is the uniform distribution over the unit cube C^s, the correlation ratios CRi are equal to the Sobol' indices Si defined in Section 6.3.1. The correlation ratios play a similar role in the nonlinear regression setting to that of the usual correlation coefficient ρ for linear relationships between the output and the input variables (Kendall and Stuart (1979) and Krzykacz (1990)).

Hora and Iman (1989) advocated using the square root of the VCE as a measure of the importance of Xi. Note that the square root of the VCE is not scale-invariant; Iman and Hora (1990) therefore further suggested the use of

  LCRi = Var[E{log(Y)|Xi}] / Var{log(Y)}

as another measure of the importance of Xi. Saltelli et al. (1993) discussed a modified version of the Hora and Iman (1989) approach which relates to Krzykacz (1990). The idea of the correlation ratio can also be extended to the partial correlation ratio, paralleling the partial correlation coefficient in linear models (see Section 6.2); see, for example, Jin et al. (2004).

The Sobol' indices can be extended to the case in which x has a density function with support C^s. Denote by p(x) the density function of x and by p(x_{i1}, . . . , x_{ik}) the marginal density function of (x_{i1}, . . . , x_{ik}). Similar to the functional ANOVA decomposition in (6.5), we have

  ∫ g(x) p(x) dx = g0,
  ∫ g(x) p(x−i) Π_{k≠i} dxk = g0 + gi(xi),
  ∫ g(x) p(x−(ij)) Π_{k≠i,j} dxk = g0 + gi(xi) + gj(xj) + gij(xi, xj),

and so on. Based on the orthogonality feature of the above decomposition, it can be shown that the variance of g can be expressed as the summation of the variances of the effects g_{i1···ik}(x_{i1}, . . . , x_{ik}). That is,

  Var{g(x)} = Σ_{k=1}^s Σ_{i1<···<ik} Var{g_{i1···ik}(x_{i1}, . . . , x_{ik})},

where Var{g(x)} = ∫ g²(x) p(x) dx − g0²,
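A simple way to estimate the correlation ratio CRi from a sample is to bin Xi and compare the variance of the binned conditional means of Y with the overall variance. The sketch below (our own illustration; the test function, noise level, and bin count are assumptions) does this for a function in which X1 is active and X2 is inert:

```python
import numpy as np

# Binned estimate of the correlation ratio CR_i = Var{E(Y|X_i)}/Var(Y)
# (illustrative sketch; the test function is ours, not the book's).
rng = np.random.default_rng(2)

N = 200_000
X = rng.random((N, 2))
Y = np.sin(np.pi * X[:, 0]) + 0.3 * rng.standard_normal(N)  # X2 is inert

def corr_ratio(xi, y, n_bins=50):
    # Assign each point in [0, 1] to one of n_bins equal-width bins.
    bins = np.minimum((xi * n_bins).astype(int), n_bins - 1)
    cond_means = np.array([y[bins == b].mean() for b in range(n_bins)])
    counts = np.bincount(bins, minlength=n_bins)
    vce = np.average((cond_means - y.mean())**2, weights=counts)
    return vce / y.var()

cr1 = corr_ratio(X[:, 0], Y)   # exact value is about 0.51 here
cr2 = corr_ratio(X[:, 1], Y)   # should be near zero
```

For this function, Var{E(Y|X1)} = Var(sin(πX1)) = 1/2 − (2/π)², so CR1 ≈ 0.51 once the noise variance 0.09 is included in Var(Y).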

and

  Var{g_{i1···ik}(x_{i1}, . . . , x_{ik})} = ∫ g²_{i1···ik}(x_{i1}, . . . , x_{ik}) p(x_{i1}, . . . , x_{ik}) Π_{l=1}^k dx_{il},

since E{g_{i1···ik}(x_{i1}, . . . , x_{ik})} = 0. We can further define the generalized Sobol' indices as

  GS_{i1···ik} = Var{g_{i1···ik}(x_{i1}, . . . , x_{ik})} / Var{g(x)}.

It is clear that when the joint distribution of x is the uniform distribution over the unit cube, the generalized Sobol' indices are exactly the same as the Sobol' indices defined in Section 6.3.1. It is common to assume that the input variables are independent, that is,

  p(x) = Π_{i=1}^s pi(xi),

where pi(xi) stands for the marginal density function of xi. Under the assumption of independence, the generalized Sobol' indices GSi, i = 1, . . . , s, coincide with the correlation ratios CRi. This provides insight into why the correlation ratio CRi can be used as a measure of the importance of Xi.

6.3.5 Fourier amplitude sensitivity test

Another popular index used to rank the importance of input variables is the Fourier amplitude sensitivity test (FAST), proposed by Cukier, Levine and Schuler (1978). The FAST indices provide a way to estimate the expected value and variance of the output and the contribution of individual input variables to this variance. The greatest advantage of FAST is that the evaluation of the sensitivity estimates can be carried out independently for each variable using just a single set of runs. Reedijk (2000) provided a detailed description of FAST indices, computational issues, and applications. In what follows, we give a brief introduction to FAST; for simplicity of presentation, it is assumed that the domain of the input variables is the s-dimensional unit cube C^s.

The main idea of FAST is to convert the s-dimensional integral of the variance computation into a one-dimensional integral. This can be done by applying the ergodic theorem (Weyl (1938)), representing C^s along a curve defined by a set of parametric equations

  xi(u) = Gi(sin(wi u)), i = 1, . . . , s,

where u is a scalar variable with range R¹ and {wi} is a set of integer angular frequencies.

With proper choices of the wi and Gi, it can be shown (Weyl (1938)) that

  g0 =def ∫_{C^s} g(x1, . . . , xs) dx1 · · · dxs = (1/2π) ∫_{−π}^{π} g(x(u)) du,

where x(u) = (G1(sin(w1 u)), . . . , Gs(sin(ws u)))'. This representation allows us to apply Fourier analysis to g(x). Using Parseval's theorem, we have the decomposition

  D = ∫_{C^s} g(x)² dx − g0² ≈ 2 Σ_{w=1}^{∞} (Aw² + Bw²),   (6.11)

where Aw and Bw are the Fourier coefficients, defined by

  Aw = (1/2π) ∫_{−π}^{π} g(x(u)) cos(wu) du  and  Bw = (1/2π) ∫_{−π}^{π} g(x(u)) sin(wu) du.

Further define

  Dwi = 2 Σ_{p=1}^{∞} (A²_{pwi} + B²_{pwi}).

The summation over p in the definition of Dwi is meant to include all the harmonics related to the frequency assigned to the input variable xi, so that Dwi is the part of the variance due to xi, and the quantity

  Si = Dwi / D

is the fraction of the variance of g due to the input variable xi. The coefficient Dwi can be interpreted as a "main effect" in the model; therefore, it can be used to rank the importance of input variables. Following Cukier et al. (1978), the Dwi are referred to as the first-order indices, or classical FAST. Unfortunately, the fraction of the total variance due to interactions cannot be computed by this technique at present.

To compute Si, one must choose the Gi and wi. Cukier et al. (1978) proposed a systematic approach for determining the search curve Gi(u) by solving a partial differential equation with boundary condition Gi(0) = 0. For the analysis of computer experiments, we may simply take

  Gi(t) = 1/2 + (1/π) arcsin(t)
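A minimal FAST sketch (our own, not from the book) for an additive test function illustrates the recipe above: sample g along the search curve, accumulate the Fourier power at each assigned frequency and its first M harmonics, and divide by the total variance. The frequencies, M, and the test function are our own choices; the exact Sobol' indices of this function are (1, 4, 9)/14:

```python
import numpy as np

# Minimal classical-FAST sketch: first-order indices for the additive
# function g(x) = x1 + 2*x2 + 3*x3 on [0,1]^3 (illustrative choices).
w = np.array([11, 21, 27])             # integer frequencies (approximation)
M = 4                                  # maximum harmonic
N = 4096
u = -np.pi + 2.0 * np.pi * (np.arange(N) + 0.5) / N

# Search curve x_i(u) = 1/2 + arcsin(sin(w_i u))/pi, uniform on [0, 1].
X = 0.5 + np.arcsin(np.sin(np.outer(u, w))) / np.pi
gu = X[:, 0] + 2.0 * X[:, 1] + 3.0 * X[:, 2]

def power(wk):
    # Discrete approximations of the Fourier coefficients A_wk, B_wk.
    A = np.mean(gu * np.cos(wk * u))
    B = np.mean(gu * np.sin(wk * u))
    return A * A + B * B

D = gu.var()                           # total variance along the curve
S = np.array([2.0 * sum(power(p * wi) for p in range(1, M + 1)) / D
              for wi in w])
```

The frequencies are chosen so that no harmonic pwi (p ≤ M) collides with another variable's harmonics, and N is well above twice the largest harmonic used.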

for all i = 1, . . . , s; that is,

  xi(u) = 1/2 + (1/π) arcsin(sin(wi u)).

The search curve drives arbitrarily close to any point x ∈ C^s if and only if the set of frequencies wi is incommensurate. A set of frequencies is said to be incommensurate if none of them can be obtained as a linear combination of the other frequencies with integer coefficients. The easiest way to obtain an incommensurate set of frequencies is by using irrational numbers; in practice, due to limited computer precision, one may use integer frequencies, which is actually an approximation. When the frequencies are incommensurate, the generated curve is space-filling. As an example, Figure 6.2 shows the sampling points generated for a two-dimensional problem from curves with w1 = 11 and w2 = 21.

FIGURE 6.2 Scatter plot of sampling points for a two-dimensional problem with {w1, w2} = {11, 21}.

The summations in the definitions of D and Dwi are taken from 1 to ∞. Since the Fourier amplitudes decrease as p increases, Dwi can be approximated by

  2 Σ_{p=1}^{M} (A²_{pwi} + B²_{pwi}),

where M is the maximum harmonic, which is usually taken to be 4 or 6 in practice. See Reedijk (2000) for details.

Saltelli et al. (2000) proposed the extended FAST (EFAST) to calculate not only the main effects but also the total effects, allowing a full quantification of the importance of each variable. To calculate the total effects, we assign a high frequency wi to the ith variable and a different, low frequency value w−i to the remaining variables. By evaluating the spectrum at the frequency w−i and its harmonics, we can calculate the partial variance D−i. These frequencies contain information about interactions among factors at all orders not accounted for by the main effect indices: this is the total effect of any order that does not include the ith variable, and it is complementary to the variance due to the ith variable. Thus, the total variance due to the ith variable is DTi = D − D−i, and the total effect is

  STi = Si + S(i,−i) = 1 − S−i,

where S−i is the sum of all S_{i1···ik} terms which do not include the index i. Note that while the total effect indices do not provide a full characterization of the system of 2^s − 1 interactions, they allow a full quantification of the importance of each variable.

Example 21 As a demonstration, we consider a simple quadratic polynomial metamodel with three variables,

  g(x) = x1 x2 + x3²,

where xi ∈ [−1, 1] for i = 1, 2, 3. For example, by assigning w1 = 21 and w2 = w3 = 1, we can calculate the total effect of x1; similarly for x2 and x3. The calculated FAST indices for the main and total effects are summarized in Table 6.5. The indices identify that x1 and x2 have no main effects, but in terms of the total effects all variables are about equally important.

TABLE 6.5 FAST indices for main and total effects

  Variable   Main Effects (Ŝi)   Total Effects (ŜTi)
  x1         0.00                0.56
  x2         0.00                0.56
  x3         0.44                0.44

6.3.6 Example of FAST application

Consider again the borehole example presented in Chapter 1. Instead of a polynomial model, we use a Gaussian Kriging metamodel to fit the data. Thereafter, the FAST method is applied to the metamodel with the set of frequencies {19, 59, 91, 113, 133, 143, 149, 157} corresponding to the eight variables {rw, r, Tu, Hu, Tl, Hl, L, Kw}. To avoid potential overlap, only the first wmax/2 components are used in calculating D∼i, i.e.,

  D∼i = 2 Σ_{k=1}^{wmax/2} (Ak² + Bk²).

The FAST main effects for these variables are

  {0.8257, 0.0000, 0.0001, 0.0878, 0.0000, 0.0411, 0.0390, 0.0093},

for which rw has the largest main effect, followed by Hu and Hl. The computation of the total effects is conducted using wi = 21 and w∼i = 1. The FAST total effects for the variables are

  {0.8798, 0.0269, 0.0268, 0.0866, 0.0270, 0.0850, 0.0414, 0.0451}.

This result indicates that the effects are dominated by the main effects, and the orders of importance suggested by the main and total effects are about the same. It is also in agreement with the traditional ANOVA decomposition from polynomial models as discussed in Chapter 1.

7 Functional Response

This chapter concerns computer experiments with functional responses. Three classes of models, functional linear models, spatial temporal models, and partially functional models, along with related estimation procedures, will be introduced. Some case studies are given for illustration.

7.1 Computer Experiments with Functional Response

Functional data are data collected over an interval of some index with an intensive sampling rate. Functional data can be in the form of one-dimensional data, such as a curve, or higher-dimensional data, such as a two- or three-dimensional image. With the advent of modern technology and devices for collecting data, people can easily collect and store functional data, and functional data analysis is becoming popular in various research fields. Ramsay and Silverman (1997) present many interesting examples of functional data and introduce various statistical models for fitting them. However, there is little literature on modeling computer experiments with functional responses, and modeling of computer experiments is typically limited to point outputs. With the advent of computer models and visualization, computer experiments with functional responses have increased in complexity, leading up to today's sophisticated three-dimensional computer models, whose outputs are often presented as higher-dimensional images to gain better physical insight. Figure 7.1 shows an output of a computational fluid dynamics model for cylinder head port flow simulation in an internal combustion engine. Here, we provide a more sophisticated approach to naturally analyzing functional responses, which may suggest insightful conclusions that might not be apparent otherwise.

Let us introduce two typical examples of functional response in computer experiments. Example 22 below is a typical example of functional response with a sparse sampling rate: the output was collected over different rotations per minute (RPMs), and a total of 6 outputs were collected for each design condition. Similarly, suppose that data for each experimental unit were collected every 20 seconds over a period of an hour, and that we want to know how the factors affect the shape of the resulting curves.

Such a response is similar to that of growth curve models in the statistical literature (see, for example, Pan and Fang (2002)). Although one may view such output as a multi-response, we refer to it instead as a functional response, because the data could conceivably have been collected over the whole interval of RPMs, and because the response consists of several outputs of one variable rather than outputs of several variables. Example 23 is a typical example of functional response with an intensive sampling rate: the response was recorded for every degree of crank angle over [0, 500].

FIGURE 7.1 Cylinder head port flow generated by a computational fluid dynamics model.

Example 22 (Engine Noise, Vibration and Harshness) Computer models are commonly employed in the design of engine structures to minimize radiated noise. The model includes a multi-body dynamic simulation to estimate the mechanical forces acting during engine operation, from the cranktrain and valvetrain as well as from combustion. Together with the finite element structural model of the engine component or system (see Figure 7.2 below), this model is used to evaluate the structural response during operating conditions as a function of engine RPM and frequency. The vibro-acoustic relationship between the engine vibrations and the acoustic pressure field may then be applied to calculate the radiated engine noise (see, for example, Cremers, Tournour, El Masri, Gérard, Felice and Selmane (2001)). To optimize the design of the cylinder block, 17 design variables were chosen; a detailed list is given in Table 7.1.

FIGURE 7.2 Engine models for Example 22: (a) engine system finite element model, and (b) engine block finite element model.

In this example, the selected response variable is the structural response of the oil pan flange, in terms of acceleration at various engine RPMs (e.g., 1000, 2000, . . . , 6000 RPM), which directly influences the radiated noise through the vibro-acoustic relationship.

TABLE 7.1 Name of Design Variables

  Var.  Name                     Var.  Name
  x1    Bulkhead thickness       x10   Side wall ribbing
  x2    Oil pan rail thickness   x11   RFOB ribbing
  x3    Skirt wall thickness     x12   FFOB ribbing
  x4    Main cap thickness       x13   Valley wall ribbing
  x5    Liner thickness          x14   Bearing beam
  x6    Valley wall thickness    x15   Dam with value 1
  x7    Bore wall thickness      x16   Dam with value 2
  x8    Valley plate thickness   x17   Den
  x9    Skirt ribbing            x18   Young modulus

A uniform design with 30 runs was used for the experiment; it is presented in Table 7.2. This experiment can be treated either as multi-response (i.e., the response is a vector of numbers) or as functional response (i.e., the response is a sample of points from a function), since for each run the six responses can be fitted by a curve. The outputs are presented in Table 7.3.

One indicator of a stable valvetrain is that the valve movement must follow a prescribed motion determined by the camshaft proﬁle. engineers attempt to minimize motion errors by ﬁnding the best level-combination of the factors. where the the spring behavior becomes highly nonlinear (see Figure 7. engineers synthesize the design of components in the valvetrain system (e. The model is very crucial for guiding valvetrain performance optimization while maintaining a stable valvetrain dynamic behavior.4. various computer-aided engineering models are simultaneously employed to achieve the best balance of performance and durability attributes. particularly to minimize valve bounce during valve closing. noise and vibration.4 shows the motion errors of the valve compared to the prescribed motion by the camshaft for the ﬁrst four designs listed in Table 7. and emissions. Detailed threedimensional geometric information as well as other physical properties such as valve seat. cam proﬁle. Design optimization is performed to maximize performance capability while maintaining the durability of the system. spring. especially at high engine speeds..) to meet the intended performance target.210 Design and Modeling for Computer Experiments Example 23 (Valvetrain) The valvetrain system is one of the most important elements of internal combustion engines for the delivery of desired engine performance in terms of horsepower/torque. Table 7. The model can be used to study typical valvetrain system characteristics in terms of durability. and valve sealing. instead of a single value. Multi-body dynamic computer models are commonly employed to optimize the dynamic behavior of a valvetrain. fuel economy. To achieve such goals. hydraulic lash adjuster. Schamel and Meyer (1989)).g. the response variable (i. spring. During the optimization process. LLC . we can see that the variability of the motion errors varies for various design conﬁgurations. etc. such as cylinder head stiﬀness. 
and lash adjuster dynamics are included in the models. noise. it is reasonable to model the data in Example 22 using linear models for repeated measurements. From Figure 7. To achieve this goal. however. A general framework for modeling is necessary. thus. cam proﬁle. For Example 23. as the eﬀect of inertia on the dynamic behavior becomes prominent. etc. A perfectly stable valve train should have a straight line or zero error throughout the crank angles. it is considered a functional response. cam-follower hydrodynamic. Figure 7. one is interested in understanding the eﬀects of each design variable on the motion errors.e. this may not be the case.4. spring (spring surge and coil clash). © 2006 by Taylor & Francis Group. camshaft torsional and bending.3 below) in an internal combustion engine (Philips.. In practice. amount of motion error) is deﬁned over a crank angle range.4 depicts an experimental matrix which was applied to the computer model to minimize the motion error. This criterion is especially crucial during valve closing to minimize valve bounce. In this situation. nonparametric and semiparametric modeling techniques may be used to construct a good predictor for the functional response. rocker arm stiﬀness. In this case. At high speed valvetrain operations.

[TABLE 7.2 Design of Engine Noise Vibration and Harshness: the 30-run uniform design in the design variables x1-x18 of Table 7.1.]

[TABLE 7.3 Response of Engine Noise Vibration and Harshness: the outputs of the 30 runs at engine speeds 1000-6000 RPM.]

[FIGURE 7.3 Roller finger follower valvetrain system.]

[FIGURE 7.4 Valve motion errors of the first four valvetrain designs listed in Table 7.4 (motion error versus crank angle).]

TABLE 7.4
Design for Valvetrain Experiment

Run   HS  RS  LA  CP  Cl  SH  Ramp
 1     1   1   1   1   1   1    1
 2     2   1   1   1   1   2    3
 3     1   2   1   1   2   1    3
 4     2   2   1   1   2   2    1
 5     1   1   2   1   2   2    2
 6     2   1   2   1   2   1    2
 7     1   2   2   1   1   2    2
 8     2   2   2   1   1   1    2
 9     1   1   1   2   2   2    2
10     2   1   1   2   2   1    2
11     1   2   1   2   1   2    2
12     2   2   1   2   1   1    2
13     1   1   2   2   1   1    3
14     2   1   2   2   1   2    1
15     1   2   2   2   2   1    1
16     2   2   2   2   2   2    3
(HS = head stiffness, RS = RA (rocker arm) stiffness, LA = lash adjuster, CP = cam phasing, Cl = clearance, SH = spring height)

7.1 A General Framework for Functional Response

Various statistical models have been proposed for functional data in the statistical literature. In this chapter, we introduce the spatial temporal model, the functional linear model, and the partially functional linear model. All three models can be written in a unified form:

    y(t, x) = μ(t, x) + z(t, x),                                (7.1)

where t stands for some index: in Example 23, t stands for the crank angle; in the classic case of growth curve modeling, t is time; but, in general, there are many possibilities. Here x are the input variables, μ(t, x) is the mean function of y(t, x), and z(t, x) is a random term with mean zero. Different models propose different structures for the mean function, and, in general, one needs different ways to estimate the parameters involved in the model. Section 7.2 introduces the spatial temporal model and a related estimation algorithm. Section 7.3 gives an overview of the penalized regression splines method, which is used to estimate functional linear models and partially functional linear models. Section 7.4 presents functional linear models and their estimation procedures. We introduce partially functional linear models and illustrate their applications in Section 7.5.
7.2 Spatial Temporal Models

The spatial temporal model regards the mean function μ(t, x) in (7.1) as an overall trend and assumes that z(t, x) is a Gaussian random field (GRF) with mean 0. Thus,

    y(t, x) = μ(t, x) + z(t, x),                                (7.2)

where μ(t, x) is the mean function of y(t, x) and z(t, x) is a Gaussian random field indexed by t and x. The spatial temporal model extends the Gaussian Kriging model by incorporating index-direction correlation; it views the output of a computer experiment as a realization of a Gaussian random field. It is a generalization of the Gaussian Kriging model and has been widely used in the literature of spatial statistics (see, for example, Carroll, Chen, Li, Newton, Schmiediche, Wang and George (1997)). We can extend the estimation and prediction procedures of the Gaussian Kriging model in Section 5.4 to the spatial temporal model.

7.2.1 Functional response with sparse sampling rate

Consider a spatial temporal model for functional response with a sparse sampling rate, where, as in Example 22, the response is a vector of outputs of one response variable over different values of t. Let J denote the number of outputs for each run, and let us use matrix notation for succinctness of presentation. Denote by Y and M the n × J matrices with (i, j)-elements y(tj, xi) and μ(tj, xi), respectively. Then model (7.2) can be represented as

    Y = M + R^{1/2} Z Ψ^{1/2},                                  (7.3)

where Ψ is a J × J positive definite matrix, Z is an n × J random matrix whose elements are independent variates from the standard normal distribution N(0, 1), and R is a correlation matrix whose (u, v)-element is

    r(θt; xu, xv) = Corr{y(t, xu), y(t, xv)},

where θt represents the unknown parameter in the correlation function r. One may further assume that the correlation function has the correlation structure introduced in Section 5.2. Denote θ = (θ1, ..., θJ) and R = R(θ). When all the θj's are the same, θ can simply be set to θ1.

Let m(x) = (μ(t1, x), ..., μ(tJ, x))′, so that the ith row of M corresponds to m(xi)′. Thus, we assume that m can be represented or approximated by

    m(x)′ = b(t, x)′β,

where b(t, x) = (b1(t, x), ..., bL(t, x))′ with t = (t1, ..., tJ)′, and β is an L × J unknown parameter matrix. Therefore the mean matrix has the form M = Bβ, where B = (b(t, x1), ..., b(t, xn))′ is an n × L known matrix. Thus, model (7.3) can be written in the form of a linear model:

    Y = Bβ + R^{1/2}(θ) Z Ψ^{1/2}.                              (7.4)

Using the theory of linear models, we can obtain explicit forms for the maximum likelihood estimates of β and Ψ if R is known. Specifically, for a given R, the least squares estimator of β is

    β̂ = (B′R^{-1}(θ)B)^{-1} B′R^{-1}(θ)Y,                      (7.5)

and

    Ψ̂ = (1/n) (Y − Bβ̂)′ R^{-1}(θ̂) (Y − Bβ̂).                 (7.6)

The maximum likelihood estimator for θt does not have a closed form; numerical algorithms, such as the Newton-Raphson algorithm and the Fisher scoring algorithm, may be used to search for the solution of the score equation ∂ℓ(θ)/∂θ = 0, where

    ℓ(θ) = log(|R(θ)|) + tr{R^{-1}(θ)(Y − Bβ̂) Ψ̂^{-1} (Y − Bβ̂)′}.

In summary, we have the following algorithm.

Algorithm 7.1:
Step 1: Set the initial value of β to be (B′B)^{-1}B′Y, the least squares estimate.
Step 2: For a given β, update Ψ using (7.6) and update θ by solving ∂ℓ(θ)/∂θ = 0. In this step, we need numerical iteration algorithms, such as the Newton-Raphson algorithm and the Fisher scoring algorithm.
Step 3: For a given θ, update β using (7.5).
Step 4: Iterate Step 2 and Step 3 until convergence.

After estimating the unknown parameters, we can predict the response at an unobserved site x. Let y(x) = (y(t1, x), ..., y(tJ, x))′. Then the prediction of y(x) is

    ŷ(x)′ = b(t, x)′β̂ + r(θ̂, x)′ R^{-1}(θ̂) (Y − Bβ̂),

where r(θ, x) = (r(θ; x1, x), ..., r(θ; xn, x))′.
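Algorithm 7.1 can be implemented in a few dozen lines of numerical code. The sketch below is our own illustration, not code from the book: it assumes a single correlation parameter vector θ shared across the index t (so that R(θ) is one n × n Gaussian correlation matrix of the form used in Section 5.2), profiles Ψ out of the likelihood via (7.6), and substitutes a generic simplex optimizer for Newton-Raphson or Fisher scoring; all function names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def gauss_corr(X, theta):
    """Gaussian correlation: R[u, v] = exp(-sum_k theta_k (x_uk - x_vk)^2)."""
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(-d2 @ np.asarray(theta, dtype=float))

def fit_spatial_temporal(Y, B, X, theta0, n_iter=5):
    """Sketch of Algorithm 7.1 for Y = B beta + R^{1/2} Z Psi^{1/2}."""
    n, J = Y.shape
    jitter = 1e-8 * np.eye(n)                                # numerical safeguard
    beta = np.linalg.lstsq(B, Y, rcond=None)[0]              # Step 1: (B'B)^{-1}B'Y
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        E = Y - B @ beta
        def neg_profile_loglik(log_th):
            # profile likelihood in theta: Psi is replaced by (7.6)
            R = gauss_corr(X, np.exp(log_th)) + jitter
            Rinv = np.linalg.inv(R)
            Psi = E.T @ Rinv @ E / n                         # (7.6) for this theta
            return J * np.linalg.slogdet(R)[1] + n * np.linalg.slogdet(Psi)[1]
        theta = np.exp(minimize(neg_profile_loglik, np.log(theta),
                                method="Nelder-Mead").x)     # Step 2
        R = gauss_corr(X, theta) + jitter
        Rinv = np.linalg.inv(R)
        beta = np.linalg.solve(B.T @ Rinv @ B, B.T @ Rinv @ Y)   # Step 3: (7.5)
    E = Y - B @ beta
    Psi = E.T @ Rinv @ E / n                                 # final (7.6)
    return beta, Psi, theta
```

With a separate θj for each index value, Step 2 would simply be repeated per column of Y; the structure of the iteration is unchanged.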

Furthermore, the covariance matrix of ŷ(x) may be estimated by

    Cov{ŷ(x)} = [1 − (b(x)′, r(x)′) [0, B′; B, R(θ)]^{-1} (b(x)′, r(x)′)′] Ψ̂,

where [0, B′; B, R(θ)] denotes the 2 × 2 block matrix with blocks 0, B′, B, and R(θ).

Example 22 (Continued) We are interested in studying the impact of the design variables on the ratio of responses (engine noise, EN for short) at different speeds. Figure 7.5 depicts the plot of the logarithm of EN versus the acceleration (RPM); it indicates that the EN increases approximately exponentially as the acceleration increases. The following response variables are defined to analyze the rate of increase of EN as the engine speed increases:

    y1 = log(EN at 2000 RPM) − log(EN at 1000 RPM),
    y2 = log(EN at 3000 RPM) − log(EN at 2000 RPM),
    y3 = log(EN at 4000 RPM) − log(EN at 3000 RPM),
    y4 = log(EN at 5000 RPM) − log(EN at 4000 RPM),
    y5 = log(EN at 6000 RPM) − log(EN at 5000 RPM).

[FIGURE 7.5 Plot of log(engine noise) versus engine acceleration (RPM).]

Let y = (y1, ..., y5)′ and consider model (7.3) with M = 1n μ′ and μ = (μ1, ..., μ5)′. The elements of R(θ) are assumed to have the form

    r(θ; u, v) = exp(− Σ_{k=1}^{s} θk |uk − vk|²).

The above estimation procedure is applied to this model, yielding the estimates μ̂ = (μ̂1, ..., μ̂5)′, the 5 × 5 matrix Ψ̂, and θ̂ = (θ̂1, ..., θ̂18)′. Among the correlation parameters, only θ̂1, θ̂2, θ̂5, θ̂6, θ̂8, θ̂9, θ̂13, θ̂17, and θ̂18, each of order 10^{-2}, are appreciably different from zero; all the other θ̂'s are less than 10^{-10}. After estimating the unknown parameters, we can conduct sensitivity analysis for each component of the response vector using the methods described in Chapter 6, and we can obtain a predictor for untried sites using the resulting spatial temporal model.

7.2.2 Functional response with intensive sampling rate

In practice, one may estimate the mean function μ(t, x) using nonparametric smoothing methods, as described in Sections 7.4 and 7.5, and then apply the spatial temporal model to the resulting residuals. As usual, the mean function of the residuals is typically close to zero; thus, one may consider a simple case in which μ(t, x) = 0. In this section, we assume μ(t, x) has the following structure:

    μ(t, x) = Σ_{l=0}^{L} Bl(x) βl(t) = b(x)′βt.

This yields a spatial temporal model

    y(t, x) = b(x)′βt + zt(x),                                  (7.7)

where zt(x) = z(t, x) is a Gaussian random field with mean 0, and we assume its covariance function has the form

    Cov{zt(u), zt(v)} = σt² r(θt; |u − v|),

where the parameter θt depends on t. Thus, for a given t, model (7.7) can be viewed as a universal Gaussian Kriging model (5.19) based on the data {xi, y(tj, xi)}, i = 1, ..., n. Therefore, for t = tj, j = 1, ..., J, we can estimate βt, σt², and θt using the estimation procedure described in Section 5.4, and we obtain the BLUP for y(t, x) at t = tj and the unobserved site x. We further predict y(t, x) at unobserved t and x by using a linear interpolation of ŷ(t1, x), ..., ŷ(tJ, x). When the sampling rate is intensive, i.e., the gaps between two consecutive index values are small, this procedure is expected to work well.

Let us summarize the above procedure as an algorithm for calculating a prediction of y(t, x) for t ∈ [t1, tJ] at an unobserved site x.

Algorithm 7.2:
Step 1: For j = 1, ..., J, apply the estimation procedure described in Section 5.4 to model (7.7) with the data {xi, y(tj, xi)}, i = 1, ..., n, where y(tj, xi) is the output associated with the input (tj, xi), and obtain the estimates β̂t, σ̂t², and θ̂t for t = tj.
Step 2: Calculate the prediction ŷ(tj, x) using (5.16) for j = 1, ..., J.
Step 3: Obtain the prediction of y(t, x) for t ∈ [t1, tJ] using a linear interpolation of ŷ(t1, x), ..., ŷ(tJ, x).

The above algorithm is easily implemented by those who have used Gaussian Kriging models for modeling computer experiments. We will illustrate this estimation procedure in Sections 7.4 and 7.5.

7.3 Penalized Regression Splines

Nonparametric regression methods (mainly splines smoothing and local polynomial regression) can be employed to estimate the nonparametric smoothing coefficients in the functional linear models introduced in Section 7.4 and the semiparametric regression models discussed in Section 7.5. In this chapter, we will systematically use the penalized splines method due to its ease of implementation and lower computational cost. The penalized splines method was proposed for nonparametric regression, in which random error is assumed to be present, by Eilers and Marx (1996) and Ruppert and Carroll (2000). Parts of the material in this section and the next section are extracted from Li, Sudjianto and Zhang (2005).

Let us start with a simple nonparametric regression model. Suppose that {(ti, yi), i = 1, ..., n} is a random sample from

    yi = μ(ti) + εi,                                            (7.8)

where εi is a random error with zero mean and μ(t) is an unspecified smooth mean function. To estimate μ(t), we expand μ(t) using a set of power truncated spline bases with a given set of knots {κ1, ..., κK}:

    1, t, t², ..., t^p, (t − κ1)_+^p, ..., (t − κK)_+^p,

where a_+ = (a + |a|)/2 is the positive part of a (see (1.25)). That is, model (7.8) is approximated by

    yi ≈ β0 + Σ_{j=1}^{p} βj ti^j + Σ_{k=1}^{K} β_{k+p} (ti − κk)_+^p + εi.      (7.9)

The ordinary least squares method can be used to estimate the coefficients βj; this method is termed the regression splines approach (Stone (1985)). However, the approximation to μ(t) in (7.9) may be over-parameterized in order to avoid large modeling bias. Over-parameterization causes some problems, the most severe of which is that the resulting least squares estimate of μ(t) may have large variance and yield a very poor prediction, mainly because of collinearity. One remedy is variable selection: that is, one selects significant terms in (7.9) by applying some variable selection procedure for linear regression models. In the presence of a large number of basis terms, however, variable selection for model (7.9) is very challenging. To solve these problems, some authors (Eilers and Marx (1996), Ruppert and Carroll (2000)) advocate instead the use of penalized splines methods, which are described below.

The penalized splines approach estimates the βj by minimizing the following penalized least squares function:

    Σ_{i=1}^{n} [yi − {β0 + Σ_{j=1}^{p} βj ti^j + Σ_{k=1}^{K} β_{k+p}(ti − κk)_+^p}]² + λ Σ_{k=1}^{K} β_{k+p}²,

where λ is a tuning parameter, which can be selected by data-driven methods such as cross validation or generalized cross validation. Define

    b(t) = [1, t, ..., t^p, (t − κ1)_+^p, ..., (t − κK)_+^p]′,

which is a (K + p + 1)-dimensional column vector, and denote B = [b(t1), ..., b(tn)]′, which is an n × (K + p + 1) matrix. Then the penalized least squares estimate of β = [β0, ..., β_{K+p}]′ is given by

    β̂ = (B′B + λD)^{-1} B′y,

where y = (y1, ..., yn)′ and D = diag{0, ..., 0, 1, ..., 1} is a diagonal matrix whose first p + 1 diagonal elements are 0 and whose other diagonal elements are 1. Thus, μ(t) can be estimated by

    μ̂(t) = b(t)′β̂.

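The basis construction and the closed-form estimate β̂ = (B′B + λD)^{-1}B′y translate directly into code. The following is our own minimal sketch (function names are illustrative, not from the book):

```python
import numpy as np

def spline_basis(t, knots, p=3):
    """Truncated power basis b(t) = (1, t, ..., t^p, (t-k_1)_+^p, ..., (t-k_K)_+^p)."""
    t = np.asarray(t, dtype=float)
    cols = [t ** j for j in range(p + 1)]
    cols += [np.clip(t - k, 0.0, None) ** p for k in knots]
    return np.column_stack(cols)

def penalized_spline_fit(t, y, knots, lam, p=3):
    """Penalized least squares: beta_hat = (B'B + lam*D)^{-1} B'y,
    where D penalizes only the knot coefficients."""
    B = spline_basis(t, knots, p)
    D = np.eye(B.shape[1])
    D[: p + 1, : p + 1] = 0.0          # no penalty on the polynomial part
    return np.linalg.solve(B.T @ B + lam * D, B.T @ y)
```

The fitted curve at new points t0 is then spline_basis(t0, knots, p) @ beta_hat.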
A crucial question for penalized splines is how to select the tuning parameter λ. Note that

    μ̂(t) = b(t)′(B′B + λD)^{-1}B′y,

which is a linear combination of the yi; this kind of estimator is termed a linear smoother. Note further that

    ŷ = B(B′B + λD)^{-1}B′y.

Thus, Pλ = B(B′B + λD)^{-1}B′ can be viewed as a projection matrix, and e(λ) = trace(Pλ) as the degrees of freedom (or effective number of parameters) of the model fit. For linear smoothers, the generalized cross validation (GCV, Craven and Wahba (1979)) method can be used to select the tuning parameter. The GCV score is defined as

    GCV(λ) = (1/n)‖y − Bβ̂‖² / {1 − (e(λ)/n)}²,

and we may select λ̂ = argmin_λ GCV(λ). The minimization is carried out by computing the GCV score over a set of grid points for λ.

In practical implementation, two natural questions arise: how to determine the order p, and how to choose the knots. Empirical studies show that the resulting penalized splines estimate is not very sensitive to the choice of knots or the order p. In practice, one usually sets p = 3, which corresponds to a cubic spline; a cubic spline has continuous second-order derivatives but piecewise continuous third-order derivatives. Throughout this chapter, p is taken to be 3. As to the choice of knots, if the ti are evenly distributed over an interval, then the knots can simply be taken to be evenly distributed over the interval; alternatively, the knot κk is taken to be the 100k/(K + 1)-th sample percentile of the ti for unevenly distributed ti.

Example 24 As an illustrative example, we generate a random sample of size 50 from the model

    y = sin(2πt) + ε,

where t ∼ U(0, 1) and ε ∼ N(0, 1). Since our focus is on estimating the mean function, we simulate a random error in order to view the error as the portion of the model that cannot be explained by the mean function. Figure 7.6 depicts the resulting estimates of the penalized spline with tuning parameter selected by GCV.
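A grid search over λ using the GCV score is straightforward to code. The sketch below is our own illustration under the same truncated-power-basis setup (names are ours):

```python
import numpy as np

def trunc_basis(t, knots, p=3):
    cols = [t ** j for j in range(p + 1)]
    cols += [np.clip(t - k, 0.0, None) ** p for k in knots]
    return np.column_stack(cols)

def gcv_select(t, y, knots, lambdas, p=3):
    """Return the lambda on the grid minimizing
    GCV(lam) = (1/n)||y - y_hat||^2 / {1 - e(lam)/n}^2, with e(lam) = trace(P_lam)."""
    B = trunc_basis(np.asarray(t, dtype=float), knots, p)
    n = B.shape[0]
    D = np.eye(B.shape[1])
    D[: p + 1, : p + 1] = 0.0
    scores = []
    for lam in lambdas:
        A = np.linalg.solve(B.T @ B + lam * D, B.T)   # P_lam = B @ A
        y_hat = B @ (A @ y)
        e = np.trace(B @ A)                           # effective number of parameters
        scores.append(np.mean((y - y_hat) ** 2) / (1.0 - e / n) ** 2)
    return lambdas[int(np.argmin(scores))]
```

In practice a logarithmic grid for λ is typical, mirroring the log(λ) axes of Figure 7.6(b) and (d).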

To demonstrate the impact of the choice of knots, we compare the resulting estimates with K = 10 and K = 30 knots, respectively. From Figure 7.6, we can see that the resulting estimate is not sensitive to the choice of knots. Indeed, the GCV selects a larger λ (i.e., a stronger penalty on the coefficients) for K = 30 than for K = 10. Furthermore, Figure 7.6 also depicts the resulting spline estimate without penalty. The spline estimate with K = 10 offers reasonable results, but it yields a very under-smoothed estimate when K = 30. This implies that spline regression without penalty may be sensitive to the choice of knots.

[FIGURE 7.6 Spline and penalized spline estimates. In (a) and (c), the solid, dashed, and dotted lines are the penalized spline estimate, spline estimate, and true function, respectively; (b) and (d) plot the GCV scores versus log(λ), for K = 10 and K = 30.]

7.4 Functional Linear Models

Due to the curse of dimensionality (Bellman (1961)), it is very challenging to estimate a multivariate surface. Moreover, the x in our setting can be

we can see that the estimates of both β2 (t) and β3 (t) vary over the index t. Example 25 In this example. β2 (t) = sin(2πt). Figure 7. Rice and Wu (2001). Fully nonparametric models cannot directly be applied for estimating μ(t.7 shows that the mean function in this example is semiparametric.1). x) = β0 (t) + x β(t). I3 ). Nair. and J = 80. and references therein). Here we consider β0 (t) = β1 (t) = 1.4. the least squares method can be used to estimate β0 (t) and β(t).7(a) and (b) show that the estimates for both β0 (t) and β1 (t) hover around 1. for i = 1.11) is a linear regression model.1) with the mean function (7. (7. x2i . Let us demonstrate the idea by a simulated data set. Hastie and Tibshirani (1993).10) This model enables us to examine the possible index-varying eﬀect of x. Wu and Yang (1998). · · · . From Figure 7. x3i ) ∼ N3 (0. The plot of the resulting estimate can serve as a graphical tool for checking whether the coeﬃcient functions really vary over the index t. and because it becomes a linear model for given t. Taam and Ye (2002). Rice. x) = β0 (t) + x β(t) + z(t. namely. The least squares estimate for the coeﬃcient functions is depicted in Figure 7. Chiang. · · · . for example. Thus. Chiang and Hoover (1998). 1).10) becomes y(t.11) For given t. Hoover. LLC . (7. Wu. Functional linear models have been employed to analyze functional data and longitudinal data in the statistical literature (see. 7. x) in (7. Fan and Zhang (2000). This implies that the true coeﬃcient functions are index-dependent.7(c) and (d). Wu and Zhou (2002). xi = (x1i . (7. and β3 (t) = 4(t − 0. J with n = 50. so a certain structure should be imposed on μ(t. Figure 7. Faraway (1997).5)2 . and is referred to as a functional linear model because its coeﬃcients are assumed to be smooth functions of t. Furthermore. and εi (tj ) ∼ N (0.7.12) where tj = j/(J + 1). xi and εi (tj ) are independent. 
while the others are index-variant.
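The graphical tool is nothing more than a separate least squares fit at each index value. The following is our own sketch of Example 25's simulation; plotting the columns of beta_tilde against t reproduces the qualitative behavior of Figure 7.7:

```python
import numpy as np

# simulate model (7.12)
rng = np.random.default_rng(0)
n, J = 50, 80
t = np.arange(1, J + 1) / (J + 1)
x = rng.normal(size=(n, 3))                                  # x_i ~ N_3(0, I_3)
beta_true = np.column_stack([np.ones(J), np.ones(J),
                             np.sin(2 * np.pi * t), 4 * (t - 0.5) ** 2])
X = np.column_stack([np.ones(n), x])                         # intercept plus inputs
Y = X @ beta_true.T + rng.normal(size=(n, J))                # n x J outputs

# separate least squares fit at each t_j (one column of Y at a time)
beta_tilde = np.linalg.lstsq(X, Y, rcond=None)[0].T          # J x 4; row j holds the
                                                             # estimates at t_j
```

Columns 0 and 1 of beta_tilde hover around 1, while columns 2 and 3 track sin(2πt) and 4(t − 0.5)², mirroring the four panels of Figure 7.7.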

[FIGURE 7.7 Least squares estimates of the coefficient functions: (a) β0(t), (b) β1(t), (c) β2(t), (d) β3(t).]

This example demonstrates the usefulness of the graphical tool. Note that the varying-coefficient model (7.11) is an example of a semiparametric regression model; these models will be systematically introduced in Section 7.5.

7.4.2 Efficient estimation procedure

The least squares estimates of the regression coefficients β0(t), β1(t), ..., βd(t), where d is the dimension of β(t), only use the data collected at t. Their efficiency can therefore be dramatically improved by smoothing over t, since the smoothing method can efficiently estimate the coefficients using data collected not only at t but also at other places. Thus, we propose a two-step estimation procedure for β0(t) and β(t) as follows:

Step 1: For each t at which data were collected, calculate the least squares estimates of β0(t) and β(t), denoted by β̃0(t) and β̃(t) = (β̃1(t), ..., β̃d(t))′.

Step 2: Use penalized splines to smooth β̃0(t), β̃1(t), ..., β̃d(t) over t component-wise. Denote the smoothed estimates as β̂0(t) and β̂(t) = (β̂1(t), ..., β̂d(t))′.

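The two-step procedure can be sketched compactly. This is our own minimal implementation; for brevity it uses one common tuning parameter λ and one knot set for every coefficient function, and all names are illustrative:

```python
import numpy as np

def trunc_basis(t, knots, p=3):
    cols = [t ** j for j in range(p + 1)]
    cols += [np.clip(t - k, 0.0, None) ** p for k in knots]
    return np.column_stack(cols)

def two_step_flm(t, X, Y, knots, lam, p=3):
    """Step 1: raw least squares estimates at each index t_j (columns of Y).
    Step 2: penalized-spline smoothing of each coefficient curve over t."""
    beta_tilde = np.linalg.lstsq(X, Y, rcond=None)[0].T      # J x (d+1) raw estimates
    B = trunc_basis(t, knots, p)
    D = np.eye(B.shape[1])
    D[: p + 1, : p + 1] = 0.0
    coef = np.linalg.solve(B.T @ B + lam * D, B.T @ beta_tilde)
    beta_hat = B @ coef                                      # smoothed J x (d+1)
    return beta_tilde, beta_hat
```

The smoothing in Step 2 borrows information across all index values, which is exactly why the smoothed estimates beat the raw per-index fits.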
6 0. 1).13) yi (tj ) = β0 (tj ) + β1 (tj )xi + εi (tj ).8 1 FIGURE 7. J = 80. tj ∼ U (0.6 0.2 0.Functional Response (a) β 1. Solid line is the smoothed estimate. This estimation procedure can be easily implemented. · · · . 1). The twostep estimation procedure was proposed for longitudinal data by Fan and Zhang (2000). · · · . Before we demonstrate our application. we generate a typical random sample from the following model: (7. ti and εi (tj ) are independent.5)2 which are depicted in the dotted line in Figure 7. LLC . Moreover. and β1 (t) = 4(t − 0.2 0.4 t 0. Furthermore.5 0 −0. where xi ∼ U (0. Dotted line is the true value. J with n = 50.5 −1 −1. let us give a simulation example to illustrate the estimation procedure. it allows diﬀerent coeﬃcient functions using diﬀerent tuning parameters. n and j = 1. Furthermore.4. and εi (tj ) ∼ N (0. xi .5 1 β1 0. © 2006 by Taylor & Francis Group.4. From Figure 7. for i = 1.4 t (b) β1 0. Example 26 In this example.5 0 225 0 0 β −0.1. we take β0 (t) = sin(2πt).8 Penalized spline estimates for the coeﬃcient functions.5 1 0. although they use local polynomial regression in the second step. in which the smoothed estimates are also displayed as the solid line.5 0 0.5 0 0. 1). we can see that the estimate and the true curve are very close.8 1 1.1. This demonstrates that the proposed estimation method performs very well.
It is interesting to compare the smoothed estimate β̂ obtained in Step 2 with the simple least squares estimate (SLSE) β̃ obtained in Step 1; in other words, it is interesting to investigate why Step 2 is necessary. Let us introduce some notation. Define the mean squared error (MSE) for an estimate β̂ of β to be

    MSE(β̂) = (1/J) Σ_{j=1}^{J} (β̂(tj) − β(tj))².

In order to make a direct comparison between β̂ and β̃, we consider the ratio of MSEs (RMSE), which is defined as

    RMSE = MSE(β̃) / MSE(β̂).

We now generate N = 5000 simulation data sets from model (7.13), and for each simulation data set we calculate the MSE and RMSE. The sample mean and standard deviation of the 5000 MSEs for each coefficient are depicted in Table 7.5. Table 7.5 presents a global comparison of β̂ and β̃, from which we can see that the smoothed estimate performs much better than the simple least squares estimate. Overall, the smoothed estimate reduces the MSE by about 90% relative to the simple least squares estimate.

TABLE 7.5
Summary of Mean Squared Errors

                     β0 Mean (sd)       β1 Mean (sd)
SLSE (β̃)            0.0204 (0.0056)    0.0214 (0.0036)
Smoothed Est. (β̂)   0.0019 (0.0011)    0.0014 (0.0011)

To demonstrate the comparison for each simulation, we present the box-plot of log10(RMSE) in Figure 7.9, from which we can see that even in the worst case the RMSE is about 10^{0.5} ≈ 3.16; this means that the smoothing in Step 2 reduced the MSE by two-thirds even in the worst case. This is easily explained, since the simple least squares estimate uses only the data at tj to estimate β(tj), while the smoothed estimate uses the full data set to estimate β(tj).

7.4.3 An illustration

In this section, we apply the proposed functional linear model and its estimation procedure to the valvetrain system discussed at the beginning of this chapter.
[FIGURE 7.9 Boxplot of the log ratio of MSEs, log10(RMSE).]

In this example, we are interested in studying the dynamic behavior of the valve closing event (crank angle from 360 to 450 degrees) at a high speed (i.e., 5500 rpm). In this case, the response variable (i.e., motion errors), instead of being a single value, is observed during a closing event which is defined over a crank angle range. A perfectly stable valvetrain should have a straight line or zero error throughout the crank angles. To achieve this, one must understand the effect of each design variable on motion errors. In particular, engineers attempt to minimize motion error by adjusting the following design variables:

• cylinder head stiffness (HS)
• rocker arm stiffness (RS)
• hydraulic lash adjuster (LA)
• cam phasing (CP)
• clearance (Cl)
• spring stiffness (SS)
• ramp profile (Ra)

To do this, a computer experiment with 16 runs was conducted; it is depicted in Table 7.4. The motion errors for the first four designs listed in Table 7.4 are shown in Figure 7.4, from which we can see that the variability of the motion errors varies for various design configurations.

Denote HS by x1, RS by x2, and so on. Since there are three levels for ramp, we use two dummy codes for this factor: x7 = 1 when the ramp is set at level 1, and x7 = 0 otherwise; similarly, x8 = 1 when the ramp is set at level 2, and x8 = 0 otherwise. In this study, we consider the following model:

    y(t, x) = β0(t) + β1(t)x1 + · · · + β8(t)x8 + z(t, x),
[FIGURE 7.10 Estimated coefficient functions within the crank angle region (360, 450) using the functional linear model for the valvetrain data: (a) β0, (b) β1, (c) β2, (d) β3, (e) β4, (f) β5, (g) β6, (h) β7 and β8. In (a)-(g), the dotted and solid lines are the simple least squares estimates and smoothed estimates, respectively. In (h), the dotted and solid lines are the simple least squares estimate and smoothed estimate for β7(t), and the dashed and dash-dotted lines are the simple least squares estimate and smoothed estimate for β8(t), respectively.]
[FIGURE 7.11 Valve motion errors compared to FLM estimates for (a) Run 2, (b) Run 6, (c) Run 10, and (d) Run 14. The solid line is the FLM estimate, and the dotted line is the CAE simulation.]

In this example, y(t, x) is set to be 100 times the amount of motion error; the scale transformation is merely to avoid rounding error in numerical computation. The proposed estimation procedure is used to estimate the coefficient functions. The resulting estimates are depicted in Figure 7.10, in which the simple least squares estimates are displayed as dotted lines. From Figure 7.10, we can see that the smoothed estimates follow the trend of the simple least squares estimates very well, and that all coefficient functions vary dramatically with angle.

After estimating the coefficient functions, we further compute the fitted values for all runs. Fitted values for four typical runs (Runs 2, 6, 10, and 14) are presented in Figure 7.11. The functional linear model fits the data quite well.

We apply the spatial temporal model proposed in Section 7.2.2 to the residuals obtained by the simple least squares approach and by the estimation procedure introduced in Section 7.4.2. The resulting estimate μ̂(t) is depicted in Figure 7.12(a), from which we can see that μ̂(t) = 0 for the residuals obtained by the simple least squares approach; this is because that approach does not involve smoothing and hence provides an unbiased estimate for μ(t). The resulting estimate σ̂²(t) is displayed in Figure 7.12(b), from which we can find that the σ̂²(t) are very close for these two different sets of residuals. Figure 7.13 displays the resulting estimate γ̂(t). The estimate of γ(t) based

5.3.230 Design and Modeling for Computer Experiments on the residuals of the FLM model hovers around that based on residuals of the simple least squares estimate. LLC .5 Semiparametric Regression Mo dels In the previous section. j = 1. + Thus.15) Thus. 7. for example. Speckman (1988).5. n be the input vector. Rice and Weiss (1986). In Section 7. An illustration is given in Section 7. J. The partially linear model is deﬁned as model (7. · · · . In Section 7. and references therein). we introduced functional linear models. This leads us to consider semiparametric regression models for the mean function. and yij be the output collected at tj . the most frequently used semiparametric regression model in the statistical literature.1. κK }: p K α(t) ≈ α0 + k=1 αk tk + k=1 αk+p αk (t − κk )p . Fan and Li (2004).2. This model has been popular in the statistical literature (see. · · · . A direct approach to estimating α(t) and β is to approximate α(t) using a spline basis for given knots {κ1 . (7. i = 1. we consider partially functional linear models and introduce their estimation procedure. some coeﬃcient functions in the functional linear models change across t. we introduce the partially linear model.1 Partially linear model Let xi . © 2006 by Taylor & Francis Group.1) with yij = α(tj ) + xi β + z(tj . corresponding to the input xi . p K yij ≈ α0 + j=1 αk tk + j k=1 αk+p αk (tj − κk )p + xi β + z(tj . xi ). Heckman (1986). + (7. We will further introduce two estimation procedures for the partially linear model. xi ). Engle.5. 7. · · · . Commonly.14) This partially linear model naturally extends the linear regression model by allowing its intercept to depend on tj . Granger.5. the least squares method can be used to estimate αj and β. Lin and Ying (2001). while others do not. Direct approach.
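The direct approach thus reduces to a single ordinary least squares fit on an augmented design matrix whose columns are the polynomial and truncated-power basis functions of tj together with the x-covariates. A minimal sketch (the choices p = 2, K = 5 knots, and an i.i.d. error in place of z(t, x) are our own illustration, not from the text):

```python
import numpy as np

def spline_design(t, p, knots):
    """Columns: 1, t, ..., t^p, (t - kappa_1)_+^p, ..., (t - kappa_K)_+^p."""
    cols = [t ** k for k in range(p + 1)]
    cols += [np.clip(t - kappa, 0.0, None) ** p for kappa in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n, J, p = 50, 80, 2
t = np.arange(1, J + 1) / (J + 1)
knots = np.quantile(t, np.linspace(0.1, 0.9, 5))
x = rng.binomial(1, 0.5, size=(n, 5)).astype(float)
beta = np.ones(5)
alpha = np.sin(2 * np.pi * t) ** 2
y = alpha[None, :] + (x @ beta)[:, None] + 0.1 * rng.standard_normal((n, J))

# Stack all nJ observations: spline basis in t_j plus the x covariates.
B = spline_design(t, p, knots)                       # J x (p + 1 + K)
D = np.hstack([np.tile(B, (n, 1)), np.repeat(x, J, axis=0)])
coef, *_ = np.linalg.lstsq(D, y.ravel(), rcond=None)
beta_hat = coef[-5:]                                 # estimate of beta
alpha_hat = B @ coef[:-5]                            # estimate of alpha(t_j)
```

Because the spline basis contains a constant column and x does not depend on t, the approximation error in α(t) does not bias the estimate of β.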

FIGURE 7.12 Estimate of μ(t) and σ²(t). Panel (a) plots the estimated μ(t) and panel (b) the estimated σ²(t) against crank angle (360–450). The solid line is the estimate based on residuals of the FLM model, and the dotted line is the estimate based on residuals of the SLSE estimate.

FIGURE 7.13 Estimate of γ(t). Panels (a)–(h) plot the estimates of γ1(t) through γ8(t) against crank angle (360–450).

Partial Residual Approach. Denote

ȳ·j = (1/n) Σ_{i=1}^{n} yij,  x̄ = (1/n) Σ_{i=1}^{n} xi,  z̄j = (1/n) Σ_{i=1}^{n} z(tj, xi).

Then, from model (7.14),

ȳ·j = α(tj) + x̄'β + z̄j,

and therefore

yij − ȳ·j = (xi − x̄)'β + {z(tj, xi) − z̄j}.

For each i, denote

ȳi· = (1/J) Σ_{j=1}^{J} yij,  z̄i = (1/J) Σ_{j=1}^{J} z(tj, xi),  ȳ = (1/J) Σ_{j=1}^{J} ȳ·j,

and let z̄ = (1/J) Σ_{j=1}^{J} z̄j. Furthermore,

ȳi· − ȳ = (xi − x̄)'β + (z̄i − z̄),

and therefore a least squares estimate for β is

β̂ = (Xc'Xc)⁻¹ Xc'yc,

where Xc = (x1 − x̄, ..., xn − x̄)' and yc = (ȳ1· − ȳ, ..., ȳn· − ȳ)'. After we obtain an estimate of β, we are able to calculate the partial residuals rj = ȳ·j − x̄'β̂. Denote r = (r1, ..., rJ)'. Then we smooth rj over tj using penalized splines, introduced in Section 7.3, and thus we obtain an estimate of α(t).

The partial residual approach (Speckman (1988)) is intuitive and may be easily implemented. Several other estimation procedures have been proposed for the partially linear model. We next illustrate some empirical comparisons between the direct approach and the partial residual approach.

Example 27 We generated 5000 data sets, each consisting of n = 50 samples, from the following partially linear model:

yij = α(tj) + xi'β + εij,   (7.16)

where tj = j/(J + 1) with J = 80, α(t) = sin²(2πt), and β is a 5-dimensional vector with all elements being 1. Here xi is five-dimensional, and each component of xi is independent and identically distributed from a Bernoulli distribution with success probability 0.5. The εij are AR(1) random errors: εij = ρ εi,j−1 + ej, where ej ∼ N(0, 1); furthermore, the εij are mutually independent for different i. In our simulation, we take ρ = 0.1 and 0.5.

The mean and standard error of the resulting estimates in the 5000 simulations are summarized in Table 7.6. We found that the results for the direct approach and the partial residual approach are identical for this example. However, in general, the partial residual approach may result in a more efficient estimate than the direct approach. Figure 7.14 gives plots of a typical estimate of α(t).

TABLE 7.6 Mean and Standard Deviation of the estimates of β1, ..., β5 over the 5000 simulations (standard deviations in parentheses). For both values of ρ, the means lie within about 0.002 of the true value 1; the standard deviations are about 0.037 for the smaller ρ and about 0.065–0.067 for the larger ρ.

FIGURE 7.14 Estimated intercept function; panels (a) and (b) correspond to the two values of ρ. Solid lines stand for the resulting estimate, and the dotted lines are the true intercept function.

7.5.2 Partially functional linear models

The partially functional linear model is a natural combination of the functional linear model and the linear regression model. In fact, the partially linear model is a special case of the partially functional linear model. Let {wi, xi} consist of input variables and let yij be the output collected at tj, corresponding to the input {wi, xi}.
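The partial residual estimator of β needs only row and column averages. A sketch of one replication of Example 27 (the penalized-spline smoothing of the partial residuals is omitted; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, J, rho = 50, 80, 0.1
t = np.arange(1, J + 1) / (J + 1)
x = rng.binomial(1, 0.5, size=(n, 5)).astype(float)
beta = np.ones(5)

# AR(1) errors, independent across runs i.
e = rng.standard_normal((n, J))
eps = np.zeros((n, J))
eps[:, 0] = e[:, 0]
for j in range(1, J):
    eps[:, j] = rho * eps[:, j - 1] + e[:, j]

alpha = np.sin(2 * np.pi * t) ** 2
y = alpha[None, :] + (x @ beta)[:, None] + eps

# ybar_i. - ybar = (x_i - xbar)'beta + averaged noise.
Xc = x - x.mean(axis=0)
yc = y.mean(axis=1) - y.mean()
beta_hat = np.linalg.lstsq(Xc, yc, rcond=None)[0]

# Partial residuals r_j = ybar_.j - xbar' beta_hat track alpha(t_j).
r = y.mean(axis=0) - x.mean(axis=0) @ beta_hat
```

The partial residuals r would then be smoothed over t to obtain the estimate of α(t).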

The partially functional linear model is defined as follows:

yij = wi'α(tj) + xi'β + z(tj, xi).   (7.17)

To include an intercept function, we may set the first component of wi to be 1. Alternatively, we may set the first component of xi to be 1 in order to introduce a constant intercept in the model. Suppose that {tj, xi, wi, yij}, i = 1, ..., n, is a sample from a computer experiment, and that model (7.17) is used to fit the data.

Both the direct estimation procedure and the profile least squares approach (Fan and Huang (2005)) for partially linear models can be extended to model (7.17), but their implementation may be a little complicated. Here we introduce a back-fitting algorithm. For a given β, let ỹij = yij − xi'β. Then

ỹij = wi'α(tj) + z(tj, xi),

which is a functional linear model, and the estimation procedure for such a model can be used to estimate α(t). On the other hand, for a given α(t), define y*ij = yij − wi'α(tj). We have

y*ij = xi'β + z(tj, xi).

Thus, regarding β as constant, we can use the least squares approach to estimate β. The back-fitting algorithm iterates the estimation of α(t) and the estimation of β until they converge. The proposed estimation procedure can be summarized as follows:

Step 1. (Initial values) Estimate α(t) using the estimation procedure of the functional linear model proposed in Section 7.4, and set the resulting estimate to be the initial value of α(t).

Step 2. (Estimation of β) Substitute α(t) with its estimate in model (7.17), and estimate β using the least squares approach.

Step 3. (Estimation of α(t)) Substitute β with its estimate in model (7.17), and apply the estimation procedure for functional linear models to estimate α(t).

Step 4. (Iteration) Iterate Step 2 and Step 3 until the estimates converge.

Since the initial estimate obtained in Step 1 will be a very good estimate for α(t), we may stop after a few iterations rather than wait until the iteration fully converges; the resulting estimate will be as efficient as that obtained by a full iteration (see Bickel, 1975 for other settings). The back-fitting algorithm is intuitive. Compared with the direct estimation procedure, it yields a more accurate estimate for β; compared with the profile least squares approach, it is easier to implement in practice.
From Figure 7.236 Design and Modeling for Computer Experiments Example 25 (Continued) The true model from which data are generated in this example is a partially functional linear model. Compared with the least squares estimate displayed in Figure 7.7.2.1. 5.15.16 from which we further consider the following partially functional linear models y = β0 (t)+β2 x1 +β3 x2 +β4 x3 +β5 x4 +β5 x5 +β6 x6 +β7 (t)x7 +β8 (t)x8 +z(t.2857x6 + β7 (t)x7 + β8 (t)x8 . σ 2 (t). The resulting estimate ˆ μ(t).20.0006x4 + 0. and the resulting model is given by y = β0 (t) − 0.4. The algorithm converges very quickly.0306x1 + β2 (t)x2 + β3 (t)x3 .18. Figure 7. Let us apply the backﬁtting algorithm described above for the data we analyzed before. The resulting simple least squares estimates are depicted in Figure 7. respectively. LLC . and γ (t) are depicted in Figure 7. we can see the reasonably good ﬁt to the data. we consider the data collected over crank angles from 90◦ to 360◦ . ˆ ˆ ˆ where β2 (t) and β3 (t) are depicted in Figure 7. β7 (t).3 for the data and use the simple least squares approach to estimate the coeﬃcients. 9.18 depicts four typical ones (Runs 1. we illustrate the partially functional linear model and its estimation procedure using the valvetrain example presented in Section 7.2409x5 ˆ ˆ ˆ ˆ −0. In other words. The back-ﬁtting algorithm is applied to the partially functional linear model.5.2184x1 + 0.2 for the residuals derived from the simple least squares estimate approach for the functional linear model and from the partially function linear model. ˆ ˆ ˆ where β0 (t). The resulting model is ˆ ˆ y (t. we ﬁrst apply the functional linear model in Section 7. To determine which eﬀects are angle-varying eﬀects and which eﬀects are constant.17.5147x3 − 0.19 and 7. x). the resulting estimate of the backﬁtting algorithm is smooth and very close to the true coeﬃcient function. The ﬁtted values of the partially functional linear models are computed for all runs. x) = 1. 13). 
Example 25 (Continued) The true model from which data are generated in this example is a partially functional linear model. Let us apply the back-fitting algorithm described above to the data we analyzed before. The algorithm converges very quickly, and the resulting model is given by

ŷ = β̂0(t) − 0.0306x1 + β̂2(t)x2 + β̂3(t)x3,

where β̂2(t) and β̂3(t) are depicted in Figure 7.15. Compared with the least squares estimate, the resulting estimate of the back-fitting algorithm is smooth and very close to the true coefficient function.

7.5.3 An illustration

In this section, we illustrate the partially functional linear model and its estimation procedure using the valvetrain example presented in Section 7.1. As a demonstration, we consider the data collected over crank angles from 90° to 360°. To determine which effects are angle-varying and which effects are constant, we first apply the functional linear model of Section 7.4 to the data and use the simple least squares approach to estimate the coefficients. The resulting simple least squares estimates are depicted in Figure 7.16, from which we further consider the following partially functional linear model:

y = β0(t) + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7(t)x7 + β8(t)x8 + z(t, x).

In other words, the intercept and the coefficients of the covariate Ramp are considered to vary over the crank angle, and the other coefficients are considered to be constant. The back-fitting algorithm is applied to this partially functional linear model. The resulting model is

ŷ(t, x) = β̂0(t) − 0.2184x1 + 0.0392x2 − 0.5147x3 − 0.0006x4 + 0.2409x5 − 0.2857x6 + β̂7(t)x7 + β̂8(t)x8,

where β̂0(t), β̂7(t), and β̂8(t) are shown in Figure 7.17. The fitted values of the partially functional linear model are computed for all runs; Figure 7.18 depicts four typical ones (Runs 1, 5, 9, and 13). We can see the reasonably good fit to the data. We apply the spatial temporal model in Section 7.2 for the residuals derived from the simple least squares estimate approach for the functional linear model and from the partially functional linear model. The resulting estimates μ̂(t), σ̂²(t), and γ̂(t) are depicted in Figures 7.19 and 7.20, respectively.

FIGURE 7.15 Estimates of β2(t) and β3(t), shown in panels (a) and (b). Solid lines stand for the estimated coefficient, and dotted lines for the true coefficient function.

FIGURE 7.16 Estimated coefficients (least squares estimates plotted against crank angle, 100–350). The thick solid line stands for the intercept function β̂0(t). The two thin solid lines are estimated coefficients of Ramp. Dotted lines are estimated coefficients of other factors.

FIGURE 7.17 Estimated coefficient functions within the crank angle region (90, 360) using the functional linear model for the valvetrain data. Panels (a)–(c) plot β̂0(t), β̂7(t), and β̂8(t) against crank angle. In (a)–(c), dotted and solid lines are the simple least squares estimates and smoothed estimates, respectively.

FIGURE 7.18 Valve motion errors compared to PFLM fit. Panels (a)–(d) show Runs 1, 5, 9, and 13. The solid line is the PFLM fit, and the dotted line is the motion error.

FIGURE 7.19 Estimate of μ(t) and σ²(t). Panel (a) plots the estimated μ(t) and panel (b) the estimated σ²(t) against crank angle (100–350). The solid line is the estimate using the residuals of the PFLM model, and the dotted line is the estimate using the residuals of the SLSE estimate.

FIGURE 7.20 Estimate of γ(t). Panels (a)–(h) plot the estimates of γ1(t) through γ8(t) against crank angle (100–350). The solid line is the estimate of γ(t) based on the residuals of the PFLM model, and the dotted line is the estimate of γ(t) based on the residuals of the SLSE estimate.

Appendix

To make this book more self-contained, in this appendix we introduce many basic concepts in matrix algebra, probability, statistics, linear regression analysis, and selection of variables in regression models. Linear regression analysis has played an important role in design and modeling for computer experiments. Due to limited space, we give only their definitions, simple properties, and illustrative examples. A review of matrix algebra is given in Section A.1. Some basic concepts in probability and statistics needed to understand the book are reviewed in Section A.2. Section A.3 gives a brief introduction to basic concepts, theory, and methods of linear regression analysis, and Section A.4 reviews some variable selection criteria for linear regression models.

A.1 Some Basic Concepts in Matrix Algebra

This section reviews some matrix algebra concepts used in the book. Data sets can be expressed as a matrix of size n × s, where n is the number of runs and s the number of variables (factors).

Matrix: An n × s matrix A is a rectangular array

A = (aij) = [ a11 ··· a1s
              ⋮        ⋮
              an1 ··· ans ],

where aij is the element in the ith row and jth column. We always assume in the book that all the elements are real numbers. For example, a data set of 6 runs and 4 variables is given by

D = [ 30 50 0.112 4.2
      25 54 0.230 4.0
      28 52 0.078 5.8
      31 48 0.150 5.9
      32 51 0.125 5.1
      29 55 0.220 4.7 ],

where rows represent runs and columns represent variables.

Transpose: The transpose of A, denoted by A', is the s × n matrix with elements aji.

Row/column vector: If n = 1, A is a row vector, or s-row vector; if s = 1, A is a column vector, or an n-column vector. We shall use a bold-face, lowercase letter to denote a column vector.

Square matrix: A matrix A is called a square matrix if n = s. A square matrix A is called symmetric if A' = A. A square matrix A = (aij) is called an upper triangle matrix if aij = 0 for all i > j, and a lower triangle matrix if aij = 0 for all i < j. A square matrix A is called a diagonal matrix if aij = 0 for i ≠ j, written A = diag(a11, ..., ann). If A is a diagonal matrix with aii = 1, i = 1, ..., n, then A is called the identity matrix or unit matrix of order n, written A = In.

Zero matrix: A zero matrix is a matrix having 0's as its elements. In this case we write A = O.

Trace: The sum of the diagonal elements of a square matrix A is called the trace of the matrix A, trace(A) = Σ_{i=1}^{n} aii.

Rank: A matrix A of size n × s is said to have rank r, written rank(A) = r, if at least one of its r-square sub-matrices is non-singular while every (r + 1)-square sub-matrix is singular. The following properties are useful:
• rank(A) ≤ min(n, s);
• rank(A) = rank(A');
• rank(A) = n < s is called full row rank;
• rank(A) = s < n is called full column rank;
• rank(A) = s = n is called full rank.

Determinant: The determinant of a square matrix A is denoted by |A| or det(A). The original definition of |A| is based on a complicated formula that we will not explain here due to space restrictions. A is called singular if |A| = 0; otherwise A is non-singular. A square matrix with full rank is non-singular.

Product: The product of two matrices A = (aij): n × m and B = (bkl): m × s is an n × s matrix given by

AB = ( Σ_{j=1}^{m} aij bjl ).

Inverse of a matrix: Matrix B is said to be the inverse of a square matrix A if AB = BA = I, where I is the identity matrix. A matrix has an inverse if and only if it is non-singular; in this case, the inverse is unique.

Eigenvalues and eigenvectors: The eigenvalues of a square matrix A of order n, denoted by λi, are solutions of the equation |A − λI| = 0. There are n solutions, i.e., the matrix A has n eigenvalues. For each eigenvalue λi, there exists a vector li such that A li = λi li. The vector li is called the eigenvector of A associated with λi. The following properties are useful:
• there are n eigenvalues;
• the eigenvalues of A are real numbers if A is symmetric;
• the sum of the eigenvalues of A equals trace(A);
• the product of the eigenvalues of A equals |A|.

Orthogonal matrix: A square matrix A is called an orthogonal matrix if A'A = I. A square matrix A is called a permutation matrix if A is an orthogonal matrix with elements either 1 or 0.

Projection matrix: A square matrix A satisfying AA = A is called idempotent; a symmetric and idempotent matrix is called a projection matrix. Let X be an n × s matrix with rank s ≤ n. Then H = X(X'X)⁻¹X' is a projection matrix, called the hat matrix in regression analysis.

Hadamard matrix: A Hadamard matrix of order n, H, is a square matrix of order n for which every entry equals either 1 or −1, and which satisfies HH' = nIn. For example,

H = [ 1  1  1  1
      1  1 −1 −1
      1 −1  1 −1
      1 −1 −1  1 ]

is a Hadamard matrix of order 4. Many 2-level orthogonal designs are generated via Hadamard matrices by deleting the column of ones of H if H has such a column. Deleting the first column we obtain L4(2³) (cf. Section 1.2).

Non-negative definite/positive definite matrix: A symmetric matrix A of order n is called non-negative definite, denoted by A ≥ 0, if for any column vector a ∈ Rⁿ we have a'Aa ≥ 0; it is called positive definite, denoted by A > 0, if a'Aa > 0 for any nonzero a. The following properties are useful:
• X'X ≥ 0 for any n × s matrix X;
• the eigenvalues of A are non-negative if A ≥ 0;
• the eigenvalues of A are positive if A > 0.
© 2006 by Taylor & Francis Group. Non-negative deﬁnite/positive deﬁnite matrix: A symmetric matrix A of order n is called non-negative deﬁnite.. Hadamard matrix: A Hadamard matrix of order n. Let X be an n × s matrix with rank s ≤ n. A square matrix A is called a permutation matrix if A is an orthogonal matrix with elements either 1 or 0. denoted by A ≥ 0. Projection matrix: A square matrix A satisfying AA = A is called idempotent. Orthogonal matrix: A square matrix A is called an orthogonal matrix if A A = I. i.e. The vector li is called the eigenvector of A associated to λi . there exists a vector li such that Ali = λi li . the matrix A has n eigenvalues. The following properties are useful: • there are n eigenvalues. • the sum of the eigenvalues of A equals trace(A). • the eigenvalues of A are real numbers if A is symmetric.2). i = 1. For example. H. if for any nonzero column vector a ∈ Rn we have a Aa ≥ 0. Now H = X(X X)−1 X is a projection matrix and is called a hat matrix in regression analysis. if a Aa > 0. • the eigenvalues of A are non-negative. · · · . if A ≥ 0. There are n solutions. n. denoted by A > 0. For each eigenvalue.Appendix 243 Eigenvalues and eigenvectors: The eigenvalues of a square matrix A of order n are solutions of the equation |A − λI| = 0. and which satisﬁes that HH = nIn . denoted by λi . Section 1. is a square matrix of order n for which every entry equals either 1 or −1. LLC . ⎡ ⎤ 1 1 1 1 ⎢ 1 1 −1 −1 ⎥ ⎥ H=⎢ ⎣ 1 −1 1 −1 ⎦ 1 −1 −1 1 is a Hadamard matrix of order 4. a symmetric and idempotent matrix is called a projection matrix. it is called positive deﬁnite. λi . The following properties are useful: • X X ≥ 0 for any n × s matrix X. Deleting the ﬁrst column we obtain L4 (23 ) (cf. • the product of the eigenvalues of A equals |A|. Many 2-level orthogonal designs are generated via Hadamard matrices by deleting the column of ones of H if H has such a column. if A > 0. • the eigenvalues of A are positive.

Kronecker product: The Kronecker product (or tensor product) of A = (aij ) of size n × p and B = (bkl ) of size m × q is an nm × pq matrix deﬁned by ⎤ ⎡ a11 B · · · a1p B ⎥ ⎢.1) ⎦. because it takes on diﬀerent values in the population according to some random mechanism.1 Some Concepts in Probability and Statistics Random variables and random vectors In this subsection we review concepts of random variables/vectors and their distributions and moments. . Random variables: Suppose we collect piston-rings in a process.2. A random variable is called continuous if it is measured on a continuous scale and discrete if it is limited to a certain © 2006 by Taylor & Francis Group. B = ⎢ 1 1 −1 ⎥ ⎦ ⎣ 1 −1 1 ⎦ 2 1 1 1 1 1 ⎢2 A=⎢ ⎣3 4 Then 2 1 4 3 and C= 1 −1 . A ⊗ B = (aij B) = ⎣ . We ﬁnd that values of the diameter of piston-rings are spread through a certain range.244 Design and Modeling for Computer Experiments Hadamard product: The Hadamard product. The number of telephone calls from 9:00 to 10:00 in a commercial center is another example of random variable. also known as the elementwise product. −2 ⎥ ⎥ 6⎥ ⎥ −1 ⎦ 3 A. We might visualize piston-ring diameter as a random variable. (A.2 A. . an1 B · · · anp B Let ⎡ ⎤ ⎡ ⎤ 4 1 −1 −1 ⎢ ⎥ 3⎥ ⎥ . . . of two same size matrices A = (aij ) and B = (bij ) is the matrix C = (cij = aij bij ) and denoted by C = A B. LLC . −2 3 A 1 ⎢ −2 ⎢ ⎡ ⎤ ⎢ 2 1 −2 −4 ⎢ ⎢ ⎢ 2 1 −3 ⎥ ⎢ ⎥ and A ⊗ C = ⎢ −4 B=⎣ ⎢ 3 3 −4 2 ⎦ ⎢ ⎢ −6 4 3 1 ⎢ ⎣ 4 −8 ⎡ −1 2 3 −4 −2 1 6 −2 −3 4 9 −8 −4 3 12 −6 −2 4 6 −8 −1 3 3 −6 −4 2 12 −4 −3 1 9 −2 ⎤ −4 12 ⎥ ⎥ −3 ⎥ ⎥ 9⎥ ⎥.

Discrete distribution: When the possible values of X are ﬁnite or countable values. Xjk = xjk ) has a similar meaning to the previous one. Xp ) is called a random vector if all the components X1 . · · · form a discrete distribution. i = 1. Continuous distribution: If there is a continuous function p(x) such that x F (x) = −∞ p(y)dy for any −∞ < x < ∞. the function. · · · . x2 . the conditional distribution Fi1 . yp )dy1 · · · dyp . We write X ∼ F (x). In this case F (x1 . The binomial distribution. where F (x. xp ) = x1 −∞ ··· xp −∞ p(y1 . · · · . and called the cumulative distribution function (cdf) or more simply. If their distribution functions satisfy F (x. etc. xp ) is called the joint distribution of X1 . © 2006 by Taylor & Francis Group. Independence: Let X and Y be two random variables.Appendix 245 ﬁnite or countable set of possible values. Random Vector: A vector x = (X1 . respectively. The normal distribution and the exponential distribution are examples of continuous distribution. the Poisson distribution. is called the marginal distribution of (Xi1 . · · · . Xiq ). we say that X follows a continuous distribution F (x) with a probability density function (pdf) or density p(x). 2. · · · . The function F (x1 .··· . 2. · · · . Xiq ). are discrete distributions. · · · . Y ) and Fx (·) and Fy (·) are distributions of X and Y . Xp are random variables. · · · . LLC . xp ). we call X and Y statistically independent. where Fi is determined by F (x1 . for any x and y in R.iq (xi1 . · · · . Xp . The piston-ring diameter is a continuous random variable while the number of telephones is a discrete random variable. the function p is called the probability density function (pdf) of x. · · · . the distribution function of X. y) is the joint distribution function of (X. The distribution of (Xi1 . or just independent. Let X be a random variable. deﬁned by F (x) = P (X ≤ x). xp ) such that F (x1 . · · · . q < p. 
The distribution of Xi given Xj = xj is called the conditional distribution and denoted by Fi (xi |Xj = xj ). · · · say. Probability distribution function: A probability distribution function is a mathematical model that relates the value of a variable with the probability of occurrence of that value in the population. · · · . If there exists a non-negative and integrable function p(x1 . x1 . the probabilities pi = P (X = xi ). xiq |Xji = xj1 . such as the integers 0. · · · . Xp ≤ xp ) is called the cumulative distribution function (cdf) of x. 1. xp ) = P (X1 ≤ x1 . As an extension. · · · . · · · . y) = Fx (x)Fy (y). · · · .

Y ) = Obviously. Fx and Fy are the joint distribution function of x and y. Yq ) be two random vectors. Y ) = −∞ (x − E(X))(y − E(Y ))p(x. Mean vector. Expected value: (a) The expected value of a discrete random variable X with pi = P (X = xi ). y) is deﬁned by ∞ ∞ −∞ Cov(X. Y ) = Cov(Y. · · · . © 2006 by Taylor & Francis Group. i = 1. Xp ) have a pdf p(x1 .246 Design and Modeling for Computer Experiments Likewise. Correlation coeﬃcient: The correlation coeﬃcient between two random variables X and Y is deﬁned by Corr(X. where the integral is taken over the range of X. X). y)dxdy. LLC . · · · . (b) The expected value of a continuous random variable X with density p(x) is deﬁned as E(X) = xp(x) dx. we call x and y statistically independent. · · · . yq ). where F . respectively. covariance. Xp ) and y = (Y1 . · · · . Cov(X. · · · . · · · xp . Cov(X. where E(XY ) = xyp(x. Y ) = 0. Y ) = E(X − E(X))(Y − E(Y )). · · · xp )Fy (y1 . Y ) Var(X)Var(Y ) . yq ) = Fx (x1 . If their distribution functions satisfy F (x1 . xp ). or just independent. Assume each Xi has a Cov(X. · · · . Random variables X and Y are called uncorrelated if Cov(X. and we write Cov(X. The covariance of X and Y has the following properties: • • • • Cov(X. · · · . y)dxdy. the distribution function of x and of y. X) = 1. Y ) = E(XY ) − E(X)E(Y ). X) = Var(X). y1 . and correlation matrices: Let a random vector x = (X1 . 2. where the summation is taken over all possible values of X. let x = (X1 . Corr(X. Covariance: The covariance of two random variables X and Y with a pdf f (x. for any x ∈ Rp and y ∈ Rq . is deﬁned as E(X) = xi pi . Y ) = ρ(X.
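The identity Cov(X, Y) = E(XY) − E(X)E(Y) and the definition of the correlation coefficient can be checked directly on simulated data (the construction below, with Corr(X, Y) = 0.6, is an illustrative choice of ours):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200_000
x = rng.standard_normal(N)
y = 0.6 * x + 0.8 * rng.standard_normal(N)   # Var(Y) = 1, Corr(X, Y) = 0.6

# Sample analogues of the two covariance formulas agree.
cov_def = np.mean((x - x.mean()) * (y - y.mean()))
cov_alt = np.mean(x * y) - x.mean() * y.mean()
assert np.isclose(cov_def, cov_alt)

corr = cov_def / (x.std() * y.std())         # Corr = Cov / sqrt(Var Var)
```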

Denote by σi the standard deviation of Xi. The mean vector of x is defined by

E(x) = (E(X1), ..., E(Xp))'.

Let σij = Cov(Xi, Xj) for i, j = 1, ..., p; in particular, σii = Cov(Xi, Xi) = Var(Xi), and Cov(Xi, Xj) = σij = σi σj ρij, where ρij = Corr(Xi, Xj). The matrix

Σx = Cov(x) = [ σ11 ··· σ1p
                ⋮        ⋮
                σp1 ··· σpp ]

is called the covariance matrix of x, and the matrix

Rx = Corr(x) = [ ρ11 ··· ρ1p
                 ⋮        ⋮
                 ρp1 ··· ρpp ]

is called the correlation matrix of x. Let S be the diagonal matrix with diagonal elements σ1, ..., σp. The relation between Σx and Rx is given by Σx = S Rx S.

Divide x into two parts, x = (x1', x2')'. If the covariance matrix of x has the form

Σ(x) = [ Σ11  0
         0   Σ22 ],

where Σ11 = Cov(x1) and Σ22 = Cov(x2), we say that x1 and x2 are uncorrelated.

A.2.2 Some statistical distributions and Gaussian process

This subsection introduces the uniform distribution, univariate and multivariate normal distributions, and the Gaussian process.

Uniform distribution: The uniform distribution on a finite domain T has a constant density 1/Vol(T), where Vol(T) is the volume of T. If a random vector x has the uniform distribution on T, then we denote x ∼ U(T). In particular, the uniform distribution on [0, 1], denoted by U(0, 1), has a constant density 1 on [0, 1], and the uniform distribution on the unit cube C^s = [0, 1]^s, denoted by U(C^s), has a constant density 1 on C^s. A random variable X that follows U(0, 1) is also called simply a random number. Such random numbers play an important role in simulation.
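Both facts above are one-liners to check numerically — the relation Σx = S Rx S, and the constant-density draw from U(C^s) (the 3-dimensional correlation matrix and standard deviations below are hypothetical):

```python
import numpy as np

# Sigma_x = S R_x S for an illustrative 3-dimensional x.
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
sd = np.array([2.0, 1.0, 0.5])                 # sigma_1, ..., sigma_p
S = np.diag(sd)
Sigma = S @ R @ S
assert np.isclose(Sigma[0, 1], sd[0] * sd[1] * R[0, 1])  # sigma_ij = sigma_i sigma_j rho_ij
assert np.allclose(np.diag(Sigma), sd ** 2)              # sigma_ii = Var(X_i)

# U(C^s): each coordinate uniform on [0, 1], mean 1/2.
rng = np.random.default_rng(8)
x = rng.random((100_000, 3))
```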

Normal distribution: For a real number μ and a positive number σ, if a random variable X has the density

p(x) = (1/√(2π)σ) exp{ −(1/2) ((x − μ)/σ)² },

then X is said to be a normal random variable with mean μ and variance σ², denoted by X ∼ N(μ, σ²). When μ = 0 and σ = 1, the corresponding distribution is called the standard normal distribution, denoted by X ∼ N(0, 1), and its density reduces to

p(x) = (1/√(2π)) e^{−x²/2}.

Multivariate normal distribution: If a p-dimensional random vector x has the pdf

p(x) = (2π)^{−p/2} |Σ|^{−1/2} exp{ −(1/2)(x − μ)'Σ⁻¹(x − μ) },   (A.2)

then we say that x follows a multivariate normal distribution Np(μ, Σ). The mean vector and covariance of x ∼ Np(μ, Σ) are E(x) = μ and Cov(x) = Σ, respectively. When p = 1, it reduces to a univariate normal distribution. Partition x, μ, and Σ into

x = [ x1     μ = [ μ1     Σ = [ Σ11 Σ12
      x2 ],        μ2 ],        Σ21 Σ22 ],   (A.3)

where x1, μ1 ∈ R^q, q < p, and Σ11 is q × q. The marginal distributions of x1 and x2 are x1 ∼ Nq(μ1, Σ11) and x2 ∼ N_{p−q}(μ2, Σ22), respectively; any marginal distribution of a multivariate normal distribution is again a multivariate normal distribution. The conditional distribution of x1 for a given x2 is Nq(μ_{1.2}, Σ_{11.2}), where

μ_{1.2} = μ1 + Σ12 Σ22⁻¹ (x2 − μ2),  Σ_{11.2} = Σ11 − Σ12 Σ22⁻¹ Σ21,   (A.4)

that is, E(x1 | x2) = μ_{1.2} and Cov(x1 | x2) = Σ_{11.2}. If x ∼ Np(μ, Σ) and A is a k × p constant matrix with k ≤ p, then Ax follows a multivariate normal distribution. In other words, any linear combination of a multivariate normal random vector still has a multivariate normal distribution.

Chi-squared distribution: If X1, ..., Xp are independently identically distributed as N(0, 1), then Y = X1² + ··· + Xp² is said to follow a chi-squared distribution with p degrees of freedom, denoted by Y ∼ χ²(p). The density of Y is given by

p(y) = e^{−y/2} y^{(p−2)/2} / (2^{p/2} Γ(p/2)), y > 0.
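The conditional formulas (A.4) are used heavily in Gaussian-process prediction and are straightforward to apply; a sketch for a hypothetical 3-dimensional normal split into x1 = (X1, X2)' and x2 = X3:

```python
import numpy as np

mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
q = 2                                           # x1 = first q components
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

x2 = np.array([1.0])                            # observed value of X3
# mu_{1.2} = mu_1 + Sigma_12 Sigma_22^{-1} (x2 - mu_2)
mu_cond = mu[:q] + S12 @ np.linalg.solve(S22, x2 - mu[q:])
# Sigma_{11.2} = Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S21)
```

Note that Σ_{11.2} does not depend on the observed x2, only on the partition of Σ.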

Here Γ(·) is the gamma function defined by

Γ(a) = ∫_0^∞ e^{−t} t^{a−1} dt, a > 0.

t-distribution: Let X ∼ N(0, 1) and Y ∼ χ²(p) be independent. Then the distribution of the random variable t = X/√(Y/p) is called the t-distribution with p degrees of freedom, denoted by t ∼ t(p). Its density is given by

p(t) = ( Γ((p+1)/2) / (√(pπ) Γ(p/2)) ) (1 + t²/p)^{−(p+1)/2}.

F-distribution: Let X ∼ χ²(p) and Y ∼ χ²(q) be independent. Then the distribution of F = (X/p)/(Y/q) is called the F-distribution with degrees of freedom p and q, denoted by F ∼ F(p, q). Its density is given by

p(x) = ( (p/q)^{p/2} x^{(p−2)/2} / B(p/2, q/2) ) (1 + (p/q)x)^{−(p+q)/2}, x ≥ 0,

where B(·, ·) is the beta function defined by B(a, b) = Γ(a)Γ(b)/Γ(a + b) for any a > 0 and b > 0. If X ∼ t(p), then X² ∼ F(1, p); therefore, many t-tests can be replaced by F-tests.

Gaussian processes: A stochastic process {X(t), t ∈ T} indexed by t is said to be a Gaussian process if any of its finite-dimensional marginal distributions is a normal distribution, i.e., if for any finite set {t1, ..., tp}, where tj = (t1j, ..., tpj)', (X(t1), X(t2), ..., X(tp)) has a p-dimensional normal distribution. Denote the mean, variance, and covariance functions by

μ(t) = E{X(t)},  σ²(t) = Var{X(t)},  and  σ(t1, t2) = Cov(X(t1), X(t2)), for any t1, t2 ∈ T.

A Gaussian process {X(t), t ∈ T} indexed by t is said to be stationary if its μ(t) and σ²(t) are constant (independent of the index t), and its Cov(X(t1), X(t2)) depends only on the componentwise distances |ti1 − ti2|, i = 1, ..., p.

A.3 Linear Regression Analysis

There are many textbooks for linear regression analysis. This section gives a brief introduction to linear models, refreshing some of the basic and important concepts of linear regression analysis. Details on linear models can be found in textbooks such as Weisberg (1985), Draper and Smith (1981), and Neter, Kutner, Nachtsheim and Wasserman (1996).

A.3.1 Linear models

A linear regression model is defined as

Y = β0 + β1 X1 + ··· + βs Xs + ε.   (A.5)

The Y variable is called the response variable, or dependent variable, while the X variables are often called explanatory variables, covariates, independent variables, or regressors; ε is termed random error. In general, the "random error" here refers to the part of the response variable that cannot be explained or predicted by the covariates. The parameters β are usually unknown and have to be estimated. The regression function, denoted by m(x1, ..., xs), is the conditional expected value of the response variable given the values of the X variables, that is,

m(x1, ..., xs) = E(Y | X1 = x1, ..., Xs = xs).

The term "linear model" refers to the fact that the regression function can be explicitly written as a linear form of the βs, such as

m(x1, ..., xs) = β0 + β1 x1 + ··· + βs xs.
xs ) = β0 + β1 x1 + · · · + βs xs . but by deﬁning new variables X1 = X. This assumption is not always valid and needs to be checked. Section 5. such as m(x1 . or covariates.
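The polynomial trick described above — turning a model that is nonlinear in x into a linear model by defining X1 = X, X2 = X², and so on — can be sketched in a few lines (an illustrative example with made-up data, not from the book):

```python
import numpy as np

# Sketch: the quadratic model Y = b0 + b1*X + b2*X^2 + eps fitted as a
# *linear* model in the new covariates X1 = X, X2 = X^2.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 200)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.1, 200)

s = 2
X = np.column_stack([x**j for j in range(s + 1)])  # columns: 1, X, X^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares in the betas

# The fit recovers the coefficients of the generating model.
assert np.allclose(beta_hat, [1.0, 2.0, -0.5], atol=0.1)
```

The fit is "linear" because the estimation problem is linear in β0, β1, β2, even though the fitted curve is a parabola in x.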

A.3.2 Method of least squares

Suppose that {(xi1, · · · , xis, yi), i = 1, · · · , n} is a random sample from the linear model

yi = Σ_{j=0}^{s} xij βj + εi,   (A.6)

where we may set xi0 ≡ 1 to include the intercept in the model, and εi is random error. It is usually assumed that the random errors {εi} are homoscedastic; namely, they are uncorrelated random variables with mean 0 and common variance σ².

The method of least squares was advanced by Gauss in the early nineteenth century and has been pivotal in regression analysis. For data {(xi1, · · · , xis, yi), i = 1, · · · , n}, the method of least squares considers the deviation of each observed value yi from its expectation and finds the coefficients βs by minimizing the sum of squared deviations:

S(β0, · · · , βs) = Σ_{i=1}^{n} (yi − Σ_{j=0}^{s} xij βj)².   (A.7)

Matrix notation (cf. Section A.1) can give us a succinct expression of the least squares solution. Denote

y = (y1, · · · , yn)′,  X = (xij), i = 1, · · · , n, j = 0, · · · , s,  β = (β0, · · · , βs)′,  ε = (ε1, · · · , εn)′.

Then model (A.6) can be written in the matrix form

y = Xβ + ε,

and the S(β) in (A.7) can be written as

S(β) = (y − Xβ)′(y − Xβ).

Differentiating S(β) with respect to β, we obtain the normal equations

X′y = X′Xβ.

If X′X is invertible, the normal equations yield the least squares estimator of β:

β̂ = (X′X)⁻¹X′y.   (A.8)

The matrix X is known as the design matrix and is of crucial importance in linear regression analysis. The properties of the least squares estimator have been well studied and are summarized as follows without proof.

PROPOSITION A.1
If the linear model (A.6) holds, then
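The normal equations and the closed-form estimator (A.8) can be checked numerically. The sketch below (simulated data, assumed for illustration) solves the normal equations directly and confirms the answer agrees with a numerically stable least squares routine:

```python
import numpy as np

# Sketch: beta_hat = (X'X)^{-1} X'y solves the normal equations X'y = X'X beta.
# np.linalg.lstsq returns the same estimate without forming X'X explicitly.
rng = np.random.default_rng(2)
n, s = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, s))])  # x_{i0} = 1 intercept
beta = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

assert np.allclose(beta_hat, beta_lstsq)           # same solution
assert np.allclose(beta_hat, beta, atol=0.3)       # close to the true beta
```

In practice `lstsq` (a QR/SVD-based solver) is preferred: forming X′X squares the condition number of the problem.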

(a) Unbiasedness: E(β̂|X) = β. For example, for any given vector a, a′β̂ is a linear unbiased estimator of the parameter θ = a′β.

(b) Covariance: Cov(β̂|X) = σ²(X′X)⁻¹.

(c) Gauss-Markov theorem: The least squares estimator β̂ is the best linear unbiased estimator (BLUE). That is, for any linear unbiased estimator b′y of θ, its variance is at least as large as that of a′β̂.

From Proposition A.1 (b), the covariance formula involves an unknown parameter σ². A natural estimator is based on the residual sum of squares (RSS), defined by

RSS = Σ_{i=1}^{n} (yi − ŷi)²,

where ŷi = Σ_{j=0}^{s} xij β̂j is the fitted value of the i-th datum point. An unbiased estimator of σ² is σ̂² = RSS/(n − s − 1).

A.3.3 Analysis of variance

In practice, various hypothesis testing problems arise; one may ask whether some covariates have significant effects. This can be formulated as the following linear hypothesis problem:

H0: Cβ = h  versus  H1: Cβ ≠ h,   (A.9)

where C is a q × p constant matrix with rank q and h is a q × 1 constant vector. For instance, if one wants to test H0: β1 = 0 against H1: β1 ≠ 0, we can choose C = (0, 1, 0, · · · , 0) and h = 0; if one wants to test H0: β1 = β2 against H1: β1 ≠ β2, the corresponding C = (0, 1, −1, 0, · · · , 0) and h = 0.

Under H0, the method of least squares is to minimize S(β) subject to the constraints Cβ = h. This leads to

β̂0 = β̂ − (X′X)⁻¹C′{C(X′X)⁻¹C′}⁻¹(Cβ̂ − h).   (A.10)

The residual sums of squares, denoted by RSS1 and RSS0 under the full model and the null hypothesis, respectively, are

RSS1 = (y − Xβ̂)′(y − Xβ̂)  and  RSS0 = (y − Xβ̂0)′(y − Xβ̂0).

The F-test for H0 is based on the decomposition

RSS0 = (RSS0 − RSS1) + RSS1.   (A.11)
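The constrained estimator (A.10) and the decomposition (A.11) can be sketched directly (simulated data assumed for illustration; the F(q, n−s−1) null distribution is the standard result stated in the text):

```python
import numpy as np

# Sketch of the F-test for H0: C beta = h. beta0_hat is the constrained
# least squares estimate (A.10); F compares RSS0 and RSS1 as in (A.11).
rng = np.random.default_rng(3)
n, s = 60, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, s))])
beta = np.array([2.0, 1.0, 0.0, 0.0, 0.0])         # last three effects are null
y = X @ beta + rng.normal(size=n)

# H0: beta2 = beta3 = beta4 = 0, i.e. C = [0 0 I3], h = 0.
C = np.hstack([np.zeros((3, 2)), np.eye(3)])
h = np.zeros(3)
q = C.shape[0]

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
A = C @ XtX_inv @ C.T
beta0_hat = beta_hat - XtX_inv @ C.T @ np.linalg.solve(A, C @ beta_hat - h)

rss1 = np.sum((y - X @ beta_hat) ** 2)
rss0 = np.sum((y - X @ beta0_hat) ** 2)
F = ((rss0 - rss1) / q) / (rss1 / (n - s - 1))     # ~ F(q, n-s-1) under H0

assert np.allclose(C @ beta0_hat, h)               # constraint holds exactly
assert rss0 >= rss1                                # constrained fit cannot be better
```

Note that RSS0 ≥ RSS1 always holds, since the constrained minimization is taken over a smaller set; the F statistic measures whether the increase is larger than chance.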

The sampling properties of these residual sums of squares are as follows.

PROPOSITION A.2
If the linear model (A.6) holds and the {εi} are independently and identically distributed as N(0, σ²), then RSS1/σ² ~ χ²_{n−s−1}. Further, under the null hypothesis H0 in (A.9), (RSS0 − RSS1)/σ² ~ χ²_q and is independent of RSS1.

Proposition A.2 shows that under the hypothesis H0 in (A.9),

F = {(RSS0 − RSS1)/q} / {RSS1/(n − s − 1)}

has an F_{q,n−s−1} distribution. The null hypothesis H0 is rejected if the observed F-statistic is large. This test may be summarized in the analysis of variance as in Table A.1. The first column indicates the models or the variables. The second column is the sum of squares reduction (SSR), the third column shows the degrees of freedom, and the fourth column is the mean squares, which are the sums of squares divided by their associated degrees of freedom. The first row shows the sum of squares reduction by using the full model; this row is typically omitted. The third row indicates the additional contribution of the variables that are in the full model but not in the submodel; it is important for gauging whether the contributions of those variables are statistically significant. The total refers to the sum of the second through the fourth rows. One can include more information in the ANOVA table, for example, by adding a column on the F-statistic and a column on the P-value.

TABLE A.1
Analysis of Variance for a Multiple Regression
Source      SS                              df          MS
Full-model  SSR1 ≡ Σ(Yi − Ȳ)² − RSS1        s           SSR1/s
Sub-model   SSR0 ≡ Σ(Yi − Ȳ)² − RSS0        s − q       SSR0/(s − q)
Difference  RSS0 − RSS1                     q           (RSS0 − RSS1)/q
Residuals   RSS1                            n − s − 1   RSS1/(n − s − 1)
Total       SST = Σ(yi − ȳ)²                n − 1

A.3.4 An illustration

In this section, we illustrate the concepts of linear regression via analysis of an environmental data set.

0 0. and the outputs (mortality) are depicted in the right panel of Table A.0 Y1 19.0 20.37 31.0 0.85 32.0 0..1 1. we take the average of the three outcomes as the response variable. .0 20.52 78.01 32.0 0.0 Pb 16.0 10.2 0. it is recommended to standardize all six variables.79 40.0 0.254 Design and Modeling for Computer Experiments Example 7 (continued) Uniform design U17 (176 ) (see Chapter 3 for the basics of uniform design) was used to conduct experiments.2. That is.0 16.0656 are the sample mean and standard deviation of 17 Cd contents.01 1.12 33. and P-values of the regression coeﬃcients are depicted in Table A.1 12. This implies that the contents of metals may aﬀect the mortality. respectively.25 55.0 0.28 79.0 4. from which it can be seen that only Cd and Cu are very signiﬁcant.56 79.0 16.0 0.0 Cr 14.97 50. standard errors.0 0. For each combination in the left panel of Table A.0 10.62 32.90 31.57 Y2 17.0 0.2 0.01 0.05 0.0 0.0656.48 Note that the ratio of maximum to minimum levels of the six metals is 2000.0 1.87 37.0 1.05 0.4 0.0 5.65 31.8 5.0 0.01 8.0 0.0 2.79 25.14 39. under the assumption that the model under consideration is appropriate.05 50.6 22.77 29.1 0.0 16.61 41.0 4.3.8 1.01 16.99 30.0 18.0 5.0 18.29 60. © 2006 by Taylor & Francis Group.0 14. Zn.22 60.4 0.0 14.05 2. three experiments were conducted.18 30.2 Environmental Data Cd 0.05 18. Other metals seem not to be statistically signiﬁcant. is higher than the other ones.0 8.5624)/7.66 67.0 8.1 0.0 0. As an illustration.68 69.0 10.0 2.0 14.65 51.0 14.2 0.0 4.66 39.74 39. x1 = (Cd − 6.75 31.01 0.8 18.05 20.4 16.18 40.09 31.0 12. Denote x1 .0 5.0 Cu 0.95 22. β6 x6 + ε.0 8.0 0. Cr and Pb.87 55.54 59.0 12.71 67.2 5.0 4.1 4.0 12.0 1..87 33.48 24.0 10.69 67.0 20.43 71.0 0.0 12.2.0 Zn 0. The estimates. extracted from Fang (1994).0 0..70 30. The data were ﬁtted by a linear regression model y = β0 + β1 x1 + · · · .2 2.05 10.0 2. The actual design is displayed in the left panel of Table A.80 43.0 Ni 5.0 12.8 10. 
TABLE A.4 8. Cu. for example.1 20. where 6.2 14. Ni.94 67.86 28.4 4.22 22.0 0.0 8.8 0.0 2.0 0.0 20. corresponding to high levels of all metals.2.43 Y3 18.0 0.04 56. To stabilize numerical computation.86 24.01 18. The mortality in the last row.8 0. LLC .81 42. respectively.5624 and 7.0 18. x6 for the standardized variables of Cd.4 0.

0. β5 and β6 equal 0. it is of interest to test whether the eﬀects of Zn. which involves 22 items. Here we consider this example only for the purpose of illustration.9149 0.45 146.7 10 Total 5139.7 2 Diﬀerence 257.7765.3745 Ni 3. β4 . Standard Errors and P-Value X-variable Estimate Standard Error Intercept 42. and h = 0.4412 with P-value=0. The ANOVA table for the hypothesis is given in Table A. Hence we take C = (0.4 ANOVA Table for Example 7 Source SS df Sub-model 3420.4. TABLE A. 0.1410 P-value 0.1168 3.07 © 2006 by Taylor & Francis Group.2 16 MS 1710.0284 0.8514 3. only a few of which we believe are signiﬁcant. Cr.3579 3.1125 3. selection of variables is an important issue.Appendix TABLE A.45/146. This indicates in favor of H0 .7108 3.0000 0.1694 Cr −0.1410 Pb 0.0099 0.9115 From Table A. where I4 is the identity matrix of order 4.9313 Cd 10.07 = 0.2522 0. 0. from which we can easily calculate the F -statistic: F = 64. One may further consider interaction eﬀects and quadratic eﬀects of the six metals.3.3443 3.3745 Cu 8.8 4 Residuals 1460. The model becomes 6 y = β0 + i=1 βi xi + 1=i≤j≤6 βi xi xj + ε.3 64. and Pb are signiﬁcant. Thus.8639 2. Ni.1694 Zn −2.3 255 Estimates. This leads us to consider the following null hypothesis: H0 : β3 = β4 = β5 = β6 = 0 versus H1 : not all β3 . LLC . I4 ).5445 0.
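The F statistic quoted above can be reproduced from the ANOVA quantities in Table A.4 — the "Difference" sum of squares 257.8 on q = 4 degrees of freedom and the residual sum of squares 1460.7 on 10 degrees of freedom:

```python
# Sketch: the F statistic for H0: beta3 = beta4 = beta5 = beta6 = 0,
# computed from the Table A.4 entries (257.8/4 = 64.45, 1460.7/10 = 146.07).
diff_ss, q = 257.8, 4
rss1, df_resid = 1460.7, 10

F = (diff_ss / q) / (rss1 / df_resid)
assert abs(F - 0.4412) < 5e-4   # matches the value 64.45/146.07 = 0.4412
```

A small F value like this one (P-value 0.7765) is consistent with H0, i.e., with Zn, Ni, Cr, and Pb having no significant effect.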

A.4 Variable Selection for Linear Regression Models

As discussed in Section 5.1, variable selection can be viewed as a type of regularization and may be useful for choosing a parsimonious metamodel. There is a considerable literature on the topic of variable selection; Miller (2002) gives a systematic account of this research area. In this section, we briefly review some recent developments of this topic.

Suppose that {(xi1, · · · , xis, yi), i = 1, · · · , n} is a random sample from the following linear regression model:

yi = Σ_{j=0}^{s} xij βj + εi,

where xi0 ≡ 1, β0 is the intercept, E(εi) = 0, and Var(εi) = σ². Then a penalized least squares is defined as

Q(β) = (1/2) Σ_{i=1}^{n} (yi − Σ_{j=0}^{s} xij βj)² + n Σ_{j=0}^{s} pλ(|βj|),   (A.12)

where pλ(·) is a pre-specified non-negative penalty function and λ is a regularization parameter. The factor "1/2" in the definition of the penalized least squares comes from the likelihood function with normal distribution for the random errors. Using the matrix notation defined in the last section, the penalized least squares can be rewritten as

Q(β) = (1/2) ‖y − Xβ‖² + n Σ_{j=0}^{s} pλ(|βj|).

Many classical variable selection criteria in linear regression analysis can be derived from the penalized least squares. Take the penalty function in (A.12) to be the L0 penalty:

pλ(|β|) = (1/2) λ² I(|β| ≠ 0),

where I(·) is the indicator function. Note that Σ_{j=0}^{s} I(|βj| ≠ 0) equals the number of nonzero regression coefficients in the model. Hence many popular variable selection criteria can be derived from (A.12) with the L0 penalty by choosing different values of λ. For instance, the Cp (Mallows (1973)), AIC (Akaike (1974)), BIC (Schwarz (1978)), φ-criterion (Hannan and Quinn (1979) and Shibata (1984)), and RIC (Foster and George (1994)) correspond to λ = √2 (σ/√n), √2 (σ/√n), √(log n) (σ/√n), √(log log n) (σ/√n), and √(2 log(s + 1)) (σ/√n), respectively, although these criteria were motivated by different principles.
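The correspondence between the L0 penalty and classical subset-selection criteria can be sketched by brute force. The block below (simulated data; σ assumed known for simplicity) minimizes (A.12) over all subsets with the AIC/Cp-type choice λ = √2 σ/√n, so each extra variable costs σ² in the criterion:

```python
import numpy as np
from itertools import combinations

# Sketch: best-subset selection as L0-penalized least squares (A.12) with
# p_lam(|b|) = 0.5 * lam^2 * I(b != 0) and lam = sqrt(2)*sigma/sqrt(n).
rng = np.random.default_rng(4)
n, s = 80, 4
X = rng.normal(size=(n, s))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)  # only two active terms

sigma = 1.0
lam = np.sqrt(2.0) * sigma / np.sqrt(n)

def q_value(cols):
    """Penalized least squares Q for the submodel using the given columns."""
    if cols:
        b, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        resid = y - X[:, cols] @ b
    else:
        resid = y
    return 0.5 * resid @ resid + 0.5 * n * lam**2 * len(cols)

subsets = [list(c) for k in range(s + 1) for c in combinations(range(s), k)]
best = min(subsets, key=q_value)

assert {0, 1} <= set(best)   # the two truly active variables are always kept
```

With this λ the criterion is (1/2)RSS + σ²·(model size), i.e., a rescaled AIC/Cp; swapping in λ = √(log n) σ/√n would give the BIC rule instead.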

Since the L0 penalty is discontinuous, it requires an exhaustive search over all 2^d possible subsets of x-variables to find the solution. This approach is very expensive in computational cost. Therefore, to select a good subset of variables, people usually employ some less expensive algorithms, such as forward subset selection and stepwise regression (see, for instance, Miller (2002)). Furthermore, subset variable selection suffers from other drawbacks, the most severe of which is its lack of stability, as analyzed, for instance, by Breiman (1996).

A.4.1 Nonconvex penalized least squares

To avoid the drawbacks of subset selection, for example, heavy computational load and lack of stability, Frank and Friedman (1993) considered the Lq penalty,

pλ(|β|) = λ|β|^q,  q > 0,

which yields a "bridge regression." Tibshirani (1996) proposed the LASSO, which can be viewed as the solution of (A.12) with the L1 penalty defined below:

pλ(|β|) = λ|β|.

LASSO retains the virtues of both best subset selection and ridge regression, which can be viewed as the solution of penalized least squares with the L2 penalty: pλ(|β|) = λβ².

The issue of selecting a penalty function has been studied in depth by various authors, for example, Antoniadis and Fan (2001). Fan and Li (2001) suggested the use of the smoothly clipped absolute deviation (SCAD) penalty, with pλ(0) = 0 and first derivative defined by

p′λ(β) = λ{ I(β ≤ λ) + (aλ − β)₊ / ((a − 1)λ) · I(β > λ) }  for some a > 2 and β > 0.

This penalty function involves two unknown parameters, λ and a = 3.7 (cf. Fan and Li (2001)). Figure A.1 depicts the plots of the SCAD, L0.5, and L1 penalty functions. The L1 penalty is convex over (−∞, ∞), while the SCAD and L0.5 penalties are nonconvex over (0, +∞); the SCAD may be viewed as a concave function over (0, +∞), which reduces estimation bias. We refer to penalized least squares with the nonconvex penalties over (0, ∞) as nonconvex penalized least squares in order to distinguish them from the L2 penalty. As shown in Figure A.1, the three penalty functions are all singular at the origin. This is a necessary condition for sparsity in variable selection: the resulting estimator automatically sets small coefficients to be zero. The SCAD is an improvement over the L0 penalty in two aspects: saving computational cost and resulting in a continuous solution, avoiding unnecessary modeling variation.
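The SCAD penalty just defined can be evaluated in closed form by integrating its derivative — a standard computation (not spelled out in the text): the penalty is linear up to λ, quadratic on (λ, aλ], and constant beyond aλ. The soft-thresholding rule associated with the L1 penalty is included for contrast:

```python
import numpy as np

# Sketch: closed form of the SCAD penalty, obtained by integrating
# p'(b) = lam*{ I(b<=lam) + (a*lam - b)_+ / ((a-1)*lam) * I(b>lam) }.
def scad(b, lam, a=3.7):
    b = np.abs(b)
    return np.where(
        b <= lam,
        lam * b,                                          # L1-like near the origin
        np.where(
            b <= a * lam,
            (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1)),
            lam**2 * (a + 1) / 2,                          # flat: no bias for large b
        ),
    )

def soft_threshold(z, lam):
    """LASSO/L1 shrinkage rule: shifts every coefficient toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

lam = 1.0
assert np.isclose(scad(lam, lam), lam**2)                  # continuous at lam
assert np.isclose(scad(a := 3.7, lam), scad(10.0, lam))    # constant beyond a*lam
assert soft_threshold(2.0, 0.5) == 1.5                     # constant shrinkage
assert soft_threshold(0.3, 0.5) == 0.0                     # small coefficients zeroed
```

The flat tail of `scad` is what avoids the excessive bias of the L1 penalty: large coefficients are eventually left unpenalized, while small ones are still set to zero.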

FIGURE A.1
Plot of penalty functions.

Furthermore, the SCAD may also improve the L1 penalty by avoiding excessive estimation bias, because the solution of the L1 penalty could shrink all regression coefficients by a constant, for instance, the soft thresholding rule (Donoho and Johnstone (1994) and Tibshirani (1996)). Although similar in spirit to the L1 penalty, the SCAD improves bridge regression by reducing modeling variation in model prediction.

A.4.2 Iteratively ridge regression algorithm

The Lq for 0 < q < 1 and the SCAD penalty functions are singular at the origin, and they do not have continuous second-order derivatives. This poses a challenge in searching for the solution of the penalized least squares. Fan and Li (2001) proposed using a quadratic function to locally approximate the penalty functions. Their procedure can be summarized as follows. Suppose that we are given an initial value β0 that is close to the minimizer of (A.12); as usual, we set the ordinary least squares estimate as the initial value of β. If βj0 is very close to 0, then set β̂j = 0. Otherwise the penalty functions can be locally approximated by a quadratic function as

[pλ(|βj|)]′ = p′λ(|βj|) sgn(βj) ≈ {p′λ(|βj0|)/|βj0|} βj,

when βj ≠ 0. In other words,

pλ(|βj|) ≈ pλ(|βj0|) + (1/2) {p′λ(|βj0|)/|βj0|} (βj² − βj0²),  for βj ≈ βj0.   (A.13)

With the local quadratic approximation, the solution for the penalized least squares can be found by iteratively computing the following ridge regression:

β_{k+1} = {X′X + nΣλ(β_k)}⁻¹ X′y,  for k = 0, 1, · · · ,   (A.14)

with β_0 being the least squares estimate, where

Σλ(β_k) = diag{p′λ(|β_{1k}|)/|β_{1k}|, · · · , p′λ(|β_{sk}|)/|β_{sk}|}.

In the course of iterations, if β_{jk} is set to be zero, the variable Xj is deleted from the model. We call this algorithm the iteratively ridge regression algorithm. Hunter and Li (2005) studied the convergence of this algorithm.

Choice of the regularization parameter is important. It is clear that the cross validation method (5.1) can be directly used to select the λ. Alternatively, we may define the effective number of parameters using the expression (A.14) and further define generalized cross validation (GCV) scores; a detailed description of GCV scores has been given in Fan and Li (2001). Thus, we can use the GCV procedure presented in Section 7.3 to select the λ. We summarize the variable selection procedure via nonconvex penalized least squares as the following algorithm:

Algorithm for Nonconvex Penalized Least Squares
Step 1. Choose a grid point set, (λ1, · · · , λS) say, and let i = 1 and λ = λi.
Step 2. With λi, compute β̂ using the iteratively ridge regression algorithm, with β_0 being the least squares estimate.
Step 3. Compute the CV or GCV score with λ = λi.
Step 4. Let i = i + 1. Repeat Steps 2 and 3 until all S grid points are exhausted.
Step 5. The final estimator for β is the one that has the lowest CV or GCV score.

A.4.3 An illustration

We next illustrate the variable selection procedure introduced in the previous sections by application to the data in Example 7.
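The iteratively ridge regression update (A.14) with the SCAD penalty can be sketched as follows (simulated data assumed; the zeroing threshold `eps` and iteration count are illustrative choices, not prescribed by the text):

```python
import numpy as np

# Sketch of the iteratively ridge regression algorithm: each step solves a
# ridge regression with Sigma_lam(beta_k) = diag(p'(|b_jk|)/|b_jk|) (A.14);
# coefficients driven below a small threshold are deleted from the model.
def scad_deriv(b, lam, a=3.7):
    b = np.abs(b)
    return lam * ((b <= lam) + np.maximum(a * lam - b, 0) / ((a - 1) * lam) * (b > lam))

def iterative_ridge(X, y, lam, n_iter=50, eps=1e-6):
    n = len(y)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # least squares initial value
    for _ in range(n_iter):
        active = np.abs(beta) > eps
        beta[~active] = 0.0                          # delete small coefficients
        Xa = X[:, active]
        sigma = np.diag(scad_deriv(beta[active], lam) / np.abs(beta[active]))
        beta[active] = np.linalg.solve(Xa.T @ Xa + n * sigma, Xa.T @ y)
    return beta

rng = np.random.default_rng(5)
n, s = 100, 6
X = rng.normal(size=(n, s))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

beta_hat = iterative_ridge(X, y, lam=0.3)
assert np.allclose(beta_hat[2:], 0.0)                # spurious coefficients zeroed
assert abs(beta_hat[0] - 3.0) < 0.2 and abs(beta_hat[1] + 2.0) < 0.2
```

Because the SCAD derivative vanishes for |β| > aλ, the two large coefficients are left essentially unshrunk, while the penalty weight λ/|β| grows as a small coefficient shrinks, driving it to zero in a few iterations.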
For the LASSO and the SCAD, the tuning parameter λ is chosen by minimizing the GCV statistic; the corresponding values of λ equal 0.5863 and 0.5877 for the SCAD and the LASSO, respectively. For stepwise regression with Fin and Fout, we follow traditional regression analysis by setting the F values to be the 95th percentile of the corresponding F distributions. For the AIC, BIC, RIC, and φ-criterion, the subset is also selected by using the stepwise regression algorithm. The resulting estimates and the corresponding residual sums of squares (RSS) after variable selection are depicted in Table A.5, from which we can see that all procedures select exactly the same significant variables. We are surprised that all variable selection procedures except the LASSO yield exactly

864 42.276 10.087 φ 42.087 101.528 10.864 10. LASSO slightly shrinks the regression coeﬃcients and yields a slightly larger residual sum of squares.087 42.528 10.528 8.506 0 0 0 0 102.029 © 2006 by Taylor & Francis Group.5 Estimates and RSS after Variable Selection Variable F-Values AIC BIC RIC Intercept 42.087 SCAD LASSO 42.260 Design and Modeling for Computer Experiments the same ﬁnal model.087 101.864 10.014 8.528 8. TABLE A.528 Cu 8.020 7.864 42.014 8.014 0 0 0 0 101.014 8.014 Zn − − − − Ni − − − − Cr − − − − Pb − − − − RSS 101.864 42.014 − − − − 101.528 10.087 101. LLC .864 Cd 10.

LLC . vibration and harshness Orthogonal array OA-based Orthogonal array-based Latin hypercube design Orthogonal column Latin designs Partial correlation coeﬃcient Probability density function Prediction error Partial rank correlation coeﬃcient Predicted error of sum of squares Reliability-based design Radial basis function Resolvable balanced incomplete block design Restricted maximum likelihood Risk information criterion Rotation per minute Response surface methodology Residual sum of squares 261 © 2006 by Taylor & Francis Group.Acronyms AIC Akaike information criterion ANOVA Analysis of variance MLHS BIBD Balanced incomplete block design MLP BIC Bayesian information criterion Mm BLUE Best linear unbiased predictor MMSE BLUP Linear unbiased predictor BPH Balance-pursuit heuristic MS CD Centered L2 -discrepancy MSE cdf Cumulative distribution NVH function CFD Computational ﬂuid dynamics OA CR Correction ratio CV Cross validation LHD DACE Design and analysis of computer experiments OCLHD df Degrees of freedom EFAST Extended Fourier amplitude PCC sensitivity test ESE Enhanced stochastic pdf evolutionary FAST Fourier amplitude sensitivity PE test PRCC FFD Fractional factorial design FLM Functional linear model PRESS GCV Generalized cross validation glp Good lattice point RBD IMSE Integrated mean squares error RBF LASSO Least absolute shrinkage and RBIBD selection operator LHD Latin hypercube design REML LHS Latin hypercube sampling LHSA Latin hypercube sampling RIC algorithm RPM LS Local search RSM MAR Median of absolute residuals MARS Multivariate adaptive RSS regression splines Midpoint Latin hypercube sampling Multi-layer perceptron Maximin distance design Maximum mean squared error Mean squares Mean square error Noise.

262 SA SCAD Design and Modeling for Computer Experiments Simulated annealing or sensitivity analysis Smoothly clipped absolute deviation SE Stochastic evolutionary SLHD Symmetric Latin hypercube design SLSE Simple least squares estimate SRC Standardized regression coeﬃcient SRRC Standardized rank regression coeﬃcient SS Sum of squares SSE Sum of squares of error SSR Sum of squares of regression SST Sum of squares of total TA Threshold accepting UD Uniform design URBD Uniformly resolvable block design VCE Variance conditional expectation WD Wrap-around L2 discrepancy WMSE Weighted mean square error © 2006 by Taylor & Francis Group. LLC .

B. A. Oxford. A. and Oppenheim. Giglio. Maximum likelihood identiﬁcation of gaussian autoregressive moving average models. D. (1995). Regularization of wavelets approximations (with discussions). P. M. 19. Buck. Chemom. Oxford Science Publications. 96. J. J. Hong Kong Baptist University. (1974). and McKay. W. Royal Statist. (1996). (2001). R. in Workshop on Quasi-Monte Carlo Methods and Their Applications. Lattices and dual lattices in experimental design for Fourier model. P. Ge. (2001). Contr. G. R. J. Resolvable balanced incomplete block designs with a block size of 5. M. 716–723. IEEE Trans. Akaike. Riccomagno. E. 246–255. (1995).. Antoniadis. Experimental design and observation for large systems (with discussion).. Beckman. LLC . An. 49–65. B 58. R. Atkinson. and Bagacki. The relationship between variable selection and data augmentation and a method for prediction. Bates. D. J.. Antoniadis. J. and Wynn.. Schwabe. 185–198. Complexity 17. A. J. Inference 95. A. (1992). R. Assoc. pp. Technometrics 16. Allen. Hong Kong. Systems 43. Akaike. R. H. 939–967. (1973). 1–14. R. and Owen. Ser. A. L. A comparision of three methods for selecting values of input variables in the analysis of 263 © 2006 by Taylor & Francis Group. Lab. M. P. Atkinson.. J. Wavelets and Statistics. A. H. SpringerVerlag. Intell. and Wynn. A new look at the statistical model identiﬁcation. E.. J. Statist. B. J. 77–94.References Abel.and T -optimum designs for the kinetics of a reversible chemical reaction. Greig. on Autom. Plann. Bates. Bogacka. 588–607. and Fan. H.. (2001). and Wynn. Technometrics 45. (1974). Conover. 125–127. (1979). (2003). and Donev. G. A. Amer. H. R. A. (1998). H. Bates.. and Zhu. A global selection procedure for polynomial interpolators. A. M. Biometrika 60. Statist. Soc. B. Quasi-regression. 255–265. Optimum Experimental Designs.. R. D. New York. Riccomagno. J.

O. and Trosset. Healy. R. and Zhu. Handbook of Statistics 13. R. Proceedings of the 1997 Winter Simulation Conference pp. Box. Technometrics 37. 261–268. 2350–2383. Letters 54.. J. W. L. Statist. (1984). R. 63. LLC . in. Bellman. J. The intrinsic Bayes factor for model selection and prediction. Heuristics of instability and stabilization in model selection. Amer. Y. One-step Huber estimates in linear models. R. Dennis. Amer.. V. (2001). Amsterdam. Schmiediche. (1996). 115–133. Y. eds. Technometrics 21. J. S. 357–367. & Probab. 1055–1098. V. Wiley. (1996). 428–433. Meth.. Seminar pp. Statist. Modeling ozone exposure in Harris County. Oxford. R. T. L. Y. Wang. P. 109–122. E. Carroll. 24. New York. Hunter. J. Abhandlungen aus Math. Statist. I. Princeton. J. P. 1–13. Statist. Assoc. Breitung. D. Asymptotic approximations for multinomial integrals. J..L. Oxford University Press. P. Breiman. and Tarantola. Bickel. Engineer. Li.. I. N. (1997). W. H.. G. Seraﬁni. A rigorous framework for optimization of expensive function by surrogates. (1995). Andrad´ttir. Booker.. J. Texas. H. Ghosh and C. Statist. and Mukerjee. E. J. and George. Structural Optimization 17. Breiman. and Pericchi. C. T. Rao. E. Chan. Newton. Chang. P.. J. A. (1975). R. S. Statistics for Experimenters. 373–384. H. D. 70. (1995). Neural Networks for Pattern Recognition. Princeton University Press. (1999). Torczon. in S. and Model Building. in S. Chen. © 2006 by Taylor & Francis Group. Adaptive Control Processes. 92. An Introduction to Design. Better subset regression using the nonnegative garrote. J. Sensitivity analysis of model output: variance-based methods make the diﬀerence. A method for exact calculation of the discrepancy of low-dimensional ﬁnite point sets (I). K. (1961). 189–192. M. 239–245. (1993).. K. Bundschuh. Nelson eds. 392–413. Assoc. Div. Bishop. (1996). Saltelli. L. Data Analysis. Fang.. Frank. (1978). Univ. C. Model robust designs. Berger. Withers and B.. Hamburg. Amer. Chan. and Notz. 
H.. Ann. J. L. (1997).. 91. K.264 Design and Modeling for Computer Experiments output from a computer code. o K. A characterization for orthogonal arrays of strength two via a regression model. J. Bd. and Hunter. J. G. Elsevier Science B. A. 110. Assoc.

Statist. CRC Press. T. S. and Qin. Chen. (1992). LLC . (2001). J.. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Daubechies. Philadelphia. 478–485. I. and Novak. © 2006 by Taylor & Francis Group. CRC Handbook of Combinatorial Designs. A. 1–42. Cheng. 593–607. J. with applications to the design and analysis of computer experiments. Approximation by superposition of a sigmoidal function. in J. Zwick. Nonlinear sensitivity analysis of multiparameter model systems. A procedure for robust design: Minimizing variations caused by noise factors and control factors. 86. Amer. Morgan Kaufmann. H. and Ylvisaker. Statistica Sinica 7. San Matco. and R. and Tsui.. 1009–1016. P.. K. Cybenko. Toward faster stochastic gradient search. Amer. (1996). and Wu. C. H.References 265 Chatterjee. (1993). T. K. J. J. Cheng. D. Inference 128. Tripathy. Mitchell. (1989). Wavelets: A Tutorial in Theory and Applications. Signal Systems 2. H. Cukier. Advances in Neural Information Processing System 3. C. J. M. O. A. 953–963. B.. C.. Bayesian prediction of deterministic functions. C.. Cressie. ASME Journal of Mechanical Design 18. Assoc. N. C. E. (1992). J. G. (1992). G. C. R. F. Rice.. Assoc. Smoothing spline estimation for varying coeﬃcient models with repeatedly measured dependent variables. and Moody. K. Darken. 96. SAE paper 2002-01-0663. 31. Levine. K. S. CA. 8. Allen. Colbourn.. 303–314.-L. 3D engine analysis and mls cylinder head gaskets design. Orthogonal arrays with variable numbers of symbols. 377–403. Contr. Currin. Society of Automotive Engineers. Boston. (2002). (1996). New York. Numer. C. Chui. E. Statist. (1997). Ten Lectures on Wavelets.. Academic Press Inc. S. Mistree. and Schuler. (1991). Statist. Y. J. CBMS-NSF. 929–939. New York. E(s2 )-optimal superaturated designs. Wiley. Chen. Statist. J. Math. A lower bound for centered L2 discrepancy on asymmetric factorials and its application. C. B. 
P. 605–619. SIAM. G. J. J. Math. Ann. Morris. and Wahba.. Lippmann eds. J. Craven. Statistics for Spatial Data. T. I. 447–453. Chiang. Fang. (2004). (1980). T. and Dinita. Computing Physics 26. (1979). W. Hanson. Moody. Plann. (1978). K.

New York. J. R. 225–233. L. (1996). Du. Statist. Sudjianto. and Gijbels. N. the minimal l1 -norm near-solution approximates the sparsest near-solution. Assoc. 1199–1207. D. 161–175. (2004). I. 99–104. 425–455. (1999). Du. (1991). and Smith. and Mukerjee. J. Fan. (1988). Granger. 87. A. Threshold accepting: A general purpose algorithm appearing superior to simulated annealing. Fractional Factorial Plans. J. P. (1996). Meth. Semiparametric estimates of the relation between weather and electricity sales. Applied Regression Analysis. Assoc.. Donoho. Piston slap excitation: literature review. Y. J. (1978). Gupta and J. L. SAE paper 962396. ASME J. (2004). Dueck. H. W. Biometrika 81. de Luca. X. D. G. and Sudjianto. Local Polynomial Modelling and Its Applica- © 2006 by Taylor & Francis Group. F. For most large underdetermined systems of linear equations. J. 42.. and Weiss. (1996). W. P. Fan. D. (1986). ASME Journal of Mechnical Design 126. 11. John Wiley.. Statistical Decision Theory and Related Topics IV.266 Design and Modeling for Computer Experiments De Boor. 163–175. New York. A. (1994). Society of Automative Engineering. A Practical Guide to Splines. (1992). and Marx. C. Ideal spatial adaptation by wavelet shrinkage. 2nd ed. Computational Physics 90. N. S. J. B. Donoho. L. the minimal l1 -norm solution is also the sparsest solution. 310–320. Amer. D. Design-adaptive nonparametric regression. New York. Sequential optimization and reliability assessment method for eﬃcient probabilistic design. Statist. Wiley. An integrated framework for optimization under uncertainty using inverse reliability strategy. R. X. J. A. S. J. Engle. eds. O. Berger. For most large underdetermined systems of linear equations. W. Dey. Du. (2004). Rice. 81. I. Statist. in S. Manuscript. Sci. and Gerges. Flexible smoothing with b-splines and penalties. Design 126. C. AIAA J. R. and Johnstone. and Scheuer. A.. H. LLC . (2004a). New York. 1–9. Springer–Verlag. and Chen. Donoho. Springer-Verlag. 
and Chen. X. (2004b). The ﬁrst order saddlepoint approximation for reliability analysis. Diaconis. M. C. C. Eilers. Draper. 89–121. Amer. Bayesian numerical analysis. (1981). Manuscript. T.

N. T. Soc. 10–26. Fang. Liu. Royal Statist. Quality. Monte Carlo and Quasi-Monte Carlo Methods 2000 pp. J. N. Amer. Hickernell and H. G. Construction of e(fN OD )optimal supersaturated designs via room squares. T. G. J. J. New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. Science in China (Series A) 33. Construction of optimal supersaturated designs by the packing method. Beijing. Inter. Discussion of the papers by Atkinson. Fang. Proﬁle likelihood inferences on semiparametric varying-coeﬃcient partially linear models. J. and Liu. Comp. in K. 339–349. Fan. K. Construction of uniform designs via super-simple resolvable t-designs. and Li. Construction on minimum generalized aberration designs. Fang. Fang. and Safety Engineering 9. G. M. Science Press. and Hickernell. (in Chinese). (1995). Q. (2003). J. (1980). R. © 2006 by Taylor & Francis Group. Ser. Liu. 37–50. Ge. T. Chaudhuri and M.. H. F. Statist. Fang. B 62. Two-step estimation of functional linear models with applications to longitudinal data. (2004). H. J. K. K. T. and Huang. Assoc. New York. T. 15–32. and Hickernell. J. Uniform Design and Uniform Design Tables. T. 446–458. J. (1996). K. N. (2002). (2004). Niederreiter.. Math. and Zhang. K. J. 305–315.. F. Ge. (2005).. in A. Theory. K.. 843–851. Some applications of quasi-Monte Carlo methods in statistics. M. Q. T. method and applications of the uniform design. K. 363–372. 71–84. London. T. N. (2002). N. Q. F. (2002). Fang. K. J. T. Q. Ge. Acta Math. Ghosh. K. and Qin. Variable selection via nonconcave penalized likelihood and its oracle properties. Fang. Fang. Sinica 3. J. 267 Fan. (2001). 96. (2001). Ge. K. Fang. and Ge. T. T. Beijing. Springer.References tions. (1994). (2004). T. J. and Liu. LLC . (2000). G. Utilitas Mathematica 66. Metrika 57. 303– 322. K. An eﬃcient algorithm for the classiﬁcation of Hadamard matrices. R. Fang. 73. 710–723. The uniform design and its applications. Fan. 
