Advanced Excel® for Scientific Data Analysis
Robert de Levie
OXFORD
UNIVERSITY PRESS
2004
Oxford New York Auckland Bangkok Buenos Aires Cape Town Chennai Dar es Salaam Delhi Hong Kong Istanbul Karachi Kolkata Kuala Lumpur Madrid Melbourne Mexico City Mumbai Nairobi Sao Paulo Shanghai Taipei Tokyo Toronto
Copyright © 2004 by Robert de Levie
Published by Oxford University Press, Inc., 198 Madison Avenue, New York, New York 10016
www.oup.com
Oxford is a registered trademark of Oxford University Press.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
De Levie, Robert.
Advanced Excel for scientific data analysis / Robert de Levie.
p. cm.
Includes index.
ISBN 0-19-517089-X (cloth); 0-19-515275-1 (pbk.)
1. Chemistry, Analytic--Data processing. 2. Electronic spreadsheets. 3. Microsoft Excel (Computer file) I. Title.
QD75.4.E4 D43 2003
530'.0285--dc21
2003053590

Disclaimer
Neither the author nor the publisher of this book are associated with Microsoft Corporation. While Oxford University Press takes great care to ensure accuracy and quality of these materials, all material is provided without any warranty whatsoever, including, but not limited to, the implied warranties of merchantability or fitness for a particular purpose. Excel and Visual Basic are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The product names and services are used throughout this book in editorial fashion only and for the benefit of their companies. No such use, or the use of any trade name, is intended to convey endorsement or other affiliation with the book.
9 8 7 6 5 4 3 2 1 Printed in the United States of America on acid-free paper
Preface
This book will take you, my reader, beyond the standard fare of Excel. This is why the title starts with the word "advanced". This book is not a primer, and familiarity with Excel is presumed. You will learn how to make the spreadsheet do your bidding, not so much by prettying up its display, but by exploiting its considerable computational prowess to the fullest, and by adding to it with custom functions and custom macros where necessary. If Excel's built-in least squares facilities don't provide the covariance, don't handle statistical weights, don't supply orthogonal polynomials, or lack special tools for equidistant data, this book will show you how to make those tools. If Excel's fast Fourier transform routine is cumbersome, replace it, and go from there to perform time-frequency analysis. If you want to use the Runge-Kutta method, write a custom function to do so. If you need a deconvolution, there are several macros to perform it, in addition to the direct spreadsheet approach.

The focus of this book is on the numerical analysis of experimental data such as are encountered in the physical sciences. Data analysis is nowadays often performed with one of two approaches, least squares or Fourier transformation, which therefore form the core of this book, occupying chapters 2 through 6. Sometimes, theory does not furnish explicit expressions for our models, in which case the experiments must be compared with the results of numerical simulations, as briefly discussed in chapter 7. Then follows a short discussion of macros, while the final chapters round out the book with an annotated tour of its custom macros. The material is illustrated with practical examples. In cases where the background of some of the methods used may be hard to find, short explanations have been included. Throughout this book, the objective is to make math a convenient scientific tool rather than an obstacle.
You should know what a square root means, and have access to a tool (such as a table of logarithms, a slide rule, a calculator, or a computer) to find it, rather than have to learn (as yours truly once did) how to evaluate it by hand, with pencil and paper. That, incidentally, turned out to be a thoroughly useless skill, and was promptly forgotten. It is useful as well as intellectually satisfying to know how to design an engine, but it is not needed for safe driving. In the same sense, you need not know all theorems, conjectures, and lemmas underlying your mathematical tools in order to reap their benefits, as long as you understand what you are doing. Where, nonetheless, math is displayed in this book, often at the beginning of a chapter or section, it is used as a convenient shorthand for those who can read its precise and compact language, but it seldom requires you to execute the corresponding mathematical operation. In other words, you can skip the math if, otherwise, it would scare you away. On second reading, the math may not even look so frightening any more. At any rate, there are many more figures in this book than equations.

Books are as much defined by what they are not as by what they are, and a prospective reader should know the score. This book offers no templates, since the idea is not to provide canned solutions but, instead, to illustrate how solutions can be created. While the macros listed in this book have a fairly general usefulness and applicability, they are primarily meant as examples, and you are encouraged to
modify them for your own purposes, and even to scavenge them for useful ideas and parts. Furthermore, this book is neither an introduction to Excel or VBA, nor a textbook on the mathematical basis of scientific data analysis. There are already some good introductions to scientific uses of Excel on the market, and this book will build on them. There are also numerous books on VBA (which stands for Visual Basic for Applications, the computer language used in Excel custom functions and macros) that go into much more detail than could possibly be incorporated here, and many excellent books on statistics, on Fourier transformation, and on numerical simulation, the three main scientific applications discussed in the present book. Recommended books on each of these subjects are listed at the end of the relevant chapters. What the present book offers instead is an attempt at synthesis of these various areas, illustrating how many numerical problems can be fitted comfortably in the convenient, user-friendly format of the spreadsheet.

As such, this book should be suitable for use by any scientist already familiar with Excel. Because it retains its primary focus on science, it can also be used as text for an introductory course in scientific data analysis, especially when combined with student projects. While an effort has been made to make this book as broadly useful as possible, and to incorporate examples from different areas, my own background as a physical and analytical chemist will unavoidably show. Readers who are not chemists will still recognize the general, more widely applicable approach and features involved in many of these examples.
*****
Idiosyncratic notation has been kept to a minimum, with three exceptions. The notation 2 (3) 17 is used as convenient shorthand for the arithmetic progression 2, 5, 8, 11, 14, 17 (i.e., starting at 2, with increment 3, ending at 17). The linking symbol ∪ is used to indicate when keys should be depressed simultaneously, as in Alt∪F11 or Ctrl∪Alt∪Del. (Since the linking sign is not on your standard keyboard, you will not be tempted to press it, as you might with a plus sign.) And the symbol ⊘ will identify deconvolution, complementing the more usual symbol ⊗ for convolution.
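As an aside for readers who prefer code to shorthand, the progression notation can be expanded mechanically. The sketch below is mine, not the book's (the book itself works in Excel, not Python), and the function name is invented for illustration:

```python
def progression(first, step, last):
    """Expand the shorthand 'first (step) last', e.g. 2 (3) 17,
    into the full arithmetic progression 2, 5, 8, 11, 14, 17."""
    values = []
    v = first
    while v <= last:
        values.append(v)
        v += step
    return values

print(progression(2, 3, 17))  # [2, 5, 8, 11, 14, 17]
```

The same convention is used later in the book for spreadsheet fills such as 1 (1) 1000.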
*****
This book can be read at several levels. It can serve as a brief, illustrated introduction to least squares, Fourier transformation, and digital simulation, as used in the physical sciences. For those interested in simply using its macros (which provide a useful set of auxiliary tools for solving a few standard scientific problems on a spreadsheet), it illustrates their modes of operation, their strengths, and their deficiencies. And for those who want the spreadsheet to solve other scientific problems, the fully documented macros can serve as examples and possible starting points for novel applications.

Here is how this book is organized. After the introduction, three chapters are devoted to least squares methods, used here almost exclusively as a data-fitting tool. Least squares methods are nowadays used routinely for describing experimental data in terms of model parameters, for extracting data from complex data sets, for finding their derivatives, and for a host of other manipulations of experimental data, and chapters 2 through 4 illustrate some of these applications. The guiding principle has been to relegate most of the mathematical manipulations to macros and, instead, to focus on how to use these tools correctly.
Then follows a chapter on Fourier transformation, a cornerstone of modern data analysis as well as of modern scientific instrumentation, and a companion chapter on methods for handling related problems, such as convolution, deconvolution, and time-frequency analysis. Next is a chapter on the numerical solution of ordinary differential equations. All of these can be, and are, valid topics of entire books and treatises, and we here merely scratch the surface and sniff their smells. The final chapters get the reader started on writing Excel macros, and provide a number of specific examples.

Readers of my earlier book on this subject, How to Use Excel in Analytical Chemistry, Cambridge University Press, 2001, will of course find some inevitable overlap, although in the present volume I have restricted the topics to those that are of most general interest, and treated these in much greater depth. Only relatively few owners of Microsoft Office realize that they have on their computer a modern, compilable, high-level language, VBA, ready to be used, a powerful computational engine raring to go, complete with the associated graphical tools to visualize the results. With so much power under the hood, why not push the pedal and see how far you can go?
*****
Numerous friends, colleagues, and students have contributed to this book, corrected some of its ambiguities, and made it more intelligible. I am especially grateful for invaluable help on many occasions to Bill Craig; for their many helpful comments, especially on the chapters on least squares, to Whitney King, Panos Nikitas, Carl Salter, and Brian Tissue; for commenting on the chapter on Fourier transformation to Peter Griffiths and Jim de Haseth; for valuable comments on deconvolution to Peter Jansson; for letting me use his elegant equidistant least squares macro to Philip Barak; and for sending me experimental data that are so much more realistic than simulations to Harry Frank, Edwin Meyer, Caryn Sanford Seney, and Carl Salter. I gladly acknowledge the various copyright holders for permission to quote from their writings or to use their published data, and I am grateful to William T. Vetterling of Numerical Recipes Software for permission to incorporate some programs from the Numerical Recipes in the sample macros. As always, my wife Jolanda helped and supported me in innumerable ways.
*****
This book was printed from files made on a standard personal computer. All text was written in Word; all figures (including those on the front and back cover) were made with Excel. Special graphing software was neither needed nor used.

If so desired, you can read this book by restricting yourself to the passages printed in (relatively) large type, and the figures, even though that would be somewhat like learning to swim or ride a bicycle from a correspondence course. The only additional ingredient you will need for some of the exercises (in smaller print), apart from a computer with Excel version 5 or (preferably) later, is the set of custom macros in the MacroBundle, which can be downloaded as Word text files from the web site oupusa.org/advancedexcel, and are listed in chapters 9 through 11. These macros are most conveniently accessible through an extra toolbar or, where viewing space is at a premium, as a menu item. The above web site also contains a SampleData file so that you need not type in the numerical values of the examples, and a SampleMacros file from which you can copy the macros and functions used in the exercises, and even a short "Getting up to speed" exercise to help you recall your Excel skills. The software requirements are spelled out in section A.9.
It is well-nigh impossible to write a book of this type and length without some typos and even outright errors, and the present volume will be no exception. I will be grateful to receive comments and suggested corrections at my email address: rdelevie@bowdoin.edu. I intend to post corrections and updates on the above web site.
Copyright credits
The following copyright holders graciously provided permission to use data or verbatim quotes. Data from Y. Bard in Nonlinear Parameter Estimation, copyright © 1974, Academic Press, are used by permission. Data from L. M. Schwartz & R. I. Gelb are reprinted with permission from Anal. Chem. 56 (1984) 1487, copyright 1984 American Chemical Society. Likewise, data from J. J. Leary & E. B. Messick are reprinted with permission from Anal. Chem. 57 (1985) 956, copyright 1985 American Chemical Society. Data from R. D. Verma published in J. Chem. Phys. 32 (1960) 738 are used with permission of the American Institute of Physics. Data from W. H. Sachs are reprinted with permission from Technometrics 18 (1976) 161, copyright 1976 by the American Statistical Association, all rights reserved. Data from G. N. Wilkinson in Biochem. J. 80 (1961) 324 are reproduced with permission, © the Biochemical Society. Data from G. R. Bruce & P. S. Gill in J. Chem. Educ. 76 (1999) 805, R. W. Schwenz & W. F. Polik in J. Chem. Educ. 76 (1999) 1302, S. Bluestone in J. Chem. Educ. 78 (2001) 215, and M.-H. Kim, M. S. Kim & S.-Y. Ly in J. Chem. Educ. 78 (2001) 238, are used with permission from the Journal of Chemical Education, Division of Chemical Education, Inc. Permission to quote data from E. S. Eppright et al., World Rev. Nutrition Dietetics 14 (1972) 269 was granted by its copyright holder, S. Karger AG, Basel. Data from the 2000 book Modern Analytical Chemistry by D. Harvey are reproduced with permission of The McGraw-Hill Companies. Finally, material from N. R. Draper & H. Smith in the 2nd edition of their book Applied Regression Analysis, copyright © 1998; from D. M. Bates & D. G. Watts, Nonlinear Regression Analysis and Its Applications, copyright © 1988; and from K. A. Connors, Chemical Kinetics, the Study of Reaction Rates in Solution, copyright © 1990; is used by permission of John Wiley & Sons, Inc.
About the author
Robert de Levie is the author of more than 150 papers in analytical and electrochemistry, of an early Spreadsheet Workbook for Quantitative Chemical Analysis, McGraw-Hill, 1992; of a textbook on the Principles of Quantitative Chemical Analysis, McGraw-Hill, 1997; of an Oxford Chemistry Primer on Aqueous Acid-Base Equilibria and Titrations, Oxford University Press, 1999; and, most recently, of How to Use Excel in Analytical Chemistry, Cambridge University Press, 2001. He was born and raised in the Netherlands, earned his Ph.D. in physical chemistry at the University of Amsterdam, was a postdoctoral fellow with Paul Delahay in Baton Rouge LA, and for 34 years taught analytical chemistry and electrochemistry at Georgetown University. For ten of those years he was the US editor of the Journal of Electroanalytical Chemistry. Now an emeritus professor, he lives on Orr's Island, and is associated with Bowdoin College in nearby Brunswick ME. He can be reached at rdelevie@bowdoin.edu.
Contents
1 Survey of Excel 1
1.1 Spreadsheet basics 1
1.2 Making 2D graphs 4
1.3 Making 3D surface graphs 10
1.4 Making surface maps 13
1.5 Making movies 16
1.6 Printing, copying, linking & embedding 18
1.7 Setting up the spreadsheet 20
1.7.1 Data Analysis Toolpak 20
1.7.2 Solver 20
1.7.3 VBA Help File 21
1.7.4 Additional macros 21
1.7.5 Additional files 22
1.7.6 Commercial tools 22
1.7.7 Choosing the default settings 23
1.8 Importing data 25
1.9 Error messages 25
1.10 Help 26
1.11 Functions, subroutines & macros 26
1.11.1 Custom functions 27
1.11.2 Custom subroutines & macros 28
1.12 An example: interpolation 29
1.13 Handling the math 37
1.13.1 Complex numbers 37
1.13.2 Matrices 38
1.14 Handling the funnies 40
1.14.1 The binomial coefficient 40
1.14.2 The exponential error function complement 41
1.15 Algorithmic accuracy 44
1.16 Mismatches between Excel and VBA 49
1.17 Summary 51
1.18 For further reading 52
2 Simple linear least squares 53
2.1 Repeat measurements 54
2.2 Fitting data to a proportionality 56
2.3 LinEst 58
2.4 Regression 60
2.5 LS 62
2.6 Trendline 64
2.7 Fitting data to a straight line 65
2.8 Simple propagation of imprecision 66
2.9 Interdependent parameters 68
2.10 Centering 71
2.11 Extrapolating the ideal gas law 76
2.12 Calibration curves 80
2.13 Standard addition 83
2.14 The intersection of two straight lines 86
2.15 Computing the boiling point of water 91
2.16 Phantom relations 92
2.17 Summary 96
2.18 For further reading 97
3 Further linear least squares 98
3.1 Fitting data to a polynomial 98
3.2 Fitting data to a parabola 99
3.3 The iodine vapor spectrum 100
3.4 The intersection of two parabolas 104
3.5 Multiparameter fitting 107
3.6 The infrared spectrum of H35Cl 107
3.7 Spectral mixture analysis 111
3.8 How many adjustable parameters? 113
3.9 The standard deviation of the fit 115
3.10 The F-test 115
3.11 Orthogonal polynomials 117
3.12 Gas-chromatographic analysis of ethanol 122
3.13 Raman spectrometric analysis of ethanol 125
3.14 Heat evolution during cement hardening 131
3.15 Least squares for equidistant data 135
3.16 Weighted least squares 140
3.17 An exponential decay 144
3.18 Enzyme kinetics 144
3.19 Fitting data to a Lorentzian 148
3.20 Miscellany 150
3.20.1 The boiling point of water 150
3.20.2 The vapor pressure of water 151
3.20.3 Fitting data to a high-order polynomial 151
3.21 Summary 153
3.22 For further reading 156
4 Nonlinear least squares 158
4.1 Cosmic microwave background radiation 161
4.2 The I2 potential energy vs. distance profile 165
4.3 Titrating an acid with a strong base 169
4.4 Conductometric titration of an acid mixture 176
4.5 Fitting a luminescence decay 180
4.6 Fitting a curve with multiple peaks 182
4.7 Fitting a multicomponent spectrum with wavenumber-shifted constituents 187
4.8 Constraints 192
4.9 Fitting a curve through fixed points 193
4.10 Fitting lines through a common point 194
4.11 Fitting a set of curves 198
4.12 Fitting a discontinuous curve 201
4.13 Piecewise fitting a continuous curve 203
4.14 Enzyme kinetics, once more 205
4.15 The Lorentzian revisited 206
4.16 Linear extrapolation 207
4.17 Guarding against false minima 208
4.18 General least squares fit to a straight line 213
4.19 General least squares fit to a complex quantity 217
4.20 Miscellany 219
4.20.1 Viscosity vs. temperature and pressure 219
4.20.2 Potentiometric titration of a diprotic base 221
4.20.3 Analyzing light from a variable star 224
4.20.4 The growth of a bacterial colony 225
4.20.5 Using NIST data sets 226
4.21 Summary 227
4.22 For further reading 229
5 Fourier transformation 230
5.1 Sines and cosines 230
5.2 Square waves and pulses 235
5.3 Aliasing and sampling 239
5.4 Leakage 242
5.5 Uncertainty 243
5.6 Filtering 245
5.7 Differentiation 255
5.8 Interpolation 261
5.9 Data compression 265
5.10 Analysis of the tides 268
5.11 Summary 277
5.12 For further reading 279

6 Convolution, deconvolution, and time-frequency analysis 280
6.1 Time-dependent filtering 280
6.2 Convolution of large data sets 285
6.3 Unfiltering 291
6.4 Convolution by Fourier transformation 295
6.5 Deconvolution by Fourier transformation 300
6.6 Iterative van Cittert deconvolution 311
6.7 Iterative deconvolution using Solver 321
6.8 Deconvolution by parameterization 325
6.9 Time-frequency analysis 331
6.10 The echolocation pulse of a bat 335
6.11 Summary 337
6.12 For further reading 338

7 Numerical integration of ordinary differential equations 339
7.1 The explicit Euler method 340
7.2 The semi-explicit Euler method 347
7.3 Using custom functions 350
7.4 Extreme parameter values 354
7.5 The explicit Runge-Kutta method 356
7.6 The Lotka oscillator 1 361
7.7 The Lotka oscillator 2 365
7.8 The Lotka oscillator 3 366
7.9 Stability 368
7.10 Chaos 372
7.11 Summary 374
7.12 For further reading 375

8 Write your own macros 377
8.1 Reading the contents of a cell 378
8.2 Reading & manipulating a cell block 381
8.3 Numerical precision 384
8.4 Communication boxes 385
8.4.1 Message boxes 385
8.4.2 Input boxes 386
8.5 Case study 1: the propagation of imprecision 390
8.6 Case study 2: bisection 392
8.7 Case study 3: Fourier transformation 395
8.7.1 A starter macro 395
8.7.2 Comments & embellishments 399
8.8 Case study 4: specifying a graph 404
8.9 Case study 5: sorting through permutations 408
8.10 Case study 6: raising the bar 412
8.11 Adding a menu item 413
8.12 Adding a toolbar 415
8.13 Tools for macro writing 416
8.13.1 Editing tools 416
8.13.2 The macro recorder 417
8.14 Troubleshooting 418
8.15 Summary 420
8.16 For further reading 422

9 Macros for least squares & for the propagation of imprecision 423
9.1 General comments 424
9.2 LS 426
9.3 LSPoly 436
9.4 LSMulti 444
9.5 LSPermute 452
9.6 LLSS 459
9.7 Ortho 460
9.8 ELS 469
9.9 WLS 481
9.10 SolverAid 491
9.11 Propagation 500
9.12 Matrix operations 509
9.12.1 Invert 510
9.12.2 Multiply 512
9.12.3 Transpose 512

10 Fourier transform macros 513
10.1 Fourier transformation (FT) 513
10.2 Direct (de)convolution 518
10.3 Fourier transform (de)convolution 519
10.4 Iterative deconvolution 523
10.5 Time-frequency analysis 529
10.6 Semi-integration & semi-differentiation 540

11 Miscellaneous macros 549
11.1 Terms & conditions 549
11.2 Insert a toolbar 551
11.3 Insert a menu 559
11.4 Movie demos 566
11.5 Lagrange interpolation 572
11.6 SolverScan 573
11.6.1 Calling Solver with VBA 574
11.6.2 Programming details 575
11.6.3 Possible extensions 576
11.7 Mapper 582
11.8 RootFinder 596

Appendix 599
A.1 The basic spreadsheet operations 599
A.2 Some common mathematical functions 600
A.3 Trigonometric and related functions 602
A.4 Some engineering functions 602
A.5 Functions involving complex numbers 603
A.6 Matrix operations 604
A.7 Excel error messages 605
A.8 Some shortcut keystrokes for PC & Mac 605
A.9 Installation requirements & suggestions 607

Epilogue 608
Index 610
Chapter 1
Survey of Excel

This chapter primarily serves as a brief refresher for those who have used Excel before, although it also contains some more advanced material that needed to be introduced here in order to be available in subsequent chapters. The novice user of Excel is urged first to consult a manual, or an introductory text, as can usually be found on the shelves of a local library or bookstore. Several such books are listed in section 1.18. A short 'getting up to speed' exercise is included in the web site, where it is easily found.

The instructions in this book apply to versions starting with Excel 97, and can be used also (albeit sometimes with minor modifications) with the older versions Excel 5 and Excel 95. Excel versions 1 through 4 used a different macro language, and are therefore not recommended in connection with this book. If you have such an early version, it is time to upgrade. The instructions are primarily for Excel installed on IBM-type personal computers, but Macintosh users are alerted to the few commands that are different on their computers. Table A.8 in the appendix compares the most useful instructions on both types of machines, while section A.9 lists the requirements necessary to get the most out of this book.

1.1 Spreadsheet basics

A spreadsheet is laid out as a page in an accountant's ledger, i.e., as a sheet with rows and columns. Because it is electronic rather than actual, the sheet can be (and often is) quite large, while only a small part of it is visible at any one time on the monitor screen. For that reason, the most important information is usually kept at the top of the spreadsheet, rather than at the bottom of the columns, as would be common on paper. The rows and columns define individual cells, denoted by a column letter and a row number. The top left-hand cell is A1; to its right is B1; below B1 is B2, etc. A cell can contain one of three different items: a label, a number, or an instruction. In the absence of contrary information, labels start with a letter, numbers with a digit, and instructions (i.e., formulas or functions) with the equal sign, =. Excel displays labels as left-justified, and numbers as right-justified.

When a cell contains an instruction, it will show the corresponding numerical result, while the underlying instruction can be seen in the formula window in the formula bar when that particular cell is highlighted. At the same time, the cell address is shown in the address window in the same formula bar. The most basic mathematical operations of a spreadsheet are listed in table A.1 of the appendix. Note that multiplication must use an asterisk, *, division a forward slash, /, and exponentiation a caret, ^. Numerous built-in functions are listed in tables A.2 through A.6 of that same appendix, and in section 1.11 you will see how to make your own, user-defined functions.

A highlighted cell or cell block has a heavy border surrounding it, and a handle, a little dark square, at its bottom-right corner. By highlighting two adjacent cells containing different numbers, and by then dragging the corresponding handle, one can conveniently generate a row or column of numbers in arithmetic progression. Try it.

Exercise 1.1.1:
(1) Place the numbers 0, 1, 2, and 3 in cells B3, B4, C3, and C4 respectively.
(2) Highlight the block B3:C4, grab its handle, and move it down or sideways. Note that one can drag only vertically or horizontally at a time, but can still make a block such as shown in Fig. 1.1.1 by dragging twice, first down and then, after momentarily releasing the handle, sideways (or vice versa). Note that this trick does not work for letters or other text.

Fig. 1.1.1: The square B3:C4 (here shown with gray background) as extended by dragging its handle sequentially in two directions.

The more general way to fill a row or column uses a formula that is copied. While this takes slightly more initial effort, it is not restricted to arithmetic progressions, and is more readily modified. In the example of Fig. 1.1.1, the result shown could have been obtained by entering the value 0 in cell B3, the formula =B3+1 in cell B4, copying this instruction to B5:B15, entering the instruction =B3+2 in cell C3, and copying this to the block C3:F15.

Unless otherwise instructed, copying instructions to another cell assumes relative addressing, i.e., with respect to its starting point. This uses rules like those in chess, where the knight in one move can only reach squares with a given relative position, either 2 horizontal (sideways) plus 1 vertical (forwards or backwards), or 1 horizontal + 2 vertical, with respect to its starting point. Relative addressing can be overridden by using the dollar sign, $, that symbol of stability, in front of the column letter and/or row number, thereby making (that part of) the address absolute, say from A1 to $A1, A$1, or $A$1. The function key F4 conveniently toggles through the four possible permutations. Absolute addressing is especially convenient when a constant is specified in a separate cell, say C1, and referred to as, e.g., =B3+$C$1 in cell C3.

In complicated mathematical expressions it is often convenient to use (easier to remember and easier to read) symbolic names rather than cell addresses. In Excel you can assign such names to constants, i.e., to parameters with an absolute address. The simplest method to assign names is to highlight the cell containing the constant to be named, then to move the mouse pointer to the Name Box on the Formula Toolbar, click on the cell address, type the name, and press Enter (on the Mac: Return). You can also use the sequence Insert ⇒ Name ⇒ Define, and then type in the Define Name dialog box the desired name. If you want to restrict the name definition to a given sheet, use the latter method, and precede the name with that of the worksheet plus an exclamation mark, as in Sheet1!Ka1. You can then define Ka1 differently (or not at all) on the next page. Note that valid cell addresses (e.g., A2 or IV34) cannot function as names, but single letters (except R and C) and combinations such as BB, 4U, or A2B2C can.
(In general do not use the Line chart. (2) Open the Data Analysis ToolPak with Iools => .e. and the plot area. and delete it. we first must have something to show in it. use Shiftv~ (by simultaneously depressing the Shift and back arrow keys) to extend the highlighting to the twocolumn block Al:B1000. In its dialog box. Type Linear. the chart surrounds the plot as a frame (and perhaps a mat) surrounds a painting. within brackets. (4) With colunm B still highlighted. Thus. labels. click on the corresponding window. In Excel. and enter NI:NI000 (it is easy to remember that we have synthetic noise in colunm N). then click OK. see section 1. then use Insert => Chart.. The newly copied column acquires the length to which the adjacent colunm is filled contiguously. because welldesigned visual images are often interpreted more readily than data sets. If you cannot find Data Analysis listed under Tools. most chart borders have been deleted. highlight it. which can be placed anywhere within the chart area. i. . 1. in the resulting dialog box. Axis labels and legends can be either inside or outside the plot area.e. (3) In cell BI enter the instruction =1 O+N1. ~tep value: I.dit => Fill => ~eries and.2.3) the plot borders have been retained.1. and other textboxes.. StQP value: 1000. select XY(Scatter). then doubleclick on the handle of cell B I to copy this instruction to B2:B 1000. which assumes that all points are equidistant. generated with the Data Analysis ToolPak. and finish.1. Excel makes it very easy to generate publicationquality 2D graphs. . regardless of whether or not this is true. 7. OK. but must be within the chart area. The data are shown in the plot area. specify Series in . then use !. pick your style of Chart subwe (with markers and/or lines). From now on. and it contains the plot area plus the regions surrounding it.. You should now have a graph such as Fig. To these you can add other independent blocks. select . 
graphs are made of at least two largely independent blocks: the chart area. Advanced Excel for scientific data analysis 1. But first a brief technical excursion.. contiguously filled columns. In order to make a graph.Golumns. de Levie.Rata Analysis. 1000 in AI:AIOOO. but (with the exception of some 3D plots in section 1.2 Making 2D graphs Graphs form an integral part of science and technology. and will not work when there are no adjacent. then in the Data Analysis dialog box use the scroll bar and select Random Number Generation. such an arithmetic progression will be denoted by its first and last members and. its increment. click on the round 'option' or 'radio' button to the left of Qutput Range. The larger chart takes up the total space of the graph. such as legends.4 R.3. i. In this book. as I (1) 1000. In recent versions of Excel this is conveniently accomplished as follows: enter the starting value (here: I in cell AI).1. .2. that all xvalues are equally spaced.Ristribution: Normal (for Gaussian noise).) (5) Click on the superfluous Legend box. Below we will use simulated data. Exercise 1.1: (1) Enter the sequence 1.
Fig. 1.2.1: A simulated data set of 1000 data with an average value of 10 and Gaussian noise with a standard deviation of 1.

By right-clicking (on the Mac, use Ctrl∪click instead) or double-clicking on the various components of the graph we can modify them; the most often modified elements are the function shown and its axes. Click on the data and try out some of the many options available in the Format Data Series dialog box. These include smoothing the curves using a cubic spline (Smoothed line under the Patterns tab) and, for multiple data sets, the use of a secondary axis (highlight the series, click on Format => Selected Data Series and, under the Axis tab, click on Secondary axis). Click on the vertical axis and try out some of the many options available in the Format Axis dialog box, including (under the Scale tab) using a logarithmic and/or reversed axis. Click on the numbers next to the horizontal axis, then change their font and size using the font and size windows on the formatting toolbar. Click on the background to get the Format Plot Area dialog box, or on the graph edge to find the Format Chart Area dialog box instead. Or, for more drastic changes, look under Chart, which appears in the Menu bar instead of Data when the graph is activated.

Exercise 1.2.1 (continued):
(6) To compute averages and standard deviations of groups of nine successive data points, deposit in cell C5 the instruction =AVERAGE(B1:B9), and in cell D5 the command =STDEV(B1:B9). Highlight the block C1:D9, and double-click on its handle. You will see the averages and corresponding standard deviations appear on rows 5, 14, 23, 32, 41, etc.
(7) In order to plot C1:C1000 versus A1:A1000, copy the graph you made for Fig. 1.2.1, and in that copy click on a data point. The columns A1:A1000 and B1:B1000 will now be outlined with color. Go to the top of cell B1, grab its colored edge when the mouse pointer is an arrow, and move it over to cell C1. This will change the y-axis assignment of the graph, which will now show C1:C1000 vs. A1:A1000.

Fig. 1.2.2: Averages of groups of nine successive data points from the data shown in Fig. 1.2.1.

(8) To add error bars, click on a data point, right-click (on the Mac: Ctrl∪click) to get the Format Data Series dialog box, and select the Y Error Bars tab. Highlight the Display: Both, select Custom:, and in both windows type =Sheet1!D1:D1000 (assuming that your spreadsheet is called Sheet1; otherwise instead use whatever name shows on the tab below the working area of the spreadsheet). Figure 1.2.2 shows what to expect.
(9) If you want to see a smaller fraction of these data, copy the graph, and in the copy right-click (on the Mac: Ctrl∪click) on the axis involved, and in the resulting Format Axis dialog box, under the Scale tab, change the Minimum and Maximum values, as well as the Major and Minor units. Also change the y range, and the size of the markers. Compare with Fig. 1.2.3.
(10) To enter text, click on the plot area of the graph so that the plot border (its inner framing) is highlighted. Type the text in the formula bar, then depress Enter (on the Mac: Return). The text will now appear in the graph in a text box, which you can move and shape with the mouse pointer. By passing over the text with the mouse key depressed, you can activate the text, then use Format => Text Box => Font to change its appearance.

Fig. 1.2.3: The same data as in Fig. 1.2.2 with added (light gray) error bars.
(11) To highlight a particular point, double-click on it (the mouse pointer changes into crossed double-pointed arrows) to see the options in Format Data Point, where you can change its appearance. Error bars can also be used to label specific features in graphs and spectra; see K. L. Lim, J. Chem. Educ. 79 (2002) 135. A few examples are shown in Fig. 1.2.4.

Fig. 1.2.4: Embellished detail from Fig. 1.2.3, with a few individual points highlighted with different markers (or, not shown here, colors).

In order to introduce another data set with the same abscissa (horizontal axis), just highlight that set, copy it to the clipboard with Ctrl∪c, highlight the plot area of the graph, and simply paste the data with Ctrl∪v. (The linking symbol ∪ indicates that the Control and c keys should be depressed simultaneously. The symbol ∪ has been chosen so that you will not be tempted to type it, as you might do with the symbol + often used instead.) Alternatively, after copying the data to the clipboard, click on the display area of the graph, and use Edit => Paste Special. In the resulting Paste Special dialog box select Add cells as New series, Values (Y) in Columns, checkmark Categories (X Values) in First Column, then press OK.

The above examples involve a single function or data set. When displaying more than one function in a single graph, it may be necessary to add a second ordinate (vertical axis) on the right-hand side of the plot. To do this, click on the particular data set in the graph that you want to associate with a second vertical axis, select its Format Data Series dialog box, and under the Axis tab select Secondary Axis. Furthermore, color can also be used to distinguish the different data sets.

Sometimes Excel places an axis somewhere in your graph (typically at x = 0 or y = 0) where you may not want it. If so, move it by selecting Format Axis and, under the Scale tab, changing the Crosses at value.
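For readers who prefer to script such bookkeeping, the block-wise averaging of step (6) of exercise 1.2.1 can also be done with a short macro. The following is merely an illustrative sketch (it is not part of the MacroBundle, and the name NinePointStats is arbitrary); it assumes the data of exercise 1.2.1 in B1:B1000, and writes the nine-point averages and standard deviations as numbers rather than as formulas:

```vb
' A sketch of step (6) in VBA: for each group of nine successive points
' in B1:B1000, write its average in column C and its standard deviation
' in column D, on the center row of the group (rows 5, 14, 23, ...).
Sub NinePointStats()
    Dim r As Long, block As Range
    For r = 1 To 991 Step 9
        Set block = Range(Cells(r, 2), Cells(r + 8, 2))
        Cells(r + 4, 3).Value = Application.WorksheetFunction.Average(block)
        Cells(r + 4, 4).Value = Application.WorksheetFunction.StDev(block)
    Next r
End Sub
```

Unlike the copied =AVERAGE() and =STDEV() instructions of step (6), the numbers written by this sketch will not update when the data in column B change, which is sometimes a drawback and sometimes exactly what is wanted.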
Some of these aspects are illustrated in exercise 1.2.2, which shows the use of two different axes by plotting the proton and hydroxyl ion concentrations as well as the pH as a function of the proton excess Δ = [H+] − [OH−].

Exercise 1.2.2:
(1) Here we illustrate the use of two different axes by plotting the proton and hydroxyl ion concentrations as well as the pH as a function of the proton excess Δ = [H+] − [OH−].
(2) Start a new spreadsheet, and in column A make a table of Δ with the values −0.05 (0.005) −0.005, −0.005 (0.001) 0.005, and 0.005 (0.005) 0.05, where the numbers within brackets indicate the data spacing.
(3) In columns B, C, and D compute [H+] = {Δ + √(Δ² + 4×10⁻¹⁴)} / 2 (assuming for convenience the value Kw = 10⁻¹⁴), [OH−] = 10⁻¹⁴/[H+], and pH = −log[H+].
(4) Highlight the data in columns A and B, and plot them in an XY graph. Select appropriate x- and y-scales, and select marker size and style, line thickness, and/or colors to distinguish the curves.
(5) Highlight the data in column C, copy them with Ctrl∪c, then highlight the plot border, and paste the data of column C into the graph with Ctrl∪v.
(6) Again highlight the plot border, type the text of appropriate function labels, and use Enter (Mac: Return) to place them in the graph, then use Format => Text Box to modify the appearance of the text. To move a label, maneuver the mouse pointer over the text so that the pointer takes the shape of a capital I, move it to highlight the text, and finally move the pointer so that it becomes a cross, at which point you can drag the label to its final place.
(7) Likewise, highlight the data in column D, copy them, highlight the plot border, and paste the copied data in the graph. Now click on these data, right-click (Mac: Ctrl∪click), select Format Data Series, and under the Axis tab specify the plot series as Secondary Axis.

Fig. 1.2.5: Using two vertical axes. This rather complicated plot shows the values of [H+], [OH−], and pH = −log[H+] as a function of the proton excess Δ = [H+] − [OH−]. The plots for [H+] and [OH−] have two linear asymptotes, whereas that for pH has the typical shape of a titration curve. Gray is used here instead of color to identify the secondary (right-hand) scale and the corresponding curve.
(8) If, as in this example, the data in column D all fall outside the vertical range already established for the other data, you cannot click on them in the graph. In that case, replace one data point in column D by a temporary value (such as 0) that falls inside the established range, click on it to establish the secondary axis, and afterwards repair that point.

Resist the temptation to put too much information in a single graph: visual overload defeats the purpose of a graph. The rule of thumb is: two is company, three is a crowd. In fact, unless the curves all belong to the same family, the information in Fig. 1.2.5 would be better illustrated in two separate graphs, one plotting [H+] and [OH−] vs. Δ, the other showing pH vs. Δ. Use line width, marker type and size, and (if your final output can display it) color to emphasize important aspects, and to make your graph more intelligible. If you want to include details, use multiple panels or inserts. Sometimes a logarithmic scale will work: for example, the Fourier transform of the tidal data shown in Fig. 5.10.2, as well as the corresponding power spectrum, is displayed there as three panels with different vertical scales, in order to show both the major peaks and the harmonics. An alternative, logarithmic plot of the same data is illustrated in Fig. 1.2.6, and conveys the same information in a single frame.

And do read the books by E. R. Tufte, especially The Visual Display of Quantitative Information (Graphics Press, Cheshire CT, 1992), on the design of effective as well as visually pleasing graphical displays.

Fig. 1.2.6: The data in the three panels of Fig. 5.10.2 combined in a single, semilogarithmic display.

It makes for a good-looking spreadsheet, and makes copying graphs into Word much easier, if you make sure that the graph is in register with the spreadsheet grid. To do this, highlight the chart area (by clicking on the region between the right edges of the plot and chart areas), move the mouse to a corner of the chart area, grab the double-sided arrow, and drag it to a cell corner while depressing the Alt key.
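Returning for a moment to step (3) of exercise 1.2.2: the expression for [H+] can also be packaged as a user-defined worksheet function. The following sketch is only an illustration (the function name HPlus is a hypothetical choice, and the exercise does not require it); it uses the same formula and the same assumed value Kw = 10⁻¹⁴:

```vb
' A sketch of the [H+] computation of exercise 1.2.2 as a user-defined
' function: [H+] = (delta + sqrt(delta^2 + 4 Kw)) / 2, with Kw = 1E-14.
' The function name HPlus is arbitrary.
Function HPlus(delta As Double) As Double
    Const Kw As Double = 0.00000000000001   ' i.e., 1E-14
    HPlus = (delta + Sqr(delta * delta + 4 * Kw)) / 2
End Function
```

With this function placed in a VBA module, cell B2 could then read =HPlus(A2), after which C2 could contain =1E-14/B2 and D2 the instruction =-LOG(B2). For Δ = 0 the function correctly returns [H+] = 10⁻⁷.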
Doing this with two diagonally opposed corners will align the graph with the cell grid. Inserts are best made as independent graphs. After the insert is the way you want it, move it so that it is completely inside the area of the main graph. By highlighting the cell block underneath the main graph you can then lift both off simultaneously and, e.g., transfer them to a Word document. This book contains several examples of inserts, such as Figs. 4.2 and 4.5.

1.3 Making 3-D surface graphs

The two-dimensional surface of a computer screen or a printed page cannot contain a truly three-dimensional image, but (as in a painting or photograph) it can give the illusion of a third dimension. That is what we mean by a 3-D graph. Excel cannot make general 3-D graphs, i.e., graphs with arbitrarily spaced functions for all three axes; for such complicated three-dimensional shapes, programs such as Mathematica, Maple, or AutoCad should be used. However, for one single-valued variable a 3-D surface graph can be made. It is really a three-dimensional form of Excel's Line plot: it assumes that the x- and y-values are equidistant, so that only the z-parameter (plotted vertically) can have arbitrary values. Still, such a graph will often make a reasonable 3-D plot as long as the independent coordinates x and y are both indeed equidistant, the dependent variable z is single-valued, and the plot does not contain too many data points. Moreover, you will have much less control over its appearance than Excel gives you for 2-D graphs.

Exercise 1.3.1:
(1) Open a spreadsheet, and enter the sequence 0 (0.2) 10 in both A2:A52 and B1:AZ1. These x- and y-values will be used automatically to calibrate the axes.
(2) Deposit the instruction =(1+SIN($A2*SQRT(B$1)/PI()))/2 in cell B2, and copy this to the area B2:AZ52.
(3) Highlight the block A1:AZ52, call the Chart Wizard with Insert => Chart, select Surface and its top-left Chart sub-type, then Finish. Double-click on the graph and position it where you want on the sheet, and adjust the scales. Click on the graph axes to select the label spacing. (In the example, the number of categories (or series) between tick mark labels and the number of tick marks were set at 10, for both axes.)
(4) By rotating the graph we can select a different point of view. Click on a top corner of one of the vertical background panels when it shows the label "Corners". Grab the little square at the top front corner, and drag it to rotate the graph until it suits you.
Fig. 1.3.1: A graph of the function z = ½ + ½ sin(x√y/π). The function is scaled to fit within the range 0 ≤ z ≤ 1 in order to avoid lines where it crosses these boundaries.

(5) The resulting surface net has a color, which is one of the few attributes of a 3-D graph you can change. But even that is not easy, so pay attention. Click the chart to activate it. Select Chart => Chart Options, under the Legend tab select Show Legend, and click OK. Now click on one of the colored boxes in the Legend box, then right-click (Mac: Ctrl∪click) to get the Format Legend Key (or start in the Formatting toolbar with Format => Selected Legend Key), and click on the colored marker specifying an area color. Exit with OK. Afterwards, you can click on the Legend box and delete the box; the selected color(s) will stay. Selecting None would have shown the surface as a transparent wire frame. (In Fig. 1.3.1 we could also have selected the color white.) Changing the major unit on the vertical axis changes how many color bands are displayed.
(6) In cell B2 now deposit instead the instruction =EXP(-5*(($A2-3)^2))*EXP(-10*((B$1-6)^2))+0.7*EXP(-(($A2-7)^2))*EXP(-2*((B$1-4)^2))+0.5*EXP(-3*(($A2-2)^2))*EXP(-5*((B$1-3)^2)), and copy this to the area B2:AZ52.
(7) A fun picture is that of the Mexican or cowboy hat, z = 0.5 [1 + cos √(x²+y²)], illustrated in Fig. 1.3.3 for −10 ≤ x ≤ 10, −10 ≤ y ≤ 10. Make it, and play with it by moving the corner and thereby changing the point of view. Or, for precise control, use Chart => 3-D View, where you can numerically specify elevation, rotation, perspective, and height.

Fig. 1.3.2: A graph of three Gaussian peaks. Top: Elevation 30, Rotation 70, Perspective 30, Height 100% of base, set with the precise controls of Chart => 3-D View.

Fig. 1.3.3: The Mexican or cowboy hat, z = ½ + ½ cos √(x²+y²), with all axes, legends, and borders colored white to make them invisible. In this figure all coordinate frames have been removed by clicking on them and then either removing them or coloring them white.

Because you cannot control the line thickness of a 3-D line plot, the latter becomes solid black when too many data are used. In that case it is better to use two or more plots, one for an overview with a coarse grid and relatively few data points, the other(s) with the necessary details. Alternatively you may want to use a map, as described below.
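The deposit-and-copy of steps (2) and (6) can equally well be scripted. The following sketch (an illustration only, with the arbitrary name FillSurface; it assumes the sheet layout of exercise 1.3.1, with x in A2:A52 and y in B1:AZ1) fills the block with the function of step (2) as numbers:

```vb
' A sketch that fills B2:AZ52 with z = 1/2 + (1/2) sin(x sqrt(y) / pi),
' reading x from column A and y from row 1, just as the copied cell
' formula of step (2) of exercise 1.3.1 does.
Sub FillSurface()
    Dim i As Integer, j As Integer, x As Double, y As Double
    Const Pi As Double = 3.14159265358979
    For i = 2 To 52                      ' rows: x-values in A2:A52
        For j = 2 To 52                  ' columns: y-values in B1:AZ1
            x = Cells(i, 1).Value
            y = Cells(1, j).Value
            Cells(i, j).Value = (1 + Sin(x * Sqr(y) / Pi)) / 2
        Next j
    Next i
End Sub
```

Replacing the single assignment inside the double loop by the three-Gaussian expression of step (6) would generate the data of Fig. 1.3.2 instead.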
1.4 Making surface maps

Maps provide an alternative way to visualize a single-valued, three-dimensional surface, especially when that surface contains much detail. Traditionally, maps use color and/or contour lines to provide the impression of a third dimension. Maps are most satisfactory when one must display a large number of data points, precisely the situation where the grid of Excel's 3-D line plots may become too dense; 3-D line plots and maps are, therefore, largely complementary tools. Contour maps can be difficult and time-consuming to generate because they involve interpolation in the height data, but color (or even gray-scale) maps are relatively easy to make as long as there are enough data to define each pixel. Either method works only for one single-valued parameter z (x, y).

Unfortunately, Excel has no built-in facilities for such maps. As it turns out, however, it is possible to introduce color maps through a back door, because Excel has the option of using a picture as the background of a graph. While that may have been intended for company logos, there is nothing to prevent us from using it for our maps. We therefore generate a picture based on the information we want to plot, and then mount this picture in the frame of an x,y plot. The background picture then is the graph, with a gray scale or colors representing the parameter values z. The custom macro Mapper in the MacroBundle provides a gray scale as well as several color schemes to represent height, and once you understand how to write VBA (see chapter 8) you can easily modify Mapper's color schemes to your own liking.

Colors are coded in the RGB additive color scheme, in which all colors are represented in terms of three color components: red, green, and blue, which roughly correspond to the maximum sensitivities of the three types of cones in the human retina. The additive system builds the color sensation by adding light beams of different colors: red plus green yields yellow, while combining red, green, and blue colors in the right proportions leads to white. The additive color scheme is used in light projection and in television and computer monitors. (The alternative, subtractive color scheme is based on light absorption and reflection of the remainder, rather than on light emission. It is based on pigments and dyes, typically yellow, cyan, and magenta, and is used in color printing as well as in painting. Adding its three components leads to black rather than white.)

Each RGB color component is represented numerically by an integer with a value from 0 through 255. For example, black is (0, 0, 0), white is (255, 255, 255), pure red is (255, 0, 0), and bright yellow is (255, 255, 0), while (180, 180, 180) gives a light neutral gray, etc. With three different colors to represent the single numerical value of each data point, there are many possible schemes to suggest height. Here we illustrate a gray scale, for which the values for R, G, and B are the same.

Exercise 1.4.1:
(1) Open a spreadsheet, and enter the sequences −20 (0.4) 20 horizontally and vertically in, say, row 1 and column A.
(2) In cell B2 deposit the instruction for the modified Mexican hat, =0.5*(1+SQRT($A2*$A2+B$1*B$1))/SQRT(100+$A2*$A2+B$1*B$1), and copy this to the area B2:CX101.
(3) Call the macro Mapper0 and see what you get. Then try colors (with Mapper1 through Mapper3).
(4) Modify the axes to −50 (0.4) 50 horizontally and vertically, copy the same instruction to the area B2:IR251, and again call Mapper.

Fig. 1.4.1: The function z = (1+√(x²+y²)) / (100+x²+y²), a variation on the cowboy hat, as represented (here in black and white) by Mapper.

Fig. 1.4.2: The application of Mapper to the central part, BY109:FU209, of the data array used for Fig. 1.4.1, with superimposed (white) text, and with a (white) arrow from the drawing toolbar.

Just as a 3-D graph, Mapper can only display one surface per graph. Color maps lack the illusion of perspective obtainable by rotating 3-D graphs, but they can provide better resolution, and better scales and legends, than Excel's 3-D graphs, and they can handle larger data arrays. They can be treated and modified like any other XY graph: curves, markers, and text boxes can all be added, see Fig. 1.4.2. More colorful schemes, such as that illustrated on the cover, are described in section 11.7. One such color scheme borrows from geographic maps (with the sequence black, blue, green, yellow, white), another from black-body heat radiation (black, dark red, red, orange, yellow, white). It is not necessary to go from dark to light: one can, e.g., use the color sequence of the rainbow (from black via violet, blue, green, yellow, orange to red), which provides vivid pictures with great contrast but only a rather weak illusion of height.
1.5 Making movies

Excel has a (rather limited) capability to make movies, which will not challenge Hollywood, but on occasion may come in handy, if for no other reason than to break the monotony of a presentation. The MacroBundle contains a few demonstration macros (MovieDemo1 through 5) that illustrate how simple movements can be generated in an Excel graph. The trick is to make a computation that recalculates a number many times, and to force the screen to update after every recalculation. Here is a simple one.

Exercise 1.5.1:
(1) Open a spreadsheet, enter the number 0 in cells A1 and B1, and the numbers 10 in A2 and B2.
(2) Highlight the area A1:B2, and use Insert => Chart to make a corresponding graph.
(3) Make sure that the graph displays A2:B2 vs. A1:B1 (for a square input array, Excel automatically plots rows vs. columns), and show the individual point A1:A2 prominently with an individual marker, with large size and striking color. Click on the axis (which may show the label Value (X) Axis or Value (Y) Axis), right-click (Mac: Ctrl∪click) to get Format Axis, and adjust both scales to have Minimum: 0 and Maximum: 10. Make sure that these values are not 'auto'matic (by turning off the top three check marks under Auto) because, in that case, the scale might change annoyingly as the movie plays. Moreover, for the duration of the show turn off the spreadsheet gridlines (with Tools => Options, View tab, Gridlines). And make sure that the axis labels are to the left of and on top of the data: place in the bottom right-hand corner of the graph whatever features you want to display there, leaving an empty space at the top left corner. Your graph should now look like Fig. 1.5.1.
(4) Call the custom macro MovieDemo1, which is part of the MacroBundle, and can be downloaded from the SampleMacros file on the web site, and enjoy the show. The instructions for this macro are shown below.

Fig. 1.5.1: The starting graph for MovieDemo1.

Sub MovieDemo1()
    Range("A1") = 0
    Range("A2") = 0
    For i = 1 To 400
        Range("A1") = 10 - 0.05 * Abs(i - 200)
        Range("A2") = 10 * Exp(-0.001 * (i - 300) ^ 2)
        Application.ScreenUpdating = True
    Next i
    Range("A1") = 0
    Range("A2") = 0
End Sub

(5) You should see the point trace a straight horizontal line on the way going, and a Gaussian peak on the way back.
(6) Try the other MovieDemo macros of the MacroBundle; Figures 1.5.2 through 1.5.5 illustrate the starting screens for these demos. Consult the heading comments of the MovieDemo macros for specific instructions on how to configure them, and where to place them. Figure out how they work (you may have to read chapter 8 first), then make your own. Use different markers and colors to enliven the display. Have fun.

Fig. 1.5.2: The starting graph for MovieDemo2.

Fig. 1.5.3: The starting graph for MovieDemo3.

Fig. 1.5.4: The starting graph for MovieDemo4.

Fig. 1.5.5: The starting graph for MovieDemo5.
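To 'make your own', all that is needed is a loop that writes successive coordinates into the plotted cells. For instance, the following sketch (an example only, not one of the MovieDemo macros; it assumes the same A1:B2 layout and 0-to-10 axes as exercise 1.5.1) sends the point around a circle:

```vb
' A home-made movie in the style of MovieDemo1: the plotted point, whose
' coordinates sit in A1 (x) and A2 (y), traces a circle centered on
' (5, 5) with radius 4, assuming the graph layout of exercise 1.5.1.
Sub MovieCircle()
    Dim i As Integer
    For i = 1 To 360
        Range("A1") = 5 + 4 * Cos(i * 3.14159 / 180)
        Range("A2") = 5 + 4 * Sin(i * 3.14159 / 180)
        Application.ScreenUpdating = True   ' redraw after every step
    Next i
    Range("A1") = 0
    Range("A2") = 0
End Sub
```

The essential line is Application.ScreenUpdating = True inside the loop, which forces Excel to redraw the graph after every recalculation; without it, only the final position of the point would be shown.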
1.6 Printing, copying, linking & embedding

In order to print a spreadsheet, highlight the area you want to show, and print it. By placing the most important information at the top left-hand corner of the spreadsheet, it is often sufficient to show only that part of a large spreadsheet, perhaps even including one or more thumbnail graphs. Some examples also display the instructions used, with the locations where they are used first, and the cells to which they are copied. It may be helpful to add simulated spreadsheet axes (i.e., the letters A, B, C, ... across, the numbers 1, 2, 3, ... down).

Exercise 1.6.1:
(1) To simulate a horizontal (letter) row atop a spreadsheet, color a cell in the top row (e.g., gray-25% of the Fill Color icon on the Formatting toolbar), accent it (with all-around thin border), and set its Column Width to 3 points (after highlighting the entire column by clicking on its true column heading, followed by right-clicking or, on the Mac, Ctrl∪clicking). Fill the cells with centered capitals such as A, B, C, by entering one and copying it to the other cells in that row. For the vertical (number) column, use the same approach (with numbers instead of letters).

Copying a spreadsheet to another sheet in the same spreadsheet book is straightforward, except that embedded figures will still refer to the data series on their original sheet: charts use absolute addressing, and therefore keep referring to their original input data even when moved around or copied. To make the graphs reflect the values on the copied sheet, highlight each series in turn, and adjust its sheet name in the formula bar. This can be done by highlighting each individual curve, and either changing its address in the formula bar, or clicking on the curve and dragging the identifying data frames to their new spreadsheet positions. To make compatible pictures it is often useful to copy one, and then modify its contents.

There are two proper methods to import (part of) a spreadsheet into a Word document: linking and embedding. Embedding takes the part you select, and stores it permanently in the Word file. Linking instead establishes a connection ('link') between the spreadsheet and the Word document, so that any subsequent changes in the spreadsheet will be reflected in its image in Word. Not only does linking update the image in Word automatically when you subsequently make a change in the Excel file, but it is also much more efficient in terms of storage requirements. Unfortunately, in the experience of this author, linking has not always been reliable in Excel, and it may save you a lot of headaches to use embedding, provided that you can handle the resulting, huge data files. Because embedding can require large amounts of space, it is not very suitable for use in, e.g., email attachments, or for storage on what, not so long ago, were called high-density (1.2 MB) diskettes. In such cases it may be better to send (or store) text and spreadsheets containing graphs as separate items, to use Zip disks, compact disks, or other high-capacity media, and/or to use data compression.

Excel files that are either linked or embedded retain their vector nature, i.e., they are stored as equations, thereby preserving their original smoothness even if subsequently resized. This is the method of choice if publication-quality graphics are required. All illustrations in this book were directly taken from Excel and embedded into Word, and can be resized afterwards without loss of resolution. (Word 2000 allows many manipulations on embedded pictures, such as sizing, scaling, rotating, cropping, picture placement and text wrapping, and control over contrast and brightness. Just double-click on the image, and select Format Object.) This is how you do it.

Exercise 1.6.2:
(1) Highlight a block of cells in a spreadsheet, or an embedded graph, and store it in the clipboard with Ctrl∪c.
(2) Switch to the Word document, go to the place where you want to insert the graph, and make sure the corresponding line is formatted (Format => Paragraph) with the line spacing Single or At least, the only two formats that have self-adjusting heights and can therefore accommodate the picture.
(3) Then use Edit => Paste Special. In the resulting Paste Special dialog box click on Microsoft Excel Worksheet Object, deselect Float over text, and click OK. This is how all figures in this book were imported into the text.

Another option (if you have Adobe Acrobat, and only need to communicate good-quality page images) is to compress the Word file with its embedded Excel graphs as a pdf (portable document format) file, if both sender and receiver can handle these.

There are other ways to paste Excel images into Word that require much less memory, such as by using the PrintScreen button to capture an image, which can then be pasted into Word (possibly after manipulation in Paint), or by using Save as Web Page to generate a GIF image to be inserted into Word. These methods are not recommended, because they store the image as a bitmap, i.e., as a collection of pixels, which will show their discrete nature upon resizing. The resulting 'pixellation' has undeservedly given Excel graphics a bad name.
1.7 Setting up the spreadsheet

In this book we will use a number of auxiliary programs that come with Excel but that may not have been installed on your computer. Below the various add-ins are listed, with ways to activate them. Any resulting omissions should be remedied before you start to use this book. Moreover, you will need to use the MacroBundle, the package of macros specially developed for and discussed in this book, which can be downloaded freely from the web site of this book at www.oup-usa.org/advancedexcel.

1.7.1 Data Analysis ToolPak

The Data Analysis ToolPak contains many useful tools, including a random number generator, a data sampler, and many statistical tools such as anova, F-tests, and t-tests. It is automatically included in the Full installation, but left out with skimpier installation protocols, so first check whether you find it in the menu under Tools. (Excel 2000 has shortened menus that show only the most often used parts. The rest is still available, but you must either let the mouse pointer hover over that menu for a few seconds, or click on the chevron at the bottom of the menu to get the full display. Why this is done defies logic.) If it is not listed under Tools, check in the same menu list under Add-Ins: if the software was installed using the 'typical' option, it (and the associated Data Analysis ToolPak - VBA) may just need to be activated there; it is usually located in the MicrosoftOffice\Office\Library\Analysis folder. Otherwise, get the Excel or Office CD and run the Setup program to install it; then, in the Add-Ins dialog box, click on Browse, and locate the drive, folder, and file for the Analysis ToolPak add-in.

1.7.2 Solver

Solver is an add-in, available also in Lotus 1-2-3 and QuattroPro. It may already be part of your spreadsheet, so check whether you find it listed under Tools on the Excel main menu bar, usually at or near the very bottom. When you don't find it there, install it, because Solver is one of the most useful features of Excel.

The above lets you use Solver, but it doesn't allow you to call Solver from a macro, as is done, e.g., by the custom macro SolverScan discussed in chapter 4. To make such automatic calls of Solver possible requires the following additional action. First click on the Start button and, in the resulting menu, on Search, For Files or Folders... (or Find => Files or Folders...). In the Find: All Files dialog box, under the tab Name & Location, type Solver.xla in the Name: window, select Files of type: Microsoft Excel Files (*.xls, *.xla), enter Systemdisk[C:] in the Look in: window, activate Include subfolders, and click on Find now. Note down where Solver.xla is located. (If you have Microsoft Office, it most likely is in Systemdisk[C:], Program Files, Microsoft Office, Office, Library, Solver.) Exit the Find dialog box, select the VBA editor with Alt∪F11 (on the Mac: Opt∪F11), where F11 is the function key labeled F11, then go to Tools => References, which will display the References - VBAProject dialog box. Click on Browse and, now that you know where to find it, navigate your way to Solver.xla, and Open it. This will return you to the References - VBAProject dialog box, where you now use the Priority up button to bring Solver.xla up, so that it is listed contiguously with the other, already activated add-ins. Click OK, which will bring you back to the spreadsheet. From then on (i.e., until you reload or upgrade Excel) VBA will know how to find Solver when called from a macro.

1.7.3 VBA Help file

Also check whether the VBA Help file has been activated and, if not, do so. This file provides help files specific to VBA, material you would otherwise have a hard time finding. For example, MovieDemo4 in section 1.5 uses the VBA instruction Volatile. How would you find out what it does? Since it operates only in VBA, the Excel Help file does not list it, but the VBA Help file does. Likewise, the VBA Help file includes extensive information on how to use Solver as a VBA-driven function; see sections 4.17 and 11.6.

1.7.4 Additional macros

Users of this book should download the macros from my macro package MacroBundle. To do this, go to the web site www.oup-usa.org/advancedexcel and download the MacroBundle text file, which you can read with Word. Near its top it describes how to place the MacroBundle (explanatory comments and all) in an Excel module. By cutting and pasting it into the VBA module, the reader can avoid typing the macros, a convenience for those prone to typos. The MacroBundle contains all the major macros listed in chapters 9 through 11, and used in this book. You may want to store a copy of this text file on your hard disk as a backup.

1.7.5 Additional files

Two other files are provided on the web site with the MacroBundle. The SampleMacros file contains the exercise macros used in the text. Likewise, the SampleData file contains external data used in some of the exercises.

Brief descriptions of a few examples of two different types of additional tools are given below: add-ins specially made for Excel, and interfaces that make other products available through Excel, such as specialized software packages to use when Excel alone is not enough.

An example of the first category is the set of 76 statistical routines provided by Numerical Algorithms Group (NAG) as its Statistical Add-Ins for Excel. Some of these routines overlap with functionality built into Excel (though perhaps at higher accuracy), which might be a way around some of the documented deficiencies in Excel (see section 1.15), while others significantly extend the power of Excel, such as the principal component and factor analysis routines; to my knowledge, however, they have not yet been tested against the NIST standards. A free 30-day trial of the NAG add-in is available from NAG at extweb.nag.com, or from secondary freeware/shareware distributors such as download.com.

Examples of the second category are Maple and Mathematica, mathematics packages that can do both symbolic and numerical mathematics, including their extensive collection of standard functions, with still more in their specialized packages that must be purchased separately. Both certainly 'know' more math than most graduating math majors, and both have superb 3-D graphics. Add-on packages make Maple or Mathematica accessible through Excel: Maple includes its Excel add-on package in its basic software, whereas Wolfram Research requires its users to buy a separate Mathematica Link for Excel.
especially useful because the NAG site has had downloading problems with some browsers.comllocal!excel. Examples of the second category are the interfaces between Excel and Maple and Mathematica.6 Commercial tools It is useful to know when to use a spreadsheet. de Levie. These addins. By highlighting and copying a particular macro. function as if they were part of Excel. and therefore make it possible to do those exercises without having to retype the data. com. The user must buy (or have institutional access to) both Excel and Maple .
and the link between them have been installed. 1: Survey ofExcel 23 or Mathematica plus. absolute. C16. as in =Math ("6! +8!"). to the novice. An existing toolbar can be positioned anywhere on the spreadsheet simply by dragging the two vertical bars at its left edge (when it is docked in its standard place) or by dragging its colored top (when not docked). that many users having access to both may opt to keep and use them separately. Of course. Still. you may want to make changes in the default settings to make the spreadsheet conform to your specific needs and taste. using relative. the software systems are so different. and can then be copied to other places in the spreadsheet just like any other Excel function. It is useful to have default settings. Excel has many other toolbars. once you have become familiar with Excel. and the same result can therefore be obtained more easily as =FACT (Cl6) +FACT (018). as in Windows. cells CI6 and DI8 respectively. which may make their links convenient for Excel users who also have access to one or both of these symbolic math packages. In order to access such functions once Excel.7 Choosing the default settings In Excel. the Mathematica command on the spreadsheet must be wrapped in quotes and then identified as a Mathematica instruction.Ch. Here are some of the common defaults.. By default. almost anything can be changed. and to manipulate Mathematica using Excel's VBA. 7. it is also helpful to have fewer (potentially confusing) choices. Moreover. which can be selected with View ~ Ioolbars. the rather expensive link between them. However. 018). Excel displays the standard and formatting toolbars. This expression can be applied to the contents of. Mathematica. It is also possible to embed Mathematica functions in Excel macros. e.g. which yields the sum of the factorials of 6 and 8 as 41040. and how to change them. and their links are sufficiently nonintuitive. and may be able to provide superior accuracy. 
with the command =Math ("#1! +#2!". in the case of Mathematica. Maple and Mathematica have many more functions than Excel. . We have picked the above example because Excel also provides a factorial function. or mixed addressing. 1. so that one need not specify everything every time Excel is started. You can even make your own toolbar with View ~ Ioolbars ~ ~us tomize.
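The arithmetic in the factorial example is easy to confirm outside the spreadsheet. A quick Python check (mine, not part of the book) of the value that =Math("6!+8!") or =FACT(C16)+FACT(D18) should return:

```python
from math import factorial

# 6! + 8! = 720 + 40320, the sum the spreadsheet functions should display
total = factorial(6) + factorial(8)
print(total)   # -> 41040
```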
e. under lools ::::} Options::::} View) you can toggle the appearance of spreadsheet Gridlines on or off. You can click on a command and then use the button to get its Descri12tion. Advanced Excel for scientific data analysis Many aspects of the spreadsheet proper can be changed with FQnnat ::::} ~tyle ::::} Modity. the default setting for the graph type is accessible after you activate a chart to make the ~hart menu available. you can define the fonnat of the default chart. in Excel 97 and beyond.24 R. the font used.g. Under the View tab (i. You can also automate periodic saves with lools ::::} AddIns by activating the Autosave AddIn. change the Stgndard font (e. Say that you dislike the gray background and the horizontal gridlines that Excel puts in its XY plots. Here you can also Allow cell drag and drop or disallow it. Eile ::::} Page Set.e. you use the default printing settings. or fonnula entry. on the Mac. de Levie. text. and under ~hart type pick your choice..e. use Eiles ::::} Save As ::::} 012tions or Eiles ::::} Save As ::::} Tools => General Options and select Always create hackup. from Arial to a more easily readable serif font such as Times New Roman) or perhaps use a different font Siz~. Make a graph the way you like it. rightclick or. and select Clear).. with a white background (activate the plot area. Under the Edit tab (lools ::::} Options::::} Edit) you can (de)select to Edit directly in the cell. you might prefer the cursor to stay put rather than move down one cell after each data. If you wish to change this. and patterns.. including the way numbers are represented. Ctrluclick. as well as margins.g. Many Excel settings can be personalized in lools ::::} Qptions ::::} GeneraL Here one can specity. Likewise. In Excel 97 and later versions. the number of entries in the Recently used file list. which allows you to edit in the cell (after doubleclicking) rather than in the fonnula bar. and specity whether and how to Move selection after enter. 
colors. e. browse in lools => ~ustomize to see (and select) your Toolhars and their ~ommands. select FQnnat Plot Area. While the chart is still activated (so that the . i. cell borders. and S~t as default chart. Excel does not make backup files by default.!!p provides many alternatives. XY(Scatter). Here you can also set the Default file location (from C:\My Documents) and even define another Alternate startup file location. including paper size and orientation. When you print with the Print button on the Standard Toolbar. Even better.g.. then select Area Non~ or the white square) and without the gridlines (activate a gridline. Select ~hart => Chart lype..
and then continues with those parts of the assignment it can do.9 Error messages Excel has a rather limited set of error messages. Then. and by saving them as such in a directory or on your Desktop. 1: Survey ofExcel 25 ~hart button is accessible in the menu toolbar) click on ~hart ~ Chart Type. or lui. and undo the checkmark for Menus show recently used commands first.Ch. make sure that the Open dialog box deals With files of we: All files (*. .g. but just labels those tasks as impossible.. and after you tell the Wizard what you want it will execute your wishes. Because the data are now presented to Excel as a text file. go to View ~ Ioolbars ~ ~ustomize. Or. Excel is very forgiving: it does not stop operating when asked to divide by zero or to take the square root or the logarithm of a negative number. The menu bars and toolbars in Excel 2000 initially display only those items that you have used most recently. luf (for Insert CQart Einish). On the other hand. the Text Import Wizard will appear. 1. give an (optional) Description. In the next dialog box. click on the Userdefined option button. e. then on Add. or by waiting a few seconds. and exit with OK. select the Qptions tab. Altuh. and select the Notepad file. the next time you highlight a block and invoke Insert ~ CQart you will get the selected format just by pushing the Einish button on step 1 of the Chart Wizard.8 Importing data Data copied directly from the Web may fit directly into Excel.txt rather than the Word extension . or may all squeeze into a single column (in which case they may seem to fit the spreadsheet columns.*) so that . until you check what the formula bar shows for cells in the next column). 7 of the appendix. highlight the area involved. even though they are still available by clicking on the chevron at the bottom of the menu. Altuf. faster.doc). You can avoid the latter situation by first converting them into a text format.txt files show. select the Custom Types tab. 1. and type Altui. 
in which you can specify how to treat the file. by importing them into WordPad (which has the extension . Then open Excel. the others are out of view. as listed in table A. Usually the data are either separated by commas (commadelimited) or by empty spaces (tabdelimited). If you don't like this feature. luh. specify its Name. S~t as default chart. select Eile ~ Open.
1.10 Help

Excel provides a convenient index, easily reachable with the function key F1 or through Help ⇒ Contents and Index, followed by selection of the Index tab. (First you may have to disable the pesky Assistant, unless you happen to like its visual jokes.) That index is invaluable in scientific and engineering applications because it clearly explains the Excel worksheet functions, and often indicates how they are calculated.

1.11 Functions, subroutines & macros

Excel is quite powerful, and comes with many functions, but it cannot be all things to all people. Fortunately, it has extra flexibility built in, allowing the user to personalize it by adding custom-made functions and macros to do things Excel might otherwise find difficult or impossible to do. To this end, Excel offers two custom-supplied procedures based on VBA: functions and subroutines. The language used for these is VBA, an adaptation of Visual Basic, a modern and quite powerful computer language that is used in all parts of the Microsoft Office suite. Moreover, VBA will accept (with perhaps some minor modifications) earlier code written in modern versions of Basic, including TurboBasic and QuickBasic. Consequently there is no need to reinvent the wheel when writing your own functions and macros, because you can often find such material in the literature. You must still provide the special adaptations to match the algorithm to the spreadsheet, but chapters 8 through 11 will help you with that.

Of these two procedures, functions are somewhat restricted, in that they can only affect the value displayed in a single spreadsheet cell. The single-cell restriction is often not serious, because functions can be copied to other cells. In all other aspects, functions are quite flexible: they can use many inputs, and they can even call subroutines (as long as the end result only affects that single-cell output value). An advantage of functions over subroutines is that they respond automatically to changing input information, so that they need not be activated or called, but update themselves whenever a parameter affecting them is entered or modified. (By inserting the instruction Application.Volatile True you can even make them respond to changes elsewhere on the spreadsheet.) Custom functions can be copied from one cell to another just as built-in Excel functions can, and will be used occasionally throughout this book.

Subroutines are not restricted to changing the output in a single spreadsheet cell, thus making them more powerful than functions, but they cannot be embedded in spreadsheet cells, nor do they update automatically; instead, macros must be called every time one wants to use them. A subroutine without any input parameters is called a macro. Macros can be called directly from the spreadsheet; other subroutines often work rather invisibly, upon being called by either a function, a macro, or another subroutine. We will encounter many examples of custom macros in this book, and we will look at them in detail in chapter 8.

In general, both functions and subroutines have input parameters that are specified in the function or subroutine call; they can be either numerical values, or addresses of individual cells or of cell ranges where information can be found. You can even specify optional input arguments. (The argument list cannot contain output parameters, which are simply left in the subroutine or macro.)

1.11.1 Custom functions

Functions can be placed in particular cells, in which case they can control the value displayed in those cells. The minimal function declaration in a cell contains its name followed by parentheses, as in =myOutput(). Inside the parentheses you may find one or more arguments, or none, as in the above example. The arguments (the material within the brackets following the function name) comprise the input data; they are defined in the function by the order in which they appear in the function argument, not by their names. If you type this instruction in a particular cell without having defined its role, that cell will show the error message #NAME?.

You specify a function in a module. In Excel 5 and Excel 95, the module is treated as a worksheet without a grid, and you open it by clicking on the tab at the bottom of the spreadsheet, and then clicking on Module; you can switch between module and spreadsheet by clicking on their tabs, just as you would move between different sheets in a workbook. In Excel 97 and more recent versions, the module is hidden from view in the Visual Basic Editor. To open the more recent type of module, use Tools ⇒ Macro ⇒ Visual Basic Editor, then in the Visual Basic toolbar click on Insert ⇒ Module. Once you have established a module, you can toggle between it and the spreadsheet with Alt+F11 (Mac: Opt+F11).
The custom function will appear in the Paste Function dialog box (under Function category: User Defined, by Function name:) that you can access by clicking on the function icon fx on the standard toolbar. If you do not want such a listing, specify the function name in the module as Private, as in Private Function myOutput().

The output of a function is restricted to the output of a single cell and to a single value. There is one exception to the latter rule: it is possible to have a single cell display (within curly brackets) an array as a string of numbers, separated by commas and/or semicolons. That is only of practical use for relatively small vectors or matrices.

A simple function might be defined as

    Function myOutput()
        myOutput = 8.3
    End Function

which accomplishes the trivial task of setting the cell value to 8.3, but shows the general structure: a first line defining the function by its name, a last line specifying the end of the function, and in between one or more statements, including at least one that specifies the function value, i.e., that ties the function name to an output. The output will most often be a value, but can also be a message (identified as such within quotation marks), as when the above statement is replaced by myOutput = "abc". You will encounter useful custom functions throughout this book.

A module can contain one or more functions, subroutines, macros, or combinations thereof, and it is usually easiest to keep them all together in one module.

1.11.2 Custom subroutines & macros

Subroutines are specified in the module by prefacing their name by Sub (instead of Function). You cannot enter a subroutine in a particular cell, but you can store it in a VB Editor module just like a macro, where it lives out of sight. You can only call it with Tools ⇒ Macro ⇒ Macros or with Alt+F8 (Mac: Opt+F8), and by then using the resulting Macro dialog box, provided the subroutine has no input and output arguments; if it has arguments, it can only be called by other procedures, i.e., by functions, macros, or other subroutines. Macros are a most useful, special type of subroutines. As illustrated throughout this book, they can be used for quite sophisticated mathematical operations; a fairly detailed look at them will be given in chapter 8, and they are used and described in chapters 8 through 11.

Keep in mind that custom macros are not self-updating, and therefore do not necessarily reflect the input information displayed on the spreadsheet. Can we make a macro self-updating? The literal answer to this question is "No", but an often applicable answer might be "Yes, if you are willing to convert the macro into a function." That means that you can trade the flexibility of the input and output formats of a macro for the convenience of the automatic self-updating of a function. Each function can only affect one single output value, but since functions can be copied to other cells, that need not be a constraint. Such a trade-off might be useful if you deal repeatedly with input data of identical format, say with spectra generated by a given instrument under fixed instrumental settings. In that case you may want to make a spreadsheet into which the data are always entered in the same manner, and convert the necessary macros into corresponding functions. Of course you will have to supply the code for connecting the macro with the spreadsheet, an aspect we will explore in chapter 8. The Lagrange interpolation of section 1.12 illustrates that a function can indeed perform a rather complex operation, in this case a polynomial interpolation. Likewise, section 6.7 will illustrate how a function can be combined with the Excel-supplied macro Solver, in a case where a macro could not be used because Solver has no facility to call it.

The important thing to remember here is that many problems have already been solved, and that their solution can be found in the literature. If the published program happens to be available in Basic you are in luck, because Basic (especially in one of its more modern forms) can be incorporated readily in VBA. Even if the program is in Fortran it is often readily transcribed. Often the only modification such code requires is to make sure that it operates in double precision. When the program is listed in the Numerical Recipes by W. H. Press et al., Cambridge University Press, 1986 (which can be read or copied free of charge from libwww.lanl.gov/numerical/index.html or www.library.cornell.edu/nr/nr_index.cgi), you can find a corresponding Basic version in J. C. Sprott, Numerical Recipes: Routines and Examples in Basic, Cambridge University Press, 1991.

1.12 An example: interpolation

Much of this book focuses on working with experimental data, i.e., with numbers that are not exact but that are to some extent uncertain, as the result of experimental 'noise' corrupting the underlying 'signal'. Such data require special methods to try to extract that signal, a problem addressed in the central chapters of this book. However, science is not about experimental numbers per se, but about their comparison with model theories. Consequently we must also be able to handle theoretical expressions, in which noise is absent, and for which the methods therefore can be quite different. In this section we will illustrate some approaches for dealing with noise-free data, and thereby illustrate what this book is all about: making the spreadsheet do the things you want it to do. This will also give us a first opportunity to practice with custom functions.

Interpolation is useful when we need to rescale a table of data, and even more so when we have a model theory that provides data y = f(x) but instead we need those data in the form x = g(y). The most common interpolation schemes use either polynomials or trigonometric functions; below we will illustrate polynomial interpolation. The prototype for polynomial interpolation of noise-free data is the Lagrange method. In order to find y for a given value of x by linear interpolation between the two adjacent points x1,y1 and x2,y2 (so that x1 ≤ x ≤ x2) we have

    y = (x - x2) y1 / (x1 - x2) + (x - x1) y2 / (x2 - x1)                      (1.12.1)

while, e.g., cubic interpolation between four adjacent points x1,y1 through x4,y4 (preferably with two points on either side of x, i.e., with x1 ≤ x2 ≤ x ≤ x3 ≤ x4) is given by

    y = (x - x2)(x - x3)(x - x4) y1 / [(x1 - x2)(x1 - x3)(x1 - x4)]
      + (x - x1)(x - x3)(x - x4) y2 / [(x2 - x1)(x2 - x3)(x2 - x4)]
      + (x - x1)(x - x2)(x - x4) y3 / [(x3 - x1)(x3 - x2)(x3 - x4)]
      + (x - x1)(x - x2)(x - x3) y4 / [(x4 - x1)(x4 - x2)(x4 - x3)]            (1.12.2)

and so on. Such Lagrange interpolation is readily handled with a custom function; the one illustrated here is based on an elegant example from E. J. Billo, Excel for Chemists, 2nd ed., Wiley 2001, who in turn credits W. J. Orvis, Excel for Scientists and Engineers, Sybex 1993. For additional information and alternative algorithms see, e.g., chapter 25 by P. J. Davis & I. Polonsky in the Handbook of Mathematical Functions, M. Abramowitz & I. A. Stegun, eds., Dover 1965, or chapter 3 of the Numerical Recipes by W. H. Press et al., Cambridge University Press 1986.

We should emphasize here that we can fit n data points exactly to a polynomial of power n - 1, but that there is no assurance that this will be a good approximation for the underlying curve through those n points, which may or may not resemble a power series in x. This is readily verified by observing that the result obtained with Lagrange interpolation in general depends on the polynomial order used. In this latter, more significant sense the interpolation is not exact, even if it passes exactly through all n points.
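The Lagrange formulas are easy to sanity-check numerically outside the spreadsheet. The short Python sketch below (Python and the function name are my own illustration, not the book's VBA) evaluates the interpolating polynomial through all supplied points; with two points it reduces to eq. (1.12.1), with four to eq. (1.12.2):

```python
def lagrange(xs, ys, x):
    # Lagrange polynomial through all the points (xs, ys), evaluated at x
    y = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        y += term
    return y

# linear interpolation halfway between (1, 2) and (3, 6) on the line y = 2x
print(lagrange([1.0, 3.0], [2.0, 6.0], 2.0))                       # -> 4.0

# a cubic through four points of y = x**3 reproduces the curve exactly
print(lagrange([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 8.0, 27.0], 1.5))  # -> 3.375
```

The second call illustrates the point made above: four data points determine a cubic exactly, so points taken from y = x³ are reproduced without error, while data from any other underlying curve would not be.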
As our example we will apply cubic Lagrange interpolation to an acid-base titration curve, i.e., to the relation between the pH of a solution containing a fixed initial amount of, say, an acid, and the volume Vb of added base. We will do so here for the simplest type of titration, viz. that of a single strong monoprotic acid (such as HCl) with a single strong monoprotic base (such as NaOH). The general theory of such curves is quite straightforward when we calculate the volume Vb as a function of the proton concentration [H+], but not the other way around. In practice one does the opposite: one measures pH as a function of the added volume Vb of base. We therefore first calculate a table of Vb as a function of pH = -log[H+], then interpolate this table in order to generate a second table of pH as a function of Vb.

Exercise 1.12.1:
(1) Open a new spreadsheet, with 12 rows at the top for graphs.
(2) In B13:B16 deposit numerical constants for Kw (such as 10^-14), Va, Cb, and Ca, and in cells A13:A16 place the labels for the constants.
(3) In A18:C18 enter the column headings pH, [H], and Vb. In the column for pH, starting in cell A20, enter the values 1 (0.1) 12.9, i.e., the values from 1 to 12.9 in increments of 0.1. In B20:B139 calculate [H+] as 10^-pH, and in C20:C139 the corresponding titrant volume as Vb = Va (Ca - [H+] + Kw/[H+]) / (Cb + [H+] - Kw/[H+]).
(4) Plot the resulting titration curve as pH vs. Vb. Note that it shows many points in the transition region, since the points are computed for equal increments in pH which, in that region, is a very steep function of Vb. This completes the first stage, calculating a table of data in A20:C139.
(5) Now make a second table in which the pH is listed at given (not necessarily equidistant) values of Vb. In cells E18 and F18 place two more column headings, Vb and pH respectively. Below the Vb heading, in cells E20:E49, enter the values 1 (1) 30. In cell F20 of the corresponding pH column enter the instruction =Lagrange($C$20:$C$139,$A$20:$A$139,E20,3), and copy it down to F49. The ranges $C$20:$C$139 and $A$20:$A$139 specify the X- and Y-ranges in which to interpolate, E20 is the X-value for which the interpolation is requested, and 3 denotes the order of the Lagrange polynomial used, here a cubic.
(6) Even though you have entered the function call, nothing will happen, because you have not yet specified that function. Do this as follows.
(7) Open a Visual Basic module as described earlier: in Excel 97 or later, use Tools ⇒ Macro ⇒ Visual Basic Editor, and then (in the new toolbar) Insert ⇒ Module.
(8) In that module enter the following code (either by typing or by copying from the SampleMacros file):

    Function Lagrange(XArray, YArray, X, m)
    ' m denotes the order of the polynomial used,
    ' and must be an integer between 1 and 14
    Dim Row As Integer, i As Integer, j As Integer
    Dim Term As Double, Y As Double
    Row = Application.Match(X, XArray, 1)
    If Row < (m + 1) / 2 Then Row = (m + 1) / 2
    If Row > XArray.Count - (m + 1) / 2 Then _
      Row = XArray.Count - (m + 1) / 2
    For i = Row - (m - 1) / 2 To Row + (m + 1) / 2
      Term = 1
      For j = Row - (m - 1) / 2 To Row + (m + 1) / 2
        If i <> j Then Term = Term * (X - XArray(j)) / (XArray(i) - XArray(j))
      Next j
      Y = Y + Term * YArray(i)
    Next i
    Lagrange = Y
    End Function

(9) The =Match(value, array, type) function in Excel returns the relative position of the largest term in the specified array that is less than or equal to value. The array must be in ascending order, and the above definition is for type = 1. By preceding it with the instruction Application, we appropriate it as a VBA command.
(10) The function will work without the dimensioning statements, which are included here as part of good housekeeping practice. The same applies to the two comment lines, which are ignored by the computer but may serve the user.
(11) The two nested For ... Next loops generate the terms in the numerators and denominators of the Lagrange expression, such as those in (1.12.2).
(12) Note that Y need not be initialized: each time, the function starts afresh.
(13) For more information on functions and macros see chapter 8.
(14) Return to the spreadsheet with Alt+F11 (Mac: Opt+F11). There should now be data in F20:F49.
(15) Plot the pH (in F20:F49) as a function of Vb (in E20:E49). Note that there are now very few points in the transition region, for the same reason that the plot of Vb vs. pH has so many: near the equivalence point, the slope d(pH)/d(Vb) is quite high, and that of d(Vb)/d(pH) is correspondingly low.
(16) The top of the completed spreadsheet is shown in Fig. 1.12.1.

We already mentioned that a cubic Lagrange interpolation will be a fairly good approximation for this curve as long as the spacing between adjacent, computed data points is sufficiently small. In this particular case a closed-form solution for the pH as a function of titrant volume Vb is available, so that we can check the interpolation procedure. It is always useful to calibrate new software with a test for which the (exact) answer is known, because it provides an early indication of its reliability and, as a bonus, may alert you to possible problems.
Inversion of the expression used under (3) in exercise 1.12.1 yields

    pH = -log[H+]
       = -log { (CaVa - CbVb) / (2(Va + Vb)) + sqrt[ (CaVa - CbVb)² / (4(Va + Vb)²) + Kw ] }      (1.12.3)

which will now be used to check the results obtained from the Lagrange interpolation.

[Fig. 1.12.1 (spreadsheet screenshot): the labels Kw=, Va=, Cb=, Ca= with their values in A13:B16, the columns pH, [H], Vb starting in row 20, and two plots, a and b, of pH vs. Vb from 0 to 30.]

    cell:  instruction:                                               copied to:
    B20 =  10^-A20                                                    B21:B139
    C20 =  $B$14*($B$16-B20+$B$13/B20)/($B$15+B20-$B$13/B20)          C21:C139
    F20 =  Lagrange($C$20:$C$139,$A$20:$A$139,E20,3)                  F21:F49

Fig. 1.12.1: The top of the spreadsheet for interpolating in a table of Vb vs. pH (in columns C and A respectively) in order to generate a second table of pH as a function of Vb (in columns E and F). Note that the points in plot a are equidistant in pH (with ΔpH = 0.1), while those in plot b have constant increments ΔVb = 1.
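The closed-form result (1.12.3) can be checked against the forward relation used under (3) of the exercise: start from a pH, compute Vb, then recover the pH. A minimal Python sketch of that round trip (my own check with illustrative constants, not part of the book's spreadsheet):

```python
from math import log10, sqrt

Kw, Va, Ca, Cb = 1.0e-14, 10.0, 0.1, 0.1   # illustrative constants

def vb_from_ph(ph):
    # forward relation: Vb = Va (Ca - [H+] + Kw/[H+]) / (Cb + [H+] - Kw/[H+])
    h = 10.0 ** (-ph)
    return Va * (Ca - h + Kw / h) / (Cb + h - Kw / h)

def ph_from_vb(vb):
    # closed-form inverse, eq. (1.12.3)
    q = (Ca * Va - Cb * vb) / (2.0 * (Va + vb))
    return -log10(q + sqrt(q * q + Kw))

# the round trip pH -> Vb -> pH returns the starting pH
for ph in (3.0, 7.0, 11.0):
    print(ph, round(ph_from_vb(vb_from_ph(ph)), 6))
```

Note that at pH 7 (the equivalence point for these constants) the forward relation gives Vb = CaVa/Cb = 10, as it should.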
Exercise 1.12.1 (continued):
(17) You can calculate the pH from (1.12.3) in one operation, or even the difference between it and the pH computed by Lagrange interpolation. Here we will use three steps, which requires simpler expressions but uses more spreadsheet real estate.
(18) In cell G20 calculate the quantity (CaVa - CbVb)/(2(Va + Vb)) as =($B$16*$B$14-$B$15*E20)/(2*($B$14+E20)).
(19) In cell H20 compute the pH as =-LOG(G20+SQRT(G20^2+$B$13)).
(20) In cell I20 enter the instruction =H20-F20.
(21) Copy the instructions in cells G20:I20 down to row 49. Notice that, in this case, the deviations are all less than ±4×10⁻⁴, i.e., smaller than the resolution of a pH meter (typically ±0.01, occasionally ±0.001 pH unit), and therefore inconsequential. Verify that using a ten times smaller pH increment in column A (with a concomitant change in the ranges XArray and YArray in the function call) can reduce the errors by another three orders of magnitude.

Optimal results are usually obtained with a low-order polynomial interpolation of densely spaced data. Increasing the polynomial order has a smaller effect, and can even be counter-effective, especially when the intervals are relatively large, in which case fitted curves of high order may swing wildly between adjacent points.

Now that we have converted a theoretical curve into one more like those encountered in the laboratory, we will make a short excursion to illustrate what we can do with it. First we will calculate the concentrations of the species of interest, [H+] and [OH-]. Then we correct these for the mutual dilution of sample and titrant, and so obtain [H+]', [OH-]', and Δ'. Finally we will make the simulation more realistic by adding offset and random noise to the pH. (In practical implementations, offset can usually be kept at bay by careful instrument calibration using standard buffers.)

Exercise 1.12.1 (continued):
(22) In cell J18 place the heading [H]. In cell J20 compute the proton concentration [H+] as =10^-F20, change the cell format to scientific with Format ⇒ Cells, Category: Scientific, and copy this instruction down to cell J49.
(23) In cell K18 deposit the label [OH], in cell K20 calculate the corresponding hydroxyl concentration [OH-] as =$B$13/J20, and likewise extend this calculation downward to row 49.
Exercise 1.12.1 (continued):
(24) In L18 put the label Δ (type D, then highlight it and change the font to Symbol) for the proton excess, and in L20:L49 compute its value as Δ = [H+] - [OH-].
(25) Plot [H+], [OH-], and Δ as a function of Vb, as in Fig. 1.12.2.
Fig. 1.12.2: The proton concentration [H+], the hydroxyl ion concentration [OH-] = Kw/[H+], and (using the right-hand scale) the proton excess Δ = [H+] - [OH-], all as a function of the titrant volume Vb. In the latter curve the equivalence point at Δ = 0 is highlighted.
(26) Adding titrant to sample clearly dilutes both. We can correct for this mutual dilution by multiplying [H+], [OH-], and Δ by (Va+Vb)/Va. Use three additional columns, M through O: one for [H+]' = [H+](Va+Vb)/Va, one for [OH-]' = [OH-](Va+Vb)/Va, and one for Δ' = [H+]' - [OH-]', then plot these, as in Fig. 1.12.3. Note that the quantities [H+]' and [OH-]' are directly proportional to the Gran plots (G. Gran, Analyst 77 (1952) 661) for this type of titration.
(27) The above are purely theoretical plots. They suggest that the equivalence point of the titration can be found simply by looking for that value of Vb where Δ' is zero, e.g., by linear interpolation in the table for Δ' as a function of Vb for Δ' = 0.
(28) We now make the transition to practical data analysis. In cell D13 deposit the label offset=, in cell D14 the label na=, and in cell N18 the heading "noise".
(29) In cells E13 and E14 enter corresponding values (0 for zero offset or noise, e.g., 0.05 for offset or noise of 0.05 pH unit), and in N20:N49 deposit Gaussian ('normal') noise of zero mean and unit standard deviation. You could do this using Tools => Data Analysis => Random Number Generation, Distribution: Normal, Mean = 0, Standard Deviation = 1, Output Range: N19:N49, OK.
(30) To the instruction in cell F20 now add the terms +$E$13+$E$14*N20, and copy this down to F49. Now experiment with nonzero values for either offset or noise.
(31) The effect of an offset a is to multiply [H+]' by 10^(-a), [OH-]' by 10^(+a), and Δ' by 10^(-a) before the equivalence point and by 10^(+a) beyond it, and therefore leads to the slope changes shown in Fig. 1.12.4.
(32) Even though the titration curve may barely show the effect of added random noise, analysis procedures relying on it may be affected strongly, as in Fig. 1.12.5, because the exponentiation involved in the conversion from pH to [H+] greatly accentuates the noise.
Fig. 1.12.3: The dilution-corrected proton concentration [H+]', hydroxyl ion concentration [OH-]', and proton excess Δ' = [H+]' - [OH-]', all as a function of the titrant volume Vb. Note that Δ' is a linear function of Vb throughout the entire titration. The equivalence point, where Δ' = 0, has been highlighted. For [H+]' and [OH-]', use the primary (left-hand) scale, while the secondary (right-hand) scale pertains to Δ'.
Fig. 1.12.4: The effect of a relatively small amount of pH offset (offset = 0.1) on the concentration parameters [H+]', [OH-]', and Δ'. For comparison, the lines are drawn for zero offset.
Fig. 1.12.5: The effect of a relatively small amount of random pH noise (with zero mean and standard deviation sn = 0.05) on the concentration parameters [H+]', [OH-]', and Δ'.
It is straightforward to use formal mathematics to identify the zero crossing of Δ' in Fig. 1.12.3, but to do so in Fig. 1.12.5 is a different problem, because these data fit a straight line only approximately (even though most of the noise is at the extremes of the curve). Dealing with such more realistic data on a spreadsheet lies at the core of this book. For now, the latter part of this example merely illustrates what we can do by simulating an experiment, e.g., by visualizing how sensitive a proposed analysis method will be to the effects of offset (systematic, deterministic bias) and (random, stochastic) noise. More robust data analysis methods for titration curves will be described in section 4.3.

1.13 Handling the math
One way in which Excel facilitates computations is through its extensive collection of functions, ranging from search tools to sophisticated mathematical and statistical tools. Some of the most useful of these are listed in tables A.2 through A.4 of the appendix. Excel can also handle complex numbers and matrices; the corresponding functions are listed in tables A.5 and A.6 respectively of the appendix.

1.13.1 Complex numbers
Excel operations on complex numbers use text strings to squeeze the two components of a complex number into one cell. Table A.5 in the appendix lists the complex number operations provided in Excel's Data Analysis ToolPak. In order to use the results of complex number operations, one must therefore first extract their real and imaginary components, using IMREAL() and IMAGINARY(), as illustrated below. Instead of i one can use j to denote the square root of -1 (which must then be specified as such), but one cannot use the corresponding capitals, I or J. VBA lacks a special data type for complex numbers, which are therefore best handled in terms of their real and imaginary components respectively, as in the Fourier transform macro described in section 10.1.
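By way of comparison, here is a hypothetical sketch (not from the book) of the same operations in Python, whose built-in complex type does natively what Excel emulates with text strings such as "3+4i". The Excel equivalents shown in the comments use the ToolPak functions named in the text.

```python
# Hypothetical illustration: Python's complex type vs. Excel's text-string
# complex numbers from the Data Analysis ToolPak.
z = complex(3, 4)        # Excel: =COMPLEX(3,4) yields the string "3+4i"
print(z.real)            # Excel: =IMREAL("3+4i")      -> 3
print(z.imag)            # Excel: =IMAGINARY("3+4i")   -> 4
print(abs(z))            # Excel: =IMABS("3+4i")       -> 5
w = z * complex(1, -1)   # Excel: =IMPRODUCT("3+4i","1-i")
print(w)                 # (7+1j)
```

Note that, as in VBA, nothing here requires a special string format: the real and imaginary components remain directly accessible as numbers.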
1.13.2 Matrices
Excel has several built-in functions for matrix operations: {}, INDEX, TRANSPOSE, MINVERSE, and MMULT.
Curly brackets {} around a set of numbers separated by commas and/or semicolons can be used to deposit an array. Here commas separate matrix elements in the same row, where the array elements in each row are enumerated one at a time, from left to right, while semicolons separate successive rows, which are read from top to bottom. Remember the order: first rows, then columns. As a memory aid, think of the electrical RC (resistor-capacitor) circuit or time constant: RC, first Row, then Column. Highlight a block of cells 2 cells wide and 3 cells high, then type the instruction ={2,3;4,5;6,7}, and deposit this instruction with Ctrl+Shift+Enter (on the Mac: Command+Return). This will deposit the numbers 2 and 3 in the top row, 4 and 5 in the middle row, and 6 and 7 in the bottom row of the block.
INDEX(array,row#,column#) yields the individual matrix element in a given array. Say that C4:E8 contains the data
0 4 8
1 5 9
2 6 10
3 7 11
then =INDEX(C4:E8,2,3) yields the answer 9, since it specifies the array element in row 2, column 3. You can also incorporate the array elements in the instruction, as in =INDEX({0,4,8;1,5,9;2,6,10;3,7,11},2,3). This instruction likewise yields the answer 9.
TRANSPOSE interchanges the row and column indices. Because transposing rows and columns is a common operation in, e.g., accounting, it is performed as part of the Edit => Paste Special operation. Select (i.e., highlight) the array to be transposed, and copy it to the clipboard (e.g., with Ctrl+c). Then select the top left corner (or the left column, or the top row, or the entire area) of where you want its transpose to appear, and use the keystroke sequence Edit => Paste Special => Transpose => OK. (Depending on your software, you may have to specify Values or Formulas before Transpose, or the generic All may just work fine.)
Matrix inversion and matrix multiplication work only on data arrays, i.e., on rectangular blocks of cells, but not on single cells. To enter these instructions, first activate (highlight) a cell block of appropriate size where you want to place the result, type the instruction, and finally enter it with Ctrl+Shift+Enter (Mac: Command+Return). In the formula box, the instruction will then show inside curly brackets, to indicate that it is a matrix operation.

Function: Description and example
MINVERSE(array): The matrix inverse of a square array. When B3:C4 contains the data
3 5
4 6
then MINVERSE(B3:C4) =
-3 2.5
2 -1.5
and likewise MINVERSE({3,5;4,6}) = {-3,2.5;2,-1.5}.
MMULT(array1,array2): The matrix product of two arrays (where the number of columns in the first array must be equal to the number of rows in the second array). When B3:C4 and E6:F7 contain the data
3 5 and -3 2.5
4 6 and 2 -1.5
respectively, then MMULT(B3:C4,E6:F7) =
1 0
0 1
and likewise MMULT({3,5;4,6},{-3,2.5;2,-1.5}) = {1,0;0,1}.

Note that MINVERSE can only be used with square arrays, while for MMULT the number of columns in the first array must be equal to the number of rows in the second array. Matrix manipulations are often relegated to subroutines. In VBA, parameters dimensioned As Variant (as well as all undimensioned parameters) can represent arrays, when their specific sizes (# of rows, # of columns) are specified using a Dim or ReDim statement.
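The MINVERSE/MMULT example above is easy to verify outside Excel. Here is a hypothetical sketch (not from the book) in plain Python; the helper names minverse2 and mmult are ours, chosen to mirror the Excel functions.

```python
# Hypothetical illustration of the table's example: invert a 2x2 matrix
# and confirm that matrix times inverse gives the identity.
def minverse2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]] (Excel's MINVERSE)."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mmult(p, q):
    """Matrix product, rows of p times columns of q (Excel's MMULT)."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

m = [[3, 5], [4, 6]]
print(minverse2(m))            # [[-3.0, 2.5], [2.0, -1.5]], as in the table
print(mmult(m, minverse2(m)))  # [[1.0, 0.0], [0.0, 1.0]], the identity
```

As with Excel's MMULT, the inner dimension check is implicit here: the column count of the first array must equal the row count of the second.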
1.14 Handling the funnies
With its many built-in functions, Excel makes it easy to compute many mathematical expressions. Even so, we may sometimes need to help it along, especially when our calculations involve some of the mathematical 'funnies', such as 0/0, ∞/∞, 0×∞, or ∞-∞. Below we will explore some of these; once you see the approach, you will know how to deal with similar problems you may encounter.
There is an old birdwatcher's trick. When one person enters a blind, birds that have seen this will remember that someone went in until he or she comes out again. But you can usually fool birds by having three enter, and two leave: apparently birds count zero-one-many. When two people enter, most birds will know that the blind is not yet empty after they see one of them emerge: 2 - 1 is not 0. Birds can count, but they cannot distinguish between 2 and 3. The same applies to computers: even though they can count a little further, they can count only up to a limited number, and you may occasionally ask a computer to calculate, say, ∞/∞, in which case it will come up short.
As a simple example, the convolution of a sine wave and an impulse function yields the sinc function, sinc(πt) = [sin(πt)]/(πt). When πt is zero, both the numerator and denominator are zero, suggesting that the value of sinc(0) might be undefined. But this is not the case, as is most readily seen by expanding sin(x) for x << 1 as x - x^3/3! + x^5/5! - x^7/7! + ..., so that [sin(x)]/x = 1 - x^2/3! + x^4/5! - x^6/7! + ... -> 1 for x -> 0. Even if you take x as small as 10^-300, close to the smallest number it can represent, Excel will return 1 for [sin(x)]/x. But for x = 0 you will get the error message #DIV/0! without a numerical result. If you encounter this problem in a spreadsheet, you can use the series expansion or, simpler, sidestep it by using a very small value instead of 0.

1.14.1 The binomial coefficient
The binomial coefficient is most clearly associated with binomial statistics, but once in a while it crops up in seemingly unrelated scientific and engineering problems; e.g., it occurs in the Gram functions used for equidistant least squares discussed in section 3.10.
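Both remedies for the 0/0 of the sinc function can be demonstrated in a few lines. This is a hypothetical sketch (not from the book); the function name sinc_series and the number of terms used are our choices.

```python
import math

# Hypothetical illustration: sidestepping the 0/0 in sinc(x) = sin(x)/x,
# either with the series 1 - x^2/3! + x^4/5! - ... or by substituting a
# very small x for 0. math.sin(0)/0 would raise ZeroDivisionError, the
# analog of Excel's #DIV/0!.
def sinc_series(x, terms=8):
    total, term = 0.0, 1.0
    for m in range(terms):
        total += term
        term *= -x * x / ((2 * m + 2) * (2 * m + 3))  # next series term
    return total

print(sinc_series(0.0))           # 1.0: the series has no 0/0 problem
print(math.sin(1e-300) / 1e-300)  # 1.0: the 'very small x' sidestep
```

The recurrence in the loop multiplies each term by -x^2/[(2m+2)(2m+3)], which generates exactly the factorials of the expansion without ever computing them explicitly, the same trick used for the binomial coefficient below.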
The binomial coefficient is defined as

    C(N,n) = N! / [(N-n)! n!]        (1.14.1)

and should not give the spreadsheet any problems, because Excel contains the factorial function (the instruction =FACT(3) will yield 6) and, for that matter, you can always evaluate a factorial from its definition, N! = 1 × 2 × 3 × 4 × ... × N, and we therefore start out by simply multiplying them. Perhaps so, but when you apply this to the binomial coefficient, by calculating it as N!/[(N-n)! n!], you will quickly run out of luck, as illustrated in exercise 1.14.1.

Exercise 1.14.1:
(1) Start a spreadsheet with the label and value of n, two rows lower enter labels for N and the binomial coefficient, and in cell A5 start a column for N = 0 (1) 200.
(2) In B5 start the column for the binomial coefficient where N = n.
(3) Make a temporary third column in which you calculate N! for the same range of N-values. For n = 10 you will do fine till N = 170, but thereafter the computer fails: at N = 171, N! exceeds the maximum number the computer can represent (about 10^308), and this ruins the calculation of the much smaller binomial coefficient, even though that coefficient is still smaller than 10^16.

Obviously we should compute the binomial coefficient without explicitly calculating N!. Understanding what causes the problem is, of course, the most important part of fixing it. We first note that the definition of N! as a product makes it easy to compute it in logarithmic form, since ln(N!) = ln(N) + ln(N-1) + ln(N-2) + ... + ln(2) + ln(1) will not exceed the numerical capacity of the computer.

Exercise 1.14.1 (continued):
(4) Relabel the third column as ln(N!) and use it to compute ln(N!) by making cell C6 read =LN(A6), and by entering in cell C7 the instruction =C6+LN(A7). (We avoid cell C5 because ln(0) is not very useful.) Copy the instruction from cell C7 all the way down.
(5) In cells C1 and D1 enter the label and value respectively of ln(n!), which for n < 170 can be computed simply as =LN(FACT(B1)).
(6) Now use column D to compute the binomial coefficient: deposit the instruction =EXP(C16-C6-$D$1) in row 16, and copy it all the way down. You now have the binomial coefficient for as far as the eye can see on the spreadsheet: for N = 65531 and n = 10 it has the value 4.02167227×10^41, a sizable number but no problem whatsoever for Excel.

1.14.2 The exponential error function complement
A function that appears in, e.g., problems of statistics, heat transport, and diffusion is the exponential error function complement, y = exp[x^2] erfc[x]. Excel provides both exp[x^2] and erfc[x], which can be computed simply by calling the functions exp() and erfc() and multiplying them.
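The logarithmic route of exercise 1.14.1 translates directly into other environments. The following is a hypothetical sketch (not from the book) in Python, where math.lgamma(N+1) plays the role of the ln(N!) column.

```python
import math

# Hypothetical illustration of exercise 1.14.1: compute the binomial
# coefficient as exp[ln N! - ln (N-n)! - ln n!], so that no intermediate
# factorial ever overflows.
def binomial(N, n):
    return math.exp(math.lgamma(N + 1) - math.lgamma(N - n + 1)
                    - math.lgamma(n + 1))

print(binomial(5, 2))        # 10, to within rounding
print(binomial(65531, 10))   # about 4.0217e41, as found on the spreadsheet
# The direct route fails in floating point: 171! already exceeds ~1e308.
```

As on the spreadsheet, the price paid is that the answer is only as accurate as the logarithms, roughly 15 significant figures here, rather than exact.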
Fig. 1.14.1: Cobbling together a smooth function from two partially overlapping segments. Open circles: the product of exp(x^2) and erfc(x). Solid circles: the output of the asymptotic series. Gray band: the result of switching from one to the other at x = 5. Only the region of switchover is shown.

Exercise 1.14.2:
(1) In a spreadsheet enter columns for x = 0 (0.1) 10 and y = exp[x^2] erfc[x], the latter simply by calling the functions exp() and erfc() and multiplying them.
(2) Plot y versus x. There should be no problem as long as x < 5, while the computation obviously fails above x = 6. Clearly we run into a problem for x > 5. This is the digital analog of the product ∞ × 0, where exp[x^2] becomes large while erfc[x] tends to zero, as can be recognized by plotting both functions separately. For large values of x we therefore use an asymptotic expansion instead, in this case

    exp[x^2] erfc[x] ≈ (1/(x√π)) { 1 + Σ (m=1 to ∞) (-1)^m [1·3·5···(2m-1)] / (2x^2)^m }  for x -> ∞    (1.14.2)

The problem therefore is twofold: how to incorporate a computation such as (1.14.2) into a cell, and (because the asymptotic expansion fails for small values of x) how to switch smoothly from one to the other. The first problem will be addressed below by introducing a custom function. The open circles in Fig. 1.14.1 show what you should get so far.
(3) We first need a module in which to write the custom function. Either use Alt+F11 (Mac: Opt+F11) or Tools => Macro => Visual Basic Editor. If a module already exists, go to the end. If you find a gray space, there is no module yet, and you therefore make one with Insert => Module.
(4) In the module, enter (by typing, or by copying from SampleMacros) the following lines:

Function EE(x)

Dim m As Integer
Dim sum As Double
Dim oldterm As Double, newterm As Double

m = 1
sum = 1
oldterm = 1

Do
  newterm = -(2 * m - 1) * oldterm / (2 * x * x)
  sum = sum + newterm
  oldterm = newterm
  m = m + 1
Loop Until Abs(newterm) < 0.00000001

EE = sum / (x * Sqr([Pi()]))

End Function

(5) The first line specifies the function by name, and within brackets indicates the name of the parameter(s) on which the function operates.
(6) The next three lines contain the dimension statements, which specify the nature of the variables used in the custom function. You can operate the function without the dimension statements, except when your module starts with the line Option Explicit, in which case they are required. However, it is good general programming custom to include dimension statements, which is why we will do so here. Note that the variable x should not be dimensioned, since the spreadsheet already knows its dimension, which (through the first line) is imported into the function together with its value. Empty lines have no meaning for the computer, and are inserted merely for greater readability.
(7) The next three lines initialize the calculation, i.e., they assign the relevant parameters their initial values. (The initial value of the function exp[x^2] erfc[x], at x = 0, is 1.) Note that, unlike the expression in (1.14.2), sum already incorporates the term 1.
(8) The heart of this custom function is the part between Do and Loop Until, which forms a so-called do-loop. This loop executes a set of commands until a termination criterion is reached; here we use as such a criterion the requirement that the absolute value of the last-computed term is smaller than 10^-8.
(9) Now we consider the terms summed in (1.14.2), which we can write as T1 = -1/(2x^2) for m = 1, T2 = -3T1/(2x^2) for m = 2, T3 = -5T2/(2x^2) for m = 3, and in general Tm = -(2m-1)Tm-1/(2x^2). This is the logic behind starting with the parameter oldterm and using it to compute successive values of newterm.
(10) In the next line we then add the successive terms to sum, using the assignment symbol = to mean <=, i.e., to replace the left-hand expression (sum) by that on the right-hand side (sum + newterm).
(11) Then we update oldterm, and increment m.
(12) Finally, outside the loop, we assign the value of EE to sum divided by x√π. Visual Basic does not have the rich assortment of functions that Excel has, and doesn't know what π means. Therefore we can either write out the value of π, or simply use the spreadsheet function; in the latter case we must place that function, Pi(), between square brackets: [Pi()]. Also note that the instruction for taking the square root in Visual Basic is Sqr, not Sqrt. The last line identifies the end of the custom function.
(13) Use Alt+F11 (Mac: Opt+F11) to switch back to the spreadsheet, and make a third column in which you use the custom function. For example, if your column containing the values of x starts in cell A3, in C3 deposit the instruction =EE(A3), and copy this down the length of columns A and B.
(14) Add the result obtained with this custom function to your graph. The solid circles in Fig. 1.14.1 show what you should get. Apparently, above x = 5 we are bitten by the dog, and below it scratched by the cat.
(15) In cell D3 deposit the instruction =IF(A3<5,EXP(A3^2)*ERFC(A3),EE(A3)). This reads as 'if A3 < 5, then use EXP(A3^2)*ERFC(A3), otherwise use EE(A3)', and apparently solves both problems. So the final column takes the good parts of each calculation, and avoids their problems.
(16) Plot this result; it should look like Fig. 1.14.1, which illustrates what we have just calculated.

1.15 Algorithmic accuracy
Excel can do many things, but how well does it do them? This question is increasingly being asked of many types of software, as benchmarks (such as the Statistical Reference Datasets from the National Institute of Standards and Technology) have become available to test them. A number of software packages, including Excel, have recently been tested, and readers of this book may be especially interested in tests of Excel such as those published by L. Knüsel, Computational Statistics and Data Analysis 26 (1998) 375; B. D. McCullough & B. Wilson, Computational Statistics and Data Analysis 31 (1999) 27; and M. Altman & M. P. McDonald, Political Science and Politics 34 (2001) 681.
The basic problem with all programs (including Excel) that use a fixed number of binary units ('bits') to represent data is that many non-integers cannot be represented exactly as binary numbers. For example, the binary representation of 1/10 is 0.00011001100110011001..., and an error is therefore made wherever it is truncated. (The binary number as shown is actually good to fewer than five significant figures, since its decimal value is 0.0999994...) The resulting small errors can accumulate in complicated calculations, especially if those involve subtractions of numbers of near-equal magnitudes. Good software design tries to minimize such errors, but that requires elaborate software testing, an area of competence that most scientists will have to leave to the specialists. The present author certainly makes no claims to such expertise, and his macros will no doubt confirm that.
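The logic of the custom function EE and of the switchover in step (15) is not specific to VBA. Here is a hypothetical sketch (not from the book) of the same scheme in Python; the function names EE and y mirror the text's notation but are otherwise our own.

```python
import math

# Hypothetical illustration: the asymptotic series (1.14.2) for
# exp(x^2)*erfc(x), with the same 1e-8 termination criterion as the
# VBA custom function, plus the x < 5 switchover of step (15).
def EE(x):
    m, total, oldterm = 1, 1.0, 1.0
    while True:
        newterm = -(2 * m - 1) * oldterm / (2 * x * x)
        total += newterm
        oldterm = newterm
        m += 1
        if abs(newterm) < 1e-8:
            break
    return total / (x * math.sqrt(math.pi))

def y(x):
    # direct product below x = 5, asymptotic series above
    return math.exp(x * x) * math.erfc(x) if x < 5 else EE(x)

print(y(4.9))  # direct product branch
print(y(5.1))  # asymptotic branch; the two agree closely near x = 5
```

Near x = 5 the two branches overlap to about eight digits, which is what makes the abrupt switch in the IF instruction invisible in the plot.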
There are several issues here. First and foremost, one wants the final results of a calculation to be correct rather than wrong. We will call this absolute accuracy. It would be optimal if software could let its user know when the computer cannot find the requested answer, rather than display the last result before the algorithm stopped. Knüsel lists several examples where Excel cannot find a result, and then does not give one, but instead displays the error message #NUM!. That is fine: it is equivalent to a teacher, or a politician, honestly answering "I don't know" if that is the case, even though one might wish the algorithm to yield the requested answer, regardless of whether the answer might be knowable. Secondly, one would like only significant numbers to be displayed, or perhaps just one more, as a guard digit. We will call this numerical accuracy.
Algorithms incorporated in standard software packages often reflect a compromise between accuracy and execution speed. But sometimes algorithms have been chosen consciously for speed, to the detriment of accuracy. While that might have been justifiable in the days of 1 MHz processors, it is of questionable value now that personal computers have clock speeds above 1 GHz, although one would hope that, with the increasing processing speeds, consistent accuracy will soon become the sole determining factor.
We will illustrate this in exercise 1.15.1 with an example taken from M. Altman & M. P. McDonald, Political Science and Politics 34 (2001) 681, which uses the population standard deviation given by

    σ = √{ [Σ(x - xav)^2] / n }        (1.15.1)

which is mathematically (but not computationally) equivalent to

    σ = √{ [Σx^2 - (Σx)^2/n] / n }        (1.15.2)

Use of (1.15.2) can be faster than that of (1.15.1), because (1.15.2) requires only a single pass through the data set, whereas one needs two passes (one to determine xav first) for (1.15.1). On the other hand, (1.15.1) keeps better numerical accuracy for large values of x, because it computes the squares of the differences x - xav between the numbers x and their average xav, rather than the difference between the usually much more extreme values of Σx^2 and (Σx)^2/n.
Exercise 1.15.1:
(1) In cells A3:A9 of a spreadsheet enter the values 1 (1) 7, either directly as 1, 2, 3, 4, 5, 6, 7, or with a 1 in cell A3, the instruction =A3+1 in A4, and copies thereof in cells A5:A9.
(2) In cell B1 enter the number 1000000, and in cell B3 the instruction =B$1+$A3. Copy this instruction down through B4:B9. You now have entered the test data set.
(3) In C1 deposit the instruction =10*B$1, and copy this to D1:F1.
(4) Highlight B3:B9, copy it, then highlight C3:F3 and paste.
(5) In cell B11 use =AVERAGE(B3:B9) to compute the average.
(6) In cell B13 place the instruction =B3-B$11, and copy this down through B14:B19.
(7) In B21 deposit =COUNT(B13:B19), in B23 =SQRT(SUMSQ(B13:B19)/B$21), and in B24 =STDEVP(B3:B9).
(8) Highlight B11:B24, copy, then highlight C11:F11 and paste. You should now have a spreadsheet that, apart from the few added notations in column A, resembles Fig. 1.15.1.
The built-in Excel function STDEVP fails when x contains more than 8 significant figures, predictable because squaring such a number requires more than 16 significant decimal digits in its computation. Interestingly, you can force Excel to compute the average (actually twice) by replacing STDEVP(xRange) by SQRT(COVAR(xRange,xRange)), which avoids the above problem because covar(xRange,yRange) = (1/n) Σ(x-xav)(y-yav) cannot be written in terms of squares. Or you can write a function yourself to do better than Excel, as illustrated below.

Function myStDevP(myRange)

Dim Count As Long ' just in case Count exceeds 32K
Dim DifSq As Double, x As Double, xav As Double

' compute the average x-value
x = 0
Count = 0
For Each Cell In myRange
  x = x + Cell.Value
  Count = Count + 1
Next Cell
xav = x / Count

' compute the standard deviation
DifSq = 0
For Each Cell In myRange
  DifSq = DifSq + (Cell.Value - xav) ^ 2
Next Cell
myStDevP = Sqr(DifSq / Count)

End Function
Fig. 1.15.1: A spreadsheet computation of the standard deviation of sets of seven large numbers. (The columns contain the test data 1000001 through 1000007, 10000001 through 10000007, and so on, up to 10000000001 through 10000000007, followed by their average, the differences from that average, the COUNT of 7, the standard deviation computed via eq. (1.15.1), which yields 2 in every column, and the result of STDEVP, which for the wider numbers yields incorrect values such as 2.285714286 or 0.)
Comparison of the results on the two bottom lines suggests that the Excel function STDEVP fails when x has more than 8 significant figures. For an analysis of the problem, look up the Excel Help file for STDEVP, which gives the formula used, eq. (1.15.2). Unfortunately, Excel does not alert the user to the fact that the numbers entered into this function are too large for it to function properly; instead it provides an incorrect result with the same aplomb as a correct answer. Computers do not blush.
You can readily verify the latter statements by consulting the Microsoft Knowledge Base article Q158071 on "XL: Problems with statistical functions and large numbers", e.g., at http://support.microsoft.com. A similar algorithm is used for the calculation of, e.g., the variance; this problem therefore affects not only the functions listed in note Q158071, but also functions not listed there, such as the variance Var and the related functions VarA, VarP, and VarPA, as well as (according to their descriptions in the Index) StDev, LinEst, LogEst, Kurt, BinomDist, and R2 in Trendline. This problem has haunted Excel at least since version 5, and still has not been fixed in Excel 2002. Yet this is really a simple problem with a simple solution. Still, do keep matters in perspective: how often in your lifetime do you expect to encounter computations involving numbers with eight or more significant digits? As long as you care only about their first seven digits, you might never know this problem exists.
The cumulative standard Gaussian distribution, i.e., the area under the Gaussian distribution curve for zero mean and unit standard deviation, can be found in Excel with the instructions =NormDist(x,0,1,1) or =NormSDist(x).
Table 1.15.1: The answers Fcalc obtained for the functions F = NormDist(x,0,1,1) or F = NormSDist(x) for various (admittedly rather extreme) values of x, ranging from about x = -8.4 to x = +8.3, and their correct values Fexact.
As can be seen in Table 1.15.1, the results for large negative values of x are 'quantized' in steps of about 1.11×10^-16, even though the result is displayed with 16 digits, and it would be preferable if such numbers were provided to ±1×10^-16 rather than with the extra, insignificant digits. Moreover, it would be better if the answers beyond x = -8.30, where the exact values drop below 10^-16, were to read <1E-16. A similar but even more severe problem exists with the inverse functions NormInv(x,0,1) and NormSInv(x), where x denotes the probability, which (according to their descriptions in the Index) are computed iteratively to within ±3×10^-7. Especially when NormDist or NormSDist becomes small, NormInv and NormSInv can blow up to quite ridiculous numbers.
In general, it is prudent to assume, in the absence of contrary evidence, that not all digits displayed by Excel functions and macros (including the custom macros used in this book!) are significant. Considering the uncertainties in the input data analyzed, this will often be unimportant (i.e., the standard deviations in the results will often be far larger than the computational errors of Excel), but it is still useful to keep in mind. When your results depend critically on more than the first 3 or 4 nonzero digits of the result, it is time to check the numerical precision of
Ch. 1: Survey of Excel

... the method. By comparison with other makers of statistical software, Microsoft has not been very willing to improve its routines after its problems had been pointed out and acknowledged, which, if nothing else, makes for poor public relations. As with anything else: user beware.

Chapter 4 describes many applications of the nonlinear least squares routine Solver, made by Frontline Systems Inc. and included in Excel as well as in Lotus 1-2-3 and in QuattroPro. While it is a well-designed and very useful tool, not infrequently a second run of Solver, starting from the just-obtained answer, will produce a slightly improved result. It is therefore a good general precaution always to repeat Solver, to see whether its answer is stable. It is not clear why Solver doesn't incorporate such a simple check itself.

Here is another issue: calibrate your procedure with similar (but binary-incompatible) data for which you know the exact answer; see S. D. Simon & J. P. Lesage, Comp. Stat. Data Anal. 7 (1988) 197. Preferably the method should be checked with nonintegers that cannot be written exactly in binary notation. Testing with a number such as 0.375 would not be useful, even though it is a noninteger, because 0.375 = 3/8 and is therefore represented exactly in binary format, as 0.011.

Algorithms are continually debugged, refined, and made more robust, but available software packages may not always incorporate such improvements. This holds for Excel as well as for more specialized software programs, and the same applies to the custom macros presented here. Custom-made functions and macros (including those in this book) are also likely to contain errors that crop up only when the algorithms are put under some duress, and it is best to know about them. Yours truly will much appreciate your suggestions to improve his macros.

There are possible errors every step of the way: at sampling, at measuring, at data analysis, and at the final interpretation of the results, and it pays to keep them all as small as possible. For most applications, as long as your data are relatively few and the number of significant figures needed is small, Excel will be fine. But if you deal with complex problems, and especially when the results you obtain may have serious, practical consequences, use other software packages for independent verification.

1.16 Mismatches between Excel and VBA

In this book we will first use macros, and then learn how to write them, at which point we will often switch back and forth between the spreadsheet and its macros. The language used in Excel macros is Visual Basic for Applications (VBA), an adaptation of Visual Basic.
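The warning above about binary-exact test values such as 0.375 = 3/8 can be checked mechanically in any language with IEEE 754 doubles. Here is a sketch in Python (my choice for illustration only — the book itself works in Excel and VBA, and the helper name is mine):

```python
from fractions import Fraction

def exactly_representable(decimal_string):
    """True if the decimal string converts to a binary double without error.

    Fraction(decimal_string) is the exact decimal value; Fraction(float(...))
    recovers the exact value actually stored in the binary double.
    """
    return Fraction(decimal_string) == Fraction(float(decimal_string))

# 0.375 = 3/8 = 0.011 in binary, so it is stored exactly and makes a poor test value:
print(exactly_representable("0.375"))   # True
# 0.1 has no finite binary expansion, so it exercises the rounding machinery:
print(exactly_representable("0.1"))     # False
```

The comparison is exact because Fraction(float(x)) reconstructs the stored binary value without any further rounding, so a test number for which this returns False is the kind recommended in the text.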
Visual Basic, in turn, is an evolutionary development of Dartmouth Basic (an acronym for Beginners All-purpose Symbolic Instruction Code) via Borland's Turbo Basic and Microsoft's QuickBasic. Along the way, Basic lost some of its less convenient features, such as line numbers and line interpreters, and became more like Fortran-77 (for Formula translator), the successor to a computer language developed at IBM in the 1950s. (In the meantime, Fortran has morphed into a much more powerful language with Fortran90 and Fortran95, but that is a different story.) Visual Basic was combined with Excel in version 5 (Excel 95), when both were already mature products. As in any marriage, both partners brought in their own characteristics, and they did not always match. Unfortunately, Microsoft has done little to soften the resulting conflicts, a few of which we will illustrate below.

One annoyance is that rounding works differently: VBA rounds 0.5 to the nearest even integer, as in round(2.5,0) = 2, while the Excel function of the same name rounds up for x > 0 and down for x < 0, as in round(2.5,0) = 3.

Another annoyance is that the order in which simple arithmetic operations are performed in Excel and VBA is not always the same. Matters can get especially confusing when a negative sign is used all by itself. In Excel, negation comes before exponentiation, so that -3^4 = 81, whereas it is the other way around in VBA: -3^4 = -81. To avoid confusion and the resulting ambiguity it is therefore best always to use brackets when a minus sign is involved: (-3)^4 = 81 and -(3^4) = -81 in both Excel and VBA. The function exp[-(x-c)^2], for example, is in Excel therefore best coded as =EXP(-((A3-$B$1)^2)) or =EXP(-1*(A3-$B$1)^2), because =EXP(-(A3-$B$1)^2) will square (and thereby cancel) the first minus sign.

Here are some more beauties: in Excel, √x must be coded as sqrt(x), in VBA as sqr(x). In Excel the sign of x is obtained with sign(x), in VBA with sgn(x). In the same category you will find rand(x) in Excel and rnd(x) in VBA for a random number, and arctan(x) in Excel vs. atn(x) in VBA for the arc tangent. Both are a consequence of the three-letter codes used in VBA.

Perhaps most annoying is the fact that VBA uses single precision arithmetic as its default, whereas Excel automatically computes everything in double precision. VBA must therefore be reminded to use double precision.
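Several of these mismatches have exact counterparts in other languages, which makes them easy to demonstrate outside Excel. This Python sketch (my illustration, not the book's; the helper excel_round is my own name, not an Excel or VBA API) shows banker's rounding as VBA does it, half-away-from-zero rounding as Excel does it, the negation/exponentiation precedence trap, and the natural versus base-10 logarithm conventions:

```python
import math

def excel_round(x, digits=0):
    """Round half away from zero, as Excel's ROUND does (hypothetical helper)."""
    factor = 10 ** digits
    return math.copysign(math.floor(abs(x) * factor + 0.5) / factor, x)

# Rounding: Python's built-in round, like VBA's Round, sends 0.5 to the nearest even integer
print(round(2.5), round(3.5))               # 2 4
print(excel_round(2.5), excel_round(-2.5))  # 3.0 -3.0

# Precedence: Python, like VBA, exponentiates before applying the unary minus
print(-3**4, (-3)**4)                       # -81 81

# Logarithms: math.log is natural (VBA's log), math.log10 is base 10 (Excel's log)
print(round(math.log(3), 4), round(math.log10(3), 4))   # 1.0986 0.4771
```

As in the spreadsheet advice above, the parenthesized forms remove any ambiguity about what is being negated.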
The worst offender among these naming clashes is perhaps the logarithm. In VBA, log(x) represents the natural, i.e., e-based, logarithm, which in Excel (and in almost everyone else's nomenclature) is written as ln(x). VBA does not even have a symbol for the ten-based logarithm, so that it must be calculated as log(x)/log(10), in what everyone else would write as log(x) or ln(x)/ln(10). Alternatively, in VBA we can refer to the spreadsheet function, as in Application.Log(x). Excel, on the other hand, has no fewer than three ways to represent the ten-based logarithm of x: log(x), log(x,10), and log10(x). For instance, in Excel log(3) yields 0.47712, but in VBA we find log(3) = 1.0986, while Application.Log(3) again gives 0.47712, as illustrated in the function Logarheads. Go figure!

Function Logarheads(x)
  MsgBox "Log(" & x & ") = " & Log(x) & " but" & Chr(13) & _
    "Application.Log(" & x & ") = " & Application.Log(x)
End Function

An additional set of problems is encountered outside the US, because Excel and VBA may provide different adaptations to languages other than American English. A case in point is the use of the decimal comma (rather than the decimal point) in most continental European languages. In the US versions of Excel and VBA, the comma is used as a general separator, whereas a semicolon may be used as such in Europe. The macros described in this book may therefore have to be modified to run properly in such environments.

1.17 Summary

Excel is a powerful spreadsheet. It is primarily designed for business applications, and is marketed as such, which makes it both ubiquitous and affordable. Fortunately it incorporates many features that make it very useful for science and engineering, because it combines general availability with transparency, ease of use, and convenient graphics; moreover, its ability to accommodate custom functions and macros greatly extends its already considerable power to solve scientific data analysis problems. So, for many relatively mundane problems, Excel is eminently suited: spread the sheet, and go for it.

On the other hand, Excel cannot handle very large data arrays, it has very limited capabilities for displaying three-dimensional objects, and it cannot do formal, closed-form mathematics. In all such cases one should use more appropriate software; for special problems, specialized software will often be required. That does not mean that we should try to do everything with Excel: just as no carpenter will go to the job with only one tool, no scientist should rely on just one type of data analysis software.
1.18 For further reading

There are many introductory books on Excel, both for a general (often business-oriented) audience, and those specifically written for scientists and/or engineers. In the latter category we mention E. J. Billo, Excel for Chemists, 2nd ed., Wiley 2001; B. S. Gottfried, Spreadsheet Tools for Engineers: Excel 2000 Version, McGraw-Hill 2000; W. J. Orvis, Excel for Scientists and Engineers, 2nd ed., Sybex 1996, despite its age still worthwhile; and S. C. Bloch, Excel for Engineers and Scientists, Wiley 2000, with its two most valuable chapters tucked away on the accompanying compact disk, and which, like Gottfried's book, hardly mentions custom functions and macros.

The Microsoft manual provided with Excel is quite good: it describes many useful shortcuts, and clearly lays out the differences between personal computers and Macs. Moreover, you have much information at your fingertips in the Help section, and all Microsoft manuals can be consulted and searched on http://support.microsoft.com. For other books, go to your local bookstore, public library, or college library, or (if you will not miss browsing) to a web-based bookseller.

For the graphical presentation of data, you may want to consult the beautiful books by E. R. Tufte, especially The Visual Display of Quantitative Information (1992) and Visual Explanations (1997), available from Graphics Press, P.O. Box 430, Cheshire CT 06410, or from web booksellers. In general, the simpler and clearer the graph, the more impact it will have. For best effect, use color sparingly: moderation marks the master.
Chapter 2: Simple linear least squares

All experimental observations are subject to experimental uncertainty, which is sometimes called 'error'. We can often distinguish two types of such uncertainty. Measurements may be distorted systematically by interfering phenomena, instrumental distortion, faulty calibration, or any number of factors that affect their accuracy, i.e., how far from true they are. (Since the truth is not known, the amount of inaccuracy can at best be guessed.) Moreover, measurements may exhibit 'noise', because most experiments leave wriggle room for a multitude of small, seemingly random fluctuations in experimental conditions: readout instruments may have limited resolution, amplifiers may magnify the effects of thermal fluctuations, etc. Even assuming that all experimental artifacts could be removed, many measured properties are inherently stochastic, i.e., have a small amount of randomness, because of the discrete nature of mass (atoms, molecules) and energy (quanta). Such noise affects the reproducibility of the measurements, i.e., their precision.

In this and the next two chapters we will be mostly concerned with precision, i.e., with random fluctuations and their reduction or removal. This is no reflection on their relative importance vis-a-vis systematic sources of uncertainty, but merely of the fact that a useful theoretical framework exists for their treatment, a criterion that does not imply a value judgement regarding relative importance. Chapter 6 will briefly discuss some known sources of systematic uncertainty, and their possible remedies; books can only teach what is known. We seldom have or take the time and tools to analyze the sources of such noise, and they are often of little interest as long as they can be removed without seriously affecting the underlying information. Only rarely do they make the evening news, as when the margin of error in Votomatic vote-counting machines exceeds the margin of votes for one presidential candidate over another.

Scientific experiments typically generate large volumes of data, from which one tries to extract much smaller amounts of more meaningful numerical information, in a process often called data reduction. This chapter will illustrate the method of least squares, one of the most widely used techniques for the extraction of such essential information from an excess of experimental data.
We do not explain the basis for this or other statistical techniques, since that would require a text all its own, and a large number of those have already been written. Instead we will explain some of the more practical features of least squares methods, and highlight what choices the experimenter must make. Excel provides convenient facilities for least squares analysis; where necessary we will use additional tools to facilitate the application of this method. Least squares analysis is based on a single Gaussian distribution of errors. This can indeed be expected for many errors that are essentially random, although some experimental errors (such as might result from, e.g., the presence of an impurity, or a slow increase in room temperature during the morning hours) can introduce a bias; they are systematic rather than random.

In the present chapter we survey the simple applications of unweighted linear least squares methods to the proportionality y = a1x and to the straight line y = a0 + a1x. In order to keep the present chapter within reasonable length, we have split the discussion of linear least squares into two parts: chapter 3 will deal with its extensions to include polynomial, multiparameter, and weighted linear least squares, while chapter 4 will cover nonlinear least squares.

2.1 Repeat measurements

When we make a measurement, we obtain a reading. When we repeat the measurement under what appear to be identical conditions, we will usually get a similar reading, but not necessarily an identical one. Apart from major identifiable changes (the power just went out) or unintentional ones (we transposed two digits when noting down the result), this is most likely caused by fluctuations in some uncontrolled parameters: the temperature may have drifted somewhat between measurements, someone may have opened a door and let in a draft, there may have been a glitch on the power line because an instrument in an adjacent room was turned on or the elevator motor just started, or for any number of other, often not readily identifiable reasons. We usually deal with this by making several repeat observations and averaging the result, on the assumption that any experimental errors tend to 'average out'.
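The 'averaging out' of random errors is easy to see by simulation. This Python sketch (my illustration — the book generates the equivalent data with Excel's Random Number Generation tool) draws 1000 Gaussian readings around a true value of 10 with unit standard deviation, and shows that their mean lands far closer to 10 than a typical single reading does:

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

true_value, noise_sd, n = 10.0, 1.0, 1000
readings = [random.gauss(true_value, noise_sd) for _ in range(n)]

mean = statistics.fmean(readings)   # sample mean of all 1000 readings
sd = statistics.stdev(readings)     # sample standard deviation of single readings
print(mean, sd)   # mean close to 10, sd close to 1
```

Individual readings scatter by about 1, while the mean of 1000 of them typically deviates from 10 by only a few hundredths — the 1/√N effect quantified in equation (2.1.4) below.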
The sample average or sample mean, yav or ȳ, of N equivalent observations is defined as

$$y_{av} = \frac{1}{N}\sum_{i=1}^{N} y_i \qquad (2.1.1)$$

where the index i, running from 1 to N, identifies the individual observations yi. (In many fields of science and technology the superscript bar has a special, field-specific meaning, and it is also difficult to use in Excel; we will therefore use the more explicit notation yav.) We can also get an estimate of the likely statistical uncertainty in that result, such as its standard deviation

$$s = \sqrt{\frac{\sum_{i=1}^{N}\left(y_i - y_{av}\right)^2}{N-1}} \qquad (2.1.2)$$

or the associated variance

$$V = s^2 = \frac{\sum_{i=1}^{N}\left(y_i - y_{av}\right)^2}{N-1} \qquad (2.1.3)$$

where the difference between the individual observation and its average value, Δi = yi − yav, is called the residual.

Exercise 2.1.1:
(1) We use the data generated in exercise 1.1. This is a large data set, which we cannot possibly display legibly on a typical computer screen. On a sheet of paper, totals are typically placed at the bottom of columns, and carried forward to the top of the next page when necessary. In a spreadsheet, the columns can be very long, making the bottom of a column rather hard to find; it is therefore convenient to place totals and related derived quantities at the top of the spreadsheet. If there is no room at the top, make it (which is much easier to do in a spreadsheet than on a sheet of paper), e.g., by highlighting the top two cells containing the row labels, right-clicking, and then selecting Insert. If you don't know how long the column is, click on the top cell of the column (B3), then use Shift∪End, Shift∪↓ (or, in one command, Ctrl∪Shift∪↓).
(2) In order to calculate the average, you could use (2.1.1) to compute the average of the data in column B with the instruction =SUM(B3:B1003)/1000, but it is more convenient to use instead =AVERAGE(B3:B1003) and Enter. In Excel 2000 you need not even enter the closing bracket after typing =AVERAGE( and the range.
(3) Likewise, in order to compute the standard deviation, you could use (2.1.2): make a new column containing the squares of the residuals, add them up, divide by N−1, etc. But it is much easier to use the single command =STDEV(B3:B1003) instead, because Excel has these operations already built in.

The numbers we obtain for the average and standard deviation over a large number of data points are much closer to their 'true' values (which in exercise 2.1.1 we know to be 10 and 1 respectively, because these are synthetic rather than real data) than the averages and standard deviations for smaller subsets of these data.
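Equations (2.1.1) and (2.1.2) translate directly into code. A Python sketch (my language choice for illustration; on the spreadsheet, =AVERAGE and =STDEV do exactly this work):

```python
import math

def sample_mean(y):
    """Eq. (2.1.1): the sample average y_av."""
    return sum(y) / len(y)

def sample_std(y):
    """Eq. (2.1.2): the sample standard deviation, with N - 1 in the denominator."""
    yav = sample_mean(y)
    return math.sqrt(sum((yi - yav) ** 2 for yi in y) / (len(y) - 1))

y = [9.8, 10.1, 10.4, 9.7, 10.0]
print(sample_mean(y))   # ≈ 10.0  (what =AVERAGE would report)
print(sample_std(y))    # ≈ 0.274 (what =STDEV would report)
```

Note the N − 1 (not N) in the denominator: with the mean estimated from the same data, only N − 1 independent residuals remain.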
If we could take an infinite number of measurements, we would get their 'correct' values, and the uncertainties would all be truly random. But for a small number of observations, we must make do with the sample average and sample standard deviation. For a sample of N observations, the standard deviation of the mean is √N smaller than the standard deviation s of the individual measurements:

$$s_{av} = \frac{s}{\sqrt{N}} \qquad (2.1.4)$$

A sometimes more realistic measure of the uncertainty in the mean is the confidence interval (or confidence limit), i.e., the standard deviation of the mean multiplied by a factor that reflects both the finite sample size and a specified 'confidence level'. The confidence interval is found with =CONFIDENCE(α, s, N), where α is the assumed imprecision, in %, e.g., α = 0.05 for 5% imprecision, corresponding with a confidence of 1 − α = 1 − 0.05 = 0.95 or 95%, or α = 0.01 for a confidence of 0.99 or 99%. The second parameter, s, is the standard deviation, and N counts the number of data points analyzed. The term 'confidence' would appear to have been chosen deliberately to blur the distinction between accuracy (i.e., how reliable a number is) and precision (how reproducible that number is when the measurement is repeated under the very same experimental conditions). Note that the confidence intervals delineate the likely range of the reproducibility of the data, and have nothing whatsoever to do with how reliable or confidence-inspiring they are.

Exercise 2.1.1 (continued):
(4) Calculate the 95% confidence limits of the first 9-point average of exercise 1.1, using the instruction =CONFIDENCE(0.05,D5,9), adjusting the cell reference in case you did not insert the two additional lines at the top.
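Excel's CONFIDENCE function returns the half-width of the normal (z-based) interval, z(1−α/2) · s/√N. A Python sketch of the same computation (my illustration; the function name is mine, not Excel's API):

```python
import math
from statistics import NormalDist

def confidence(alpha, s, n):
    """Half-width of the (1 - alpha) normal confidence interval for the mean,
    mimicking Excel's CONFIDENCE(alpha, s, N)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.960 for alpha = 0.05
    return z * s / math.sqrt(n)

# 95% confidence limits for a 9-point average with s = 1:
print(confidence(0.05, 1.0, 9))   # ≈ 0.653
```

Quadrupling N halves the interval, exactly the 1/√N dependence of equation (2.1.4).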
2.2 Fitting data to a proportionality

Say that we apply a constant current I to a resistor, and measure the resulting voltage difference V across its terminals, in order to calculate the resistance R of the resistor from Ohm's law, V = IR, i.e., as R = V/I. We will assume that Ohm's law applies, as it has been shown to do over many orders of magnitude. We could make a single measurement, and then analyze the resulting data as in section 2.1. But often it is more efficient to make measurements at several different currents, because we can then use the data also to test the applicability of the assumed proportionality. When we plot V versus I we expect to see data that, apart from the noise, fit a straight line with a slope R passing through the origin of the graph.

In order to extract the value of the resistance R from such data, we must fit the observations to a given function, here Ohm's law. Any least squares analysis depends on a model, and the choice of model is always a judgement call. In general, the resulting analysis is nontrivial, because the numerical values of both V and I will be subject to experimental uncertainties; but one of them usually has a smaller uncertainty than the other. If the disparity between the uncertainties in the two variables is sufficiently large, it is reasonable to focus on the more uncertain parameter, the so-called dependent variable, the other being called the independent one, which we will then assume to be error-free. Usually it will be clear from the type of measurements made which variable is the dependent one; in other situations, the choice is sometimes made merely for mathematical convenience. This is often an acceptably small price to pay for a considerably simplified analysis. Below we will assume that the measurement of V is the more imprecise one, so that I will be taken as the independent variable.

In any case, we need to use the expression for the proportionality y = a1x, where, by convention, y is the dependent variable; the index 1 for the slope a1 is given for the sake of consistency with subsequent extensions. The residuals are defined as

$$\Delta_i = y_i - a_1 x_i \qquad (2.2.1)$$

and the slope

$$a_1 = \frac{\sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} x_i^2} \qquad (2.2.2)$$

can be derived by minimizing the sum of squares of the residuals Δi. We can now define two standard deviations: sy, which characterizes the stochastic uncertainty in the overall fit of the data to the theoretical model, and s1, the resulting imprecision in the derived slope a1. The corresponding expressions are

$$s_y = \sqrt{\frac{\sum_{i=1}^{N}\left(y_i - a_1 x_i\right)^2}{N-1}} = \sqrt{\frac{\sum_{i=1}^{N} x_i^2 \sum_{i=1}^{N} y_i^2 - \left(\sum_{i=1}^{N} x_i y_i\right)^2}{(N-1)\sum_{i=1}^{N} x_i^2}} \qquad (2.2.3)$$

$$s_1 = \frac{s_y}{\sqrt{\sum_{i=1}^{N} x_i^2}} \qquad (2.2.4)$$
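Equations (2.2.1) through (2.2.4) are easy to transcribe into code. The sketch below is mine, in Python rather than the book's Excel/VBA, and uses Ohm's-law data with no added noise, so that the known slope is recovered exactly and both standard deviations vanish:

```python
import math

def fit_proportional(x, y):
    """Least-squares fit of y = a1*x (a line through the origin).

    Returns the slope a1 (eq. 2.2.2), the standard deviation of the fit sy
    (eq. 2.2.3, residual form), and that of the slope s1 (eq. 2.2.4).
    """
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    a1 = sxy / sxx
    sy = math.sqrt(sum((yi - a1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1))
    s1 = sy / math.sqrt(sxx)
    return a1, sy, s1

# Noise-free test data: V = I * R with R = 2.5 ohm is recovered exactly
current = [1.0, 2.0, 3.0, 4.0, 5.0]
voltage = [2.5 * i for i in current]
a1, sy, s1 = fit_proportional(current, voltage)
print(a1, sy, s1)   # 2.5 0.0 0.0
```

With noisy data, sy and s1 become nonzero and quantify the scatter of the points about the fitted line and the resulting imprecision in R.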
2.3 LinEst

LinEst, for linear estimator, is the simplest (and most terse) least squares fitting tool Excel provides. We could use the spreadsheet and equations (2.2.2) through (2.2.4) to compute a1, sy, and s1, but Excel makes that unnecessary, because it has these operations already built in.

Exercise 2.3.1:
(1) First we make a set of mock data on which to practice. Make A3, B3, and N3 contain labels for y, x, and noise respectively. In B4:B10 deposit some x-values, such as 1 (1) 7, and in N4:N10 some Gaussian noise. (Use Tools => Data Analysis => Random Number Generation, select Distribution: Normal, activate Output Range, activate the corresponding window, enter N4:N10, then press OK or Enter.) In cell A1 deposit the label a1 =, and in cell C1 the label sn =, where sn denotes the standard deviation of the noise; in cells B1 and D1 place some associated values. In A4 place the instruction =$B$1*B4+$D$1*N4, and copy it to A5:A10. In order to keep the monitor screen uncluttered, we have put the noise out of sight, in column N. (By left-justifying the values, and right-justifying the associated labels, we make them easy to read as one unit.) The spreadsheet should now resemble Fig. 2.3.1, except for the specific numbers.

(2) Highlight an empty block, one cell wide and two cells high, such as E4:E5, and in its top cell type =LINEST(A4:A10,B4:B10,FALSE,TRUE), then enter this with the special instruction Ctrl∪Shift∪Enter, i.e., by holding down the Control and Shift keys before and while depressing the Enter key. This is necessary to let the computer know that you want to enter an instruction into a block of cells rather than in a single cell. The arguments of LinEst are the ranges of the y- and x-values respectively, then the absence (0 or false) or presence (1 or true) of an intercept, and whether you want the associated statistics (again 0 or 1, for false or true respectively). (Excel uses 0 for false, 1 for true; you can find that out by using Help => Contents and Index, by then typing LINEST, and by clicking on the LINEST worksheet function.) The selected cell block will now contain, from top to bottom, the value of the slope a1 and its standard deviation s1; why in this order will soon become clear. Had you forgotten to hold down the Ctrl and Shift keys while depositing the LinEst instruction, you would only have obtained the top answer.

[Fig. 2.3.1: The spreadsheet with some test data. The instruction in cell A4, =$B$1*B4+$D$1*N4, was copied to cells A5:A10. The normally out-of-view cells N4:N10 contain Gaussian ('normal') noise with zero mean and unit standard deviation.]

(3) The value found for a1 should be close to that in cell B1, but (because of the added noise) not quite match it. Check this by changing the value of sn in D1 to 0. But where is the value of sy? Now highlight block D4:E8, and again type =LINEST(A4:A10,B4:B10,FALSE,TRUE), or the equivalent but somewhat shorter =LINEST(A4:A10,B4:B10,0,1). Pressing Ctrl∪Shift∪Enter will yield the answer. In column E you find, from top to bottom, the intercept (here zero by definition), its standard deviation (not applicable here, since there is no intercept), the value of sy, that of the number of degrees of freedom, and the residual sum of squares; you will find the value of sy in cell E6. Unfortunately, this output is rather cryptic, since no labels are provided to tell you what is what. The other information: D6 contains the square of the correlation coefficient, D7 the value of the F-test, and D8 the regression sum of squares.

(4) We will now, for once, verify that these numbers are indeed correct. In cells G3:I3 place the labels xx, xy, and yy respectively. In cell G4 deposit the instruction =B4^2, in H4 the command =A4*B4, and in I4 =A4^2, and copy these down to G5:I10. In G12 deposit =SUM(G4:G10), and copy this to H12 and I12, so that these cells now contain the sums Σx², Σxy, and Σy² respectively. In cell G14 compute the value of a1 using (2.2.2), as =H12/G12; in H14 calculate sy as =SQRT((G12*I12-H12^2)/(6*G12)), see (2.2.3); and in I14 find s1 as =H14/SQRT(G12), see (2.2.4). To guard against the possibility of taking the square root of a negative quantity in the instruction in H14, you might instead want to use =SQRT(ABS((G12*I12-H12^2)/(6*G12))).

(5) If you have made no mistakes, the values for a1 and s1 will be the same as those generated by LinEst. Then change sn to a value larger than used earlier, and observe its effects. Your spreadsheet should now resemble that in Fig. 2.3.2.

[Fig. 2.3.2: The spreadsheet of Fig. 2.3.1 with the results of LinEst in block D4:E8.]

(6) In summary, LinEst is convenient, compact, and cryptic. It updates automatically when you change one of the y and/or x parameters, as long as they fall within the ranges specified for them.
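Step (4) of the exercise can be mirrored off-spreadsheet. This Python sketch (my illustration) builds the same column sums Σx², Σxy, and Σy², applies the closed-form expressions (2.2.2)–(2.2.4), and includes the same ABS guard against a slightly negative argument caused by round-off:

```python
import math

def fit_through_origin_sums(x, y):
    """Slope and standard deviations from column sums, as in exercise 2.3.1(4)."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)                 # like G12, the sum of x^2
    sxy = sum(xi * yi for xi, yi in zip(x, y))     # like H12, the sum of x*y
    syy = sum(yi * yi for yi in y)                 # like I12, the sum of y^2
    a1 = sxy / sxx                                 # eq. (2.2.2)
    sy = math.sqrt(abs(sxx * syy - sxy ** 2) / ((n - 1) * sxx))  # eq. (2.2.3), ABS guard
    s1 = sy / math.sqrt(sxx)                       # eq. (2.2.4)
    return a1, sy, s1

x = [1, 2, 3, 4, 5, 6, 7]
y = [2.3, 4.1, 6.2, 8.0, 10.4, 12.1, 14.2]   # made-up noisy data, roughly y = 2x
a1, sy, s1 = fit_through_origin_sums(x, y)
print(a1, sy, s1)
```

The closed-form value of sy agrees with the directly computed residual sum of squares, which is exactly the cross-check the exercise performs against LinEst's output.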
2.4 Regression

Regression is Excel's most extensive tool for least squares fitting of data. It yields a large (some might say excessive) amount of statistical information on the fit, as illustrated below, but it allows you to select what statistical information you want to display, and it can also generate useful auxiliary graphs.

Exercise 2.4.1:
(1) The Regression routine in the Data Analysis Toolpak is somewhat more user-friendly than LinEst. You get it with Tools => Data Analysis, and double-click on Regression. (In case you do not find Data Analysis under Tools, select Tools => AddIns, select both Analysis Toolpak and Analysis Toolpak-VBA, and exit with OK. If the Analysis Toolpak is not listed in the dialog box, you may have to run the Setup program to install it from the original CD or diskettes. Use the scroll bar to the right of the list to see items too far down the alphabet to be displayed.) Another dialog box appears, in which you enter (by typing or pointing) the Input Y Range and the Input X Range, here A4:A10 and B4:B10 respectively. Click on Constant is Zero (for the zero intercept), select a cell for the Output Range next to or below your data, and click OK. Note that the output block is large, and will overwrite and erase any data in its way. You will now find three sets of data: in the top set, labeled Regression Statistics, the correlation coefficient is listed as Multiple R, and sy as Standard Error; in the second block, of ANOVA, you will find the zero intercept and its non-applicable standard deviation, and, as X Variable 1, the values of a1 and s1; the third block lists the predicted y-values and the corresponding residuals.

[Fig. 2.4.1: The lavish output generated by Regression upon its application to the 7 data pairs in Fig. 2.3.1. The data at the bottom come with the two (optional) graphs discussed in the next paragraph.]

(2) Repeat the same analysis, but now also click on the square windows to the left of Line Fit Plots and Residual Plots. On pressing OK you will see two graphs, which you can move around on the screen, enlarge or reduce, or change in color or other features, all by dragging and clicking, e.g., to emphasize data rather than labels, and to distinguish more clearly between data points and fitted line. Figure 2.4.3 shows them after some adjustments.

[Fig. 2.4.2: Two plots (a Line Fit Plot, and a plot of the residuals versus X Variable 1) produced by Regression upon its application to the data in Fig. 2.3.1.]

(3) Note that the Regression routine in the Analysis Toolpak is a macro, and needs to be invoked every time the input information is changed. LinEst, on the other hand, is a function, and updates automatically whenever the input changes.

Like much of the work of Galton, the founder of the eugenics movement, the term "regression" is quite misleading, and will not be used here other than to refer to the Excel macro of that name. To quote K. A. Brownlee from his Statistical Theory and Methodology in Science and Engineering, 2nd ed., Wiley 1965, p. 409: "Galton observed that on the average the sons of tall fathers are not as tall as their fathers, and similarly the sons of short fathers are not as short as their fathers," so that the second generation tended to regress towards the mean. But if we look at the data the other way round, we find that on average the fathers of tall sons are not as tall as their sons, and the fathers of short sons are not as short as their sons. It seems implausible that both statements can be true simultaneously, and the term "regression", which implies movement back towards something, i.e., 'regression to the mean', suggests a directionality that simply isn't there; this has therefore been called the regression fallacy. Clearly, if we select the tallest individuals of one generation, and compare them with the averages of the generation before and after them, then both their fathers and their adult sons on average will be smaller, as long as the average size remains constant. We should not compare a biased (tallest) subset with an average, or be surprised that the two differ.
The input format of LSO requires that the dependent data y and independent data x be placed in two contiguous. In order to use LSO.. 2.
LS0 will provide the slope a1 in bold italics directly below the column for x, underneath it the standard deviation s1 in italics, and, if so requested, below that the standard deviation of the fit, sy.

Exercise 2.5.1:
(1) Start again with the layout of Fig. 2.3.1. Highlight the two columns of data (in this example, block A4:B10) and then call LS0.
(2) If you want to display the standard deviation of the fit, sy, answer the input box affirmatively. The results appear directly below the data; the spreadsheet should now resemble Fig. 2.5.1.

[Fig. 2.5.1: The spreadsheet after using LS0, showing the columns for y and x and, directly below the data, the coefficient, its standard deviation, and the standard deviation of the fit.]

[Fig. 2.5.2: The plots of Fig. 2.4.2, cleaned up by rescaling, deleting the background color, moving the labels inside the graph, and representing measured data by points: a line fit plot and a residuals plot, both versus x.]

Whether we use LinEst, Regression, or LS0, it is usually good practice to make a plot of the experimental data (as points) and to add to that graph the computed line, representing the assumed function by a curve. For noisy data it is also advisable to calculate and plot the corresponding residuals, because a systematic trend in the deviations may reveal a flaw in the model used. Regression will make these plots automatically when you ask for them; with LinEst and LS0 you must make them yourself, using Insert ⇒ Chart and the Chart Wizard.

2.6 Trendline

When we already have made a graph of the data to be fitted, we can use Trendline, which is simpler, but also more limited, than the above approaches. It requires that you have a graph of the data, but it cannot fit to part of a curve, nor can it provide any statistical estimates beyond the square of the correlation coefficient rxy, which in this context is mostly a non-informative feel-good parameter. Trendline is therefore a very convenient tool to determine the unknown parameters, but without useful imprecision estimates: optionally it can show the corresponding equation, though not the associated imprecisions.

Exercise 2.6.1:
(1) In this case it is inconvenient that we have y in the first column, and x in the second, because that will give you a graph of x versus y. Therefore, first exchange the positions of the x- and y-columns, then make a graph of y vs. x on the spreadsheet.
(2) Click on the data in the graph to highlight them, right-click, and select Add Trendline. (Alternatively, after highlighting the data, click on Chart ⇒ Add Trendline.) In the dialog box select Type Linear, and under Options activate both Set intercept = 0 and Display equation on chart. Then click OK. The equation for y will now appear in the graph, together with a line representing it. You can click on the line and change its color, thickness, etc.; likewise, you can move the equation around and, e.g., change its font.
(3) Trendline automatically updates as you change any or all of the data in the graph. Compare your results with Fig. 2.6.1.

[Fig. 2.6.1: Trendline automatically plots the computed curve, here y = 2.3058x, through the data in Fig. 2.3.1.]
Even though we have so far already described four least squares routines, we will encounter yet another one, Solver, which will be introduced in chapter 4.

2.7 Fitting data to a straight line

Often we deal with a function that can be expected to fit a straight line, as in y = a0 + a1x, with intercept a0 with the vertical axis, and with slope a1. In that case, instead of (2.2.1) and (2.2.2), we must use

a0 = (Σxi² Σyi − Σxi Σxiyi) / D    (2.7.1)

a1 = (N Σxiyi − Σxi Σyi) / D    (2.7.2)

for the intercept and slope respectively, where

D = N Σxi² − (Σxi)²    (2.7.3)

As before, equations (2.7.1) and (2.7.2) can be derived by minimizing the quantity ΣΔi² = Σ(yi − a0 − a1xi)² with respect to a0 and a1 respectively, where the residuals Δi = yi − a0 − a1xi are the differences between the observed quantities yi and their assumed, 'theoretical' expressions, a0 + a1xi. Likewise, instead of (2.2.3) and (2.2.4), we now should use

vyy = sy² = ΣΔi² / (N − 2) = Σ(yi − a0 − a1xi)² / (N − 2)    (2.7.4)

v00 = s0² = vyy Σxi² / D    (2.7.5)

v11 = s1² = vyy N / D    (2.7.6)

for the associated variances v (the squares of the standard deviations s) of y, a0, and a1 respectively. The doubling of the indices on the variances anticipates the introduction of covariances in section 2.9.

Because Greek symbols, subscripts, and superscripts are somewhat awkward to use in spreadsheets, we will often abbreviate the sum Σ as S, the residual Δi as R, its square Δi² as RR, and the corresponding sum ΣΔi² as SRR or (in order to fit conventional statistical notation) SSR, where the first S signifies a Sum, the second a Square.

While we could use the spreadsheet to make the above sums, the expressions are now becoming so complicated that spreadsheet help is welcome, and we have already seen that it is readily available. Therefore you can rest assured that, with Excel, you will not need to evaluate the sums in (2.7.1) through (2.7.4) anymore! You might want to try them with a data set such as shown in Fig. 2.3.1.

When the model includes an arbitrary intercept, the various routines must be told so. For the custom least squares macro we select LS1 rather than LS0, because the latter forces the fitted line through the origin. In LinEst, all we need to do is to specify the third argument as True or 1, signifying the presence of an intercept. In Regression, we do not activate Constant is Zero. And in Trendline, we do not activate Set intercept = 0.

On the other hand, as is often the case with least squares analysis, when two or more parameters are derived from a single set of measurements, the resulting parameter values are in general mutually interdependent. After addition of a constant to the expression for y, the slope and intercept will not be independent quantities. This does not affect their values, but may complicate using these values in subsequent computations. The general treatment appropriate to that more general situation will be given in section 2.9.
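Although Excel makes the sums for us, equations (2.7.1) through (2.7.6) are easily transcribed for checking purposes. Here is a minimal Python sketch (the function name ls1 merely mimics that of the macro):

```python
import math

def ls1(y, x):
    """Least squares fit of y = a0 + a1*x, a direct transcription
    of eqs. (2.7.1) through (2.7.6)."""
    n = len(x)
    sx = sum(x)
    sy_sum = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx * sx                         # eq. (2.7.3)
    a0 = (sxx * sy_sum - sx * sxy) / d            # eq. (2.7.1)
    a1 = (n * sxy - sx * sy_sum) / d              # eq. (2.7.2)
    vyy = sum((yi - a0 - a1 * xi) ** 2
              for xi, yi in zip(x, y)) / (n - 2)  # eq. (2.7.4)
    s0 = math.sqrt(vyy * sxx / d)                 # from eq. (2.7.5)
    s1 = math.sqrt(vyy * n / d)                   # from eq. (2.7.6)
    return a0, a1, s0, s1, math.sqrt(vyy)

# data that lie exactly on the line y = 1 + 2x
a0, a1, s0, s1, sy = ls1([1.0, 3.0, 5.0, 7.0], [0.0, 1.0, 2.0, 3.0])
```

Applied to data that lie exactly on y = 1 + 2x, it returns a0 = 1 and a1 = 2 with zero standard deviations.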
2.8 Simple propagation of imprecision

The parameters produced by a least squares fitting of experimental data are seldom the final answers sought. Often they need to be combined with other numbers, or otherwise manipulated, to generate the numerical end results of the experiment. This occurs, e.g., when one wants to estimate how the experimental imprecisions in the slope and intercept work their way (or 'propagate') through a calculation to affect the experimental uncertainty of any subsequently derived results. We will therefore briefly explore the propagation of experimental uncertainty, before returning to least squares data fitting per se. We will first consider the relatively 'simple' case in which the input data to such a calculation are mutually independent; specifically, we consider a function F computed from one or more mutually independent experimental parameters xj that have associated imprecision estimates (standard deviations, confidence limits, etc.), and ask what will be the resulting imprecision in the final result.
Excel does not provide a convenient tool to calculate the imprecision in y given the known imprecisions in x (or in several xj's), but the macro Propagation in the MacroBundle does. In a few special cases (addition & subtraction, multiplication & division, exponentiation & log taking) one can formulate simple rules, but in general it is easier to use the general formula

vFF = sF² = Σj (∂F/∂xj)² sj² = Σj (∂F/∂xj)² vjj    (2.8.1)

for j = 1, 2, ..., from which the sought imprecision f follows as its square root. For example, we might compute the volume V of a cylinder from its measured diameter d and height h as V = πd²h/4, and then ask how the imprecision in V is obtained from the experimental imprecisions in the measurements of d and h. Below we will illustrate the application of (2.8.1).

Exercise 2.8.1:
(1) Say that we want to compute the value of F = ln(4X + 3/Y) + YZ², and the associated imprecision f, given the imprecisions x, y, and z in X, Y, and Z respectively. We can use partial differentiation to find ∂F/∂X = 4/(4X + 3/Y), ∂F/∂Y = −(3/Y²)/(4X + 3/Y) + Z², and ∂F/∂Z = 2YZ, so that f² = {4x/(4X + 3/Y)}² + {y[Z² − 3/(Y²(4X + 3/Y))]}² + {2zYZ}².
(2) When we use the spreadsheet, we will usually have numerical values for X, Y, and Z, together with the values of their imprecisions x, y, and z, in the form of their standard deviations. In cells A1, A2, and A3 enter the labels X=, Y=, and Z=, and place numerical values for them in cells B1 through B3. In cells C1 through C3 deposit the labels x=, y=, and z=, and in D1 through D3 their values. In cells A5 and A6 place the labels F= and f= respectively. In cell B5 deposit the instruction =LN(4*B1+3/B2)+B2*(B3^2), and in cell B6 compute f using the formula given above, i.e., =SQRT((4*D1/(4*B1+3/B2))^2+(D2*(B3^2-3/((B2^2)*(4*B1+3/B2))))^2+(2*D3*B2*B3)^2). The spreadsheet might now look like Fig. 2.8.1, except that you will of course have different values for the parameters X, Y, and Z, and therefore also different results for F and f. Note that we left cell C5 free, because that is where the macro will deposit its result.
(3) Call Propagation, and answer the input boxes: enter the location of the input parameters as B1:B3 (either by typing, or by pointing to that block with the mouse, followed by OK or Enter), that of the standard deviations as D1:D3, and that of the function as B5. If C5 is in use and cannot be overwritten, the answer will come as a message box, and you will need to write it down.
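The content of eq. (2.8.1), and the result of exercise 2.8.1, can also be checked outside the spreadsheet. In the Python sketch below (the function names are ours, and the standard deviations chosen are merely illustrative) the partial derivatives are estimated by central differences, much as such a macro might do internally, and the result is compared with the analytical expression for f:

```python
import math

def f_value(x_, y_, z_):
    """The function of exercise 2.8.1: F = ln(4X + 3/Y) + Y*Z^2."""
    return math.log(4 * x_ + 3 / y_) + y_ * z_ * z_

def propagate(func, params, sigmas, h=1e-6):
    """Imprecision of func for mutually independent inputs,
    eq. (2.8.1), with partial derivatives by central differences."""
    var = 0.0
    for i, s in enumerate(sigmas):
        up = list(params); up[i] += h
        dn = list(params); dn[i] -= h
        deriv = (func(*up) - func(*dn)) / (2 * h)
        var += (deriv * s) ** 2
    return math.sqrt(var)

# X = 4, Y = 2, Z = 10 as in Fig. 2.8.1; the imprecisions
# x = 0.1, y = 0.2, z = 0.5 are illustrative
X, Y, Z = 4.0, 2.0, 10.0
x, y, z = 0.1, 0.2, 0.5
f_numeric = propagate(f_value, [X, Y, Z], [x, y, z])

# the same result from the analytical partial derivatives
q = 4 * X + 3 / Y
f_analytic = math.sqrt((4 * x / q) ** 2
                       + (y * (Z * Z - 3 / (Y * Y * q))) ** 2
                       + (2 * z * Y * Z) ** 2)
```

The two results agree to within the small truncation error of the numerical differentiation.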
(4) Verify that you get the same result in C5 as that computed in B6, but without having to derive the expression for f, and without having to use that expression to find its numerical value. Exactly how Propagation achieves its magic will be explained in chapter 8; beyond that, it does it all for you, automatically: busywork you can do without. Try other input parameters, and other formulas. Because the input data are organized column-wise, the answer from Propagation will automatically appear to the right of F, in cell C5, if empty. Note that, like most macros, Propagation does not update automatically.

[Fig. 2.8.1: The spreadsheet as it looks just before calling Propagation, with X = 4, Y = 2, and Z = 10, and with the imprecisions (among them z = 0.5) in D1:D3. The instructions shown below the screenshot merely serve as reminders of the formulas involved in cells B5 and B6.]

2.9 Interdependent parameters

Equation (2.8.1) applies only when the various parameters xj are mutually independent, but that is often not the case. Say that we linearly extrapolate data, using the slope a1 and intercept a0 of a straight line determined by a least squares fit. Since a0 and a1 are obtained from the same data set in a single least squares minimization, a deviation in a1 may be partially compensated by a corresponding change in a0, in which case a0 and a1 will be mutually dependent. When F is a function of two mutually independent parameters ai and aj, (2.8.1) reads

vFF = (∂F/∂ai)² vii + (∂F/∂aj)² vjj    (2.9.1)

but when ai and aj are correlated parameters we must replace (2.9.1) by

vFF = (∂F/∂ai)² vii + 2 (∂F/∂ai)(∂F/∂aj) vij + (∂F/∂aj)² vjj    (2.9.2)

where the covariance vij between the parameters ai and aj is defined as the average of the product of the deviations Δai and Δaj,

vij = <Δai Δaj>    (2.9.3)

The covariance has the dimension of a variance, but can be either positive or negative; we here treat the variance vii as a covariance vij with i = j. The absolute value of the covariance is limited by

vij² ≤ vii vjj    (2.9.4)

so that vFF in (2.9.2) can have values between {(∂F/∂ai) si − (∂F/∂aj) sj}² and {(∂F/∂ai) si + (∂F/∂aj) sj}². When vij is zero, the two parameters ai and aj are not correlated, and the middle term on the right-hand side of (2.9.2) vanishes.

For more than two input parameters the general relation for the variance vFF of the function F(x1, x2, ..., xN) is

vFF = Σi (∂F/∂xi)² vii + 2 Σi Σj>i (∂F/∂xi)(∂F/∂xj) vij = Σi Σj (∂F/∂xi)(∂F/∂xj) vij    (2.9.5)

The variances and covariances are most conveniently arranged in a covariance matrix (also known by the unnecessarily long term variance-covariance matrix).
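The double sum of eq. (2.9.5) is equally easy to evaluate numerically. The Python sketch below (the function propagate_cov and the numerical values are ours, for illustration only) propagates a covariance matrix through a linear extrapolation a0 + a1x at x = 10, once with and once without the covariance term:

```python
import math

def propagate_cov(func, params, cov, h=1e-6):
    """Standard deviation of func(*params) from the full covariance
    matrix of the parameters, following eq. (2.9.5); the partial
    derivatives are estimated by central differences."""
    n = len(params)
    grad = []
    for i in range(n):
        up = list(params); up[i] += h
        dn = list(params); dn[i] -= h
        grad.append((func(*up) - func(*dn)) / (2 * h))
    var = sum(grad[i] * grad[j] * cov[i][j]
              for i in range(n) for j in range(n))
    return math.sqrt(var)

# linear extrapolation F = a0 + a1*x at x = 10, with made-up values
# a0 = 5.0 and a1 = 0.5, strongly (negatively) correlated
extrapolate = lambda a0, a1: a0 + a1 * 10.0
cov = [[0.04, -0.006],
       [-0.006, 0.001]]
s_with = propagate_cov(extrapolate, [5.0, 0.5], cov)
s_without = propagate_cov(extrapolate, [5.0, 0.5],
                          [[0.04, 0.0], [0.0, 0.001]])
```

With these made-up numbers the negative covariance reduces the computed standard deviation from about 0.37 to about 0.14; neglecting covariances can therefore be far off in either direction.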
For a function with P adjustable parameters, such a matrix is a square array of P × P terms that contains all variances and covariances between these P parameters. For a straight line, with only two parameters, intercept a0 and slope a1, the covariance matrix is a 2 × 2 square containing four data: v00, v01, v10, and v11, with v10 = v01.

Often the covariances between the various parameters will not be known, because many standard software routines (including Excel's LinEst, Regression, and Trendline) do not provide them. This was one of the major reasons to create a custom least squares macro: the macro LS does provide the covariance matrix, and Propagation can use that information to compute the correct precision estimate for any derived function F.

How much difference will it make to neglect the covariances? As we will see shortly, it all depends on the computation used. To illustrate this we will use a century-old test data set from K. Pearson, Phil. Mag. 2 (1901) 559, which was used by Pearson for a quite different purpose. In exercise 2.9.1 we will analyze this data set, shown in Fig. 2.9.1, in two different ways: first we will assume that y is the dependent variable, and x the independent one, and write y = a0 + a1x; then we will invert these roles, and instead write x = b0 + b1y.
Exercise 2.9.1:
(1) Enter the data shown in the first two columns of Fig. 2.9.1 in a spreadsheet, highlight them, and call LS1. The covariance matrix will show in color.
(2) Copy the columns for y and x to a different location, and change their order, so that the column for x is now to the left of that for y. Again highlight the data block, and again call LS1.
(3) Go to the results obtained under (2), and below them calculate the values of b0/b1 and 1/b1 respectively.
(4) Now call Propagation, and in response to its queries highlight the values for b0 and b1, then the covariance matrix, and finally the cell in which you have just computed b0/b1. Propagation will calculate the standard deviation of b0/b1, and place this immediately below its value. Do the same for 1/b1.
(5) Move the results just obtained one row lower (so as not to overwrite them). Repeat the procedure outlined in (4), except that, in response to the second input box, you now enter the standard deviations rather than the covariance matrix.

How do these results compare? In the first case we obtain y = (5.76 ± 0.19) + (0.540 ± 0.042) x; in the other we find x as a function of y, with a slope b1 of about 1.77. The small differences in the coefficients are due to the different assumptions made about the sources of the experimental uncertainties in the two analyses. We rewrite the second set of results, x = b0 + b1y, as y = b0/b1 + x/b1. Combining within parentheses the coefficients with their imprecisions, computed here with the macro Propagation using the covariance matrix, we can then compare a0 (5.76 ± 0.19) with b0/b1 (5.86 ± 0.20), and a1 (0.540 ± 0.042) with 1/b1 (0.566 ± 0.044). The agreement between the two slopes and intercepts is close, though not perfect; the same applies to the corresponding standard deviations. However, if we use the standard deviations of b0 and b1 rather than the covariance matrix, we find a quite different standard deviation (±0.50 instead of ±0.20) for b0/b1, but the same answer for 1/b1. Why is this so? Because the calculation of the uncertainty in 1/b1 requires only one imprecision estimate, whereas that of b0/b1 involves the imprecisions in both b0 and b1. Since b0 and b1 are highly correlated quantities, their interdependence must be taken into account: the actual imprecision in b0/b1 in this case is 2½ times smaller than one would compute by neglecting their mutual dependence. You have the data, and you can therefore verify this for yourself. Much more dramatic examples of such effects will be encountered in sections 2.15 and 2.16.
[Fig. 2.9.1: A data set from K. Pearson, Phil. Mag. 2 (1901) 559, analyzed either as y vs. x or as x vs. y. Cell instructions: C16 = 5.7612 + 0.5396*B16 and D16 = 5.8617 + 0.5659*B16, copied to C17:C25 and D17:D25 respectively.]

2.10 Centering

When the covariances are not known, one often treats the input parameters as mutually independent ones, even though the resulting imprecision estimate will then be uncertain, and may sometimes be quite far off. There are, of course, other ways in Excel to deal with the propagation of imprecisions in derived results, though none of them is as convenient as the combined use of LS and Propagation, which carries the covariances along and treats the parameters in general as mutually dependent. Consequently, there is no need to check first whether the data are or are not mutually dependent: if they are not, the covariance matrix will contain zeros in the appropriate places, and when the mutual dependence is minor, it usually is of little or no consequence.
For the straight line y = a0 + a1x, the covariance v01 between the intercept and slope can be calculated from the variances v00 and v11 as

v01 = v10 = −√(v00 v11) Σx / √(N Σx²)    (2.10.1)

Consequently we can make that covariance vanish by making Σx equal to zero. This can always be done by redefining the zero of the x-scale, i.e., by centering the data set around its average x-value, xav = (1/N) Σx. The more eccentric the x-values, the larger the resulting covariance. For a linear relationship we can therefore avoid covariances by proper design of the experiment, i.e., by selecting the x-values at which data will be taken, before any experimental data are collected, in such a way that xav will be zero.

*****

A short digression may be useful here, because the term "(linear) correlation coefficient" can mean different things in different contexts, depending on which quantities are being correlated. Sometimes we are not so much interested in the absolute values of the covariances vij as in the corresponding linear correlation coefficients rij, which are defined as

rij = vij / √(vii vjj) = vij / (si sj)    (2.10.2)

These show us immediately the relative strength of the linear correlation between the parameters ai and aj: rij = 1 signifies complete linear correlation between ai and aj, whereas rij = 0 indicates the absence of any linear correlation; the correlation can also be negative, as when rij < 0. From the linear correlation coefficients rij and the standard deviations si and sj we can readily reconstruct the covariances as vij = rij si sj. However, use of the Propagation macro requires the covariance matrix rather than the linear correlation matrix plus the standard deviations.

In many least squares calculator and computer programs (including LinEst and Regression) a correlation coefficient r or R (or its square) is displayed for the correlation between x and y, answering the question whether there is a linear correlation between these two input parameters. That may be an issue when one needs to decide, e.g., whether using a cordless phone increases the likelihood of brain cancer (it doesn't), or whether unprotected sunbathing increases the chances for skin cancer (unfortunately it does), but it is usually irrelevant when, as in the examples used in this book, we apply least squares analysis to problems with well-established causality. Incidentally, when the relation between x and y is strictly causal but nonlinear, as with an exponential or power law, rxy will not be unity; this is also why rxy should not be used as a measure of goodness-of-fit, which it isn't. The linear correlation coefficient rxy in LinEst or Regression is described by a formula just like (2.10.2), but its meaning is entirely different from that of a correlation coefficient such as rab between two fitted coefficients a and b, which provides information useful for subsequent propagation of imprecision when both a and b are involved, i.e., when mutually dependent data may be involved. The moral of this digression is, therefore: when you see or hear the term "correlation coefficient", ask yourself the question: correlation between what and what?

*****

When the result of a least squares fit to a straight line is plotted, one may want to indicate in the graph not only the original points and the best-fitting line through them, but also some imprecision estimate of that line. This is most readily done by drawing imprecision contours at, e.g., plus or minus one standard deviation; such contours will enclose an imprecision band. Their construction is most readily appreciated when based on mutually independent parameters, and therefore on centered data.

The procedure is as follows. First we find the average xav, calculate a column of values x − xav, and fit the data to the line y = a0 + a1(x − xav). This will yield the mutually independent coefficients a0 and a1 plus the standard deviations s0, s1, and sy. We use the coefficients a0 and a1 to plot the line y = a0 + a1(x − xav) together with the experimental data points. The vertical spacing between the line y = a0 + a1(x − xav) and the imprecision contours will then be given by

s = √[sy² + s0² + s1² (x − xav)²]    (2.10.3)

where s denotes the estimated standard deviation for the individual observations. This result is fully equivalent to the corresponding expression in N. R. Draper & H. Smith, Applied Regression Analysis, 2nd ed., Wiley 1981. Figure 2.10.1 illustrates this for data taken from table 1.1 on p. 9 of that same book.
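That centering indeed makes the covariance vanish can be verified numerically. In the Python sketch below (the helper fit_with_cov is ours; it uses eqs. (2.7.1) through (2.7.6) together with the standard result v01 = −vyy Σx/D) the same small data set is fitted before and after centering:

```python
def fit_with_cov(y, x):
    """Straight-line least squares fit y = a0 + a1*x, returning the
    coefficients and their covariance matrix: eqs. (2.7.5), (2.7.6)
    on the diagonal, and v01 = -vyy * Sx / D off the diagonal."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    d = n * sxx - sx * sx
    a0 = (sxx * sy - sx * sxy) / d
    a1 = (n * sxy - sx * sy) / d
    vyy = sum((yi - a0 - a1 * xi) ** 2
              for xi, yi in zip(x, y)) / (n - 2)
    cov = [[vyy * sxx / d, -vyy * sx / d],
           [-vyy * sx / d, vyy * n / d]]
    return a0, a1, cov

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.1]
xav = sum(x) / len(x)

a0_raw, a1_raw, cov_raw = fit_with_cov(y, x)
a0_ctr, a1_ctr, cov_ctr = fit_with_cov(y, [xi - xav for xi in x])
# before centering v01 is negative; after centering it vanishes,
# while the slope is unaffected
```

The slope is unchanged by the shift of the x-scale; only the intercept and the covariance are affected.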
[Fig. 2.10.1: The data from table 1.1 in Draper & Smith, analyzed as described in exercise 2.10.1: columns for x, y, x − xav, ycalc, s, ycalc − s, and ycalc + s, with the fitted coefficients, their standard deviations, the standard deviation of the fit, and the (essentially zero) covariance.]

Exercise 2.10.1:
(1) Enter the data from table 1.1 of Draper & Smith, or take them from the MacroSamples file. The data are for y (pounds of steam used per month) as a function of x (temperature in °F) but, for our purposes, they will be merely y vs. x. If you enter them from the book, they will not be ordered for increasing or decreasing values of x, which is inconvenient for making good graphs. Therefore first reorganize the data: move the column with x-values to the left of the column for y, highlight the data block of both columns, and click on the sort ascending icon on the standard toolbar. Your data should now look like those in columns A and B of Fig. 2.10.1.
(2) Somewhere on the sheet calculate the average value of x (with the function =AVERAGE), and use this to calculate in column C the corresponding values of x − xav.
(3) Highlight the data in columns B and C, and call LS1. The covariance matrix should show (essentially) zero covariances.
(4) In column D calculate ycalc = a0 + a1 (x − xav), based on the just-computed values of a0 and a1.
(5) Plot the experimental data points and the computed line ycalc versus x (in column A).
(6) In column E calculate s using (2.10.3), and in columns F and G compute ycalc − s and ycalc + s respectively. Then highlight the data in column F, copy them, and paste them into the figure. Do the same with the data in column G. Your plot should now resemble Fig. 2.10.2.
(7) Alternatively you could use the data in column E to plot error bars of length s on the line depicting ycalc. This has the advantages that you need not order the data as was done here under point (1), and need not calculate the data in columns F and G either, and the disadvantage that it doesn't look as good, at least to yours truly. But since tastes differ: try it, and judge for yourself.

[Fig. 2.10.2: The data from table 1.1 in Draper & Smith, with the fitted line and the imprecision contours calculated as described in exercise 2.10.1.]

Because of the term in (x − xav)² in the expression (2.10.3) for s, these contours are slightly curved, with a minimal vertical distance from the fitted line at x = xav. Note that roughly 1 in 3 data points in Fig. 2.10.2 lie outside the contour lines, as one would expect for a single Gaussian distribution.
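The half-width of the imprecision band of eq. (2.10.3) is a one-line computation; in the Python sketch below the numerical values of sy, s0, and s1 are merely illustrative, of the same order as those in Fig. 2.10.1:

```python
import math

def contour_halfwidth(x, xav, sy, s0, s1):
    """Vertical half-width s of the imprecision band, eq. (2.10.3),
    for a line fitted to centered data as y = a0 + a1*(x - xav)."""
    return math.sqrt(sy ** 2 + s0 ** 2 + s1 ** 2 * (x - xav) ** 2)

# illustrative parameter values
sy, s0, s1, xav = 0.89, 0.13, 0.011, 50.0
s_center = contour_halfwidth(xav, xav, sy, s0, s1)   # narrowest point
s_edge = contour_halfwidth(25.0, xav, sy, s0, s1)    # away from xav
```

The band is narrowest at x = xav and widens on either side, which is why the contours curve apart.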
We can also draw these imprecision contours without centering, in which case the covariance v01 is needed: for the straight line y = a0 + a1x, instead of (2.10.3) we should then use

s = √[sy² + s0² + 2 v01 x + s1² x²]    (2.10.4)

The imprecision contours defined here pertain to the probability that a single, individual observation will fall within a given band around the least squares line. They therefore differ from those proposed by Working & Hotelling (J. Am. Statist. Assoc. 24 (1929) Suppl., p. 73), which pertain to the mean value, do not include the term sy², and are therefore much more strongly curved. If you prefer confidence contours and confidence bands instead, multiply s as defined in, e.g., (2.10.3) and (2.10.4) by √F(α, 1, N−P), where 1−α is the confidence level, N the number of data points, and P the number of model parameters. The distribution function F(α, 1, N−P) for, e.g., α = 0.05 and N−P = 12 is obtained in Excel with =FINV(0.05,1,12).

2.11 Extrapolating the ideal gas law

In this and the next four sections we will consider several rather common applications of fitting experimental data to a straight line, including extrapolation, calibration, standard addition, and finding the intersection of two straight lines. Kim et al. (J. Chem. Educ. 78 (2001) 238) recently described an elegantly simple high school experiment to determine the absolute zero of the centigrade temperature scale, also a standard undergraduate physicochemical lab experiment. Since it involves various elements of linear least squares fitting, we will analyze those data here. Kim et al. measured the volume V of air trapped in an inverted graduated cylinder immersed in water, as a function of temperature t, ranging between 0 and 75 °C. The specific set of measurements we will consider is shown in table 2.11.

Its analysis consists of two parts. First the measured air volumes V must be corrected for the volume occupied by water vapor. The corrected, dry air volumes Vd are found as Vd = V (Pb − Pw)/Pb, where Pw is the vapor pressure of water at temperature t, and Pb the barometric pressure. The resulting volumes Vd of dry air are a linear function of temperature t, and can then be extrapolated to Vd = 0 to yield the absolute zero of the temperature scale, as expected for an ideal gas.

[Table 2.11: The measured volumes V, in mL, as a function of temperature t, in °C, for temperatures ranging between 0 and about 75 °C.]
First we must determine what type of least squares analysis to use, i.e., whether either V or t can be considered to be the dominant source of the experimental imprecisions. Analysis of the experimental data shows that the volume and temperature measurements have absolute imprecisions of about ±0.05 mL and ±0.1 °C respectively, i.e., of about ±1% in volume and ±0.05% in absolute temperature T. Note that the centigrade temperature scale t contains an arbitrary constant (t = 0 °C for T = 273.15 K) that might be misleading in determining relative imprecisions; since the data involve extrapolation to zero absolute temperature, the relative imprecision on the Kelvin scale is the one that counts here. Of course, since V and t have different dimensions, even after conversion into relative imprecisions we can compare these numbers no better than we can compare apples and pears. Even so, it is clear that V is by far the more error-prone, and should be taken as the dependent variable. The remaining question is then one of linear extrapolation.

The vapor pressure Pw of water as a function of temperature t is a well-tabulated quantity that is readily parameterized (see section 3.20). These data are so much more precise than the reported volumes V that the correction does not add significantly to the experimental imprecision in Vd, even though, at 72 °C, the correction amounts to almost 35% of the measured gas volume! Figure 2.11.1 shows the raw and corrected volumes, V and Vd respectively, as a function of temperature t.

Exercise 2.11.1:
(1) Copy the data from Kim's paper or from Fig. 2.11.1. Copy the values of Vd as listed, or compute them from the values of V minus the volume occupied by water vapor, as calculated in section 3.20. Make sure that the column with Vd is directly to the left of that with t.
(2) Call LS1. In making Fig. 2.11.1 we have used the option to display the linear correlation coefficients, and have moved them and the covariance matrix to save some space.
(3) Calculate t0 = −a0/a1; in Fig. 2.11.1 this was done in cell E11.
(4) Call Propagation to find the associated standard deviation. Using the covariance matrix in Propagation produces the (correct) standard deviation of 9.1 °C, so that the experiment yields t0 = (−278.1 ± 9.1) °C. This can be compared with the accepted value of −273.15 °C.
(5) Using the standard deviations of a0 and a1 instead would have given an incorrect value of about 8 °C for the standard deviation: incorrect, because the linear correlation coefficients show that a0 and a1 are strongly correlated.
(6) Use column E to compute Vmodel = a0 + a1t, e.g., in cell E17 use =$C$29+$D$29*D17. By adding the values for t = t0 you can show the extrapolated point and curve.
(7) Verify that you get the same result by centering the temperatures (see section 2.10), in which case there is no need to consider the covariance. The results will be similar, though not identical, because the computation uses additional digits not shown in Fig. 2.11.1.
(8) Using the opposite assumption, namely that the temperature is the more error-prone measurement and should therefore be used as the dependent variable, yields a similar value for t0, with 9.7 °C for the standard deviation. From the point of view of the exercise the difference is rather immaterial: the choice actually has only a minor effect on the resulting precision. That choice would also have been rather impractical, because in that case the correction for the water vapor pressure would have been much more complicated.
(9) Finally, in order to illustrate generating imprecision contours, add three columns to the spreadsheet: one in which you compute s according to (2.10.4), one in which you calculate Vmodel − s, and a third in which you do the same for Vmodel + s. Plot these contours next to the line through the points, both in the temperature range of the observations and near absolute zero temperature.

As is clear from Fig. 2.11.2, in the temperature range of the observations the curvature of these imprecision contours is barely noticeable, but away from x = xav the contours gradually move apart, thereby providing a good indication of the problems inherent in a long extrapolation. The main reasons for the large imprecision in the computed value of t0 are the reading errors in using graduated cylinders for volume measurements, and the long extrapolation itself.

[Fig. 2.11.1: Spreadsheet for the data of exercise 2.11.1, after use of the LS1 and Propagation macros, at a barometric pressure Pb of about 98 kPa. Open circles: V; solid circles: Vd. The covariance matrix (CM) has been moved, and the linear correlation coefficient (LCC) matrix labeled as such. The sought parameter, t0, was computed in cell E11.]

[Fig. 2.11.2: The fitted line and its imprecision contours in the temperature range of the experimental data (top), and the same near the absolute zero of the temperature scale (bottom). Both panels are drawn to the same scale. In the bottom panel, the closed circle shows the extrapolated point, and the open circle the true value of the absolute zero temperature.]
2.12 Calibration curves

A standard measurement procedure is to make a calibration curve of parameter y versus x, using a set of well-characterized, known samples. Typically, first determine the dependent variable y for a number of values of the independent variable x under well-controlled circumstances, and determine the best proportionality y = a1x or straight line y = a0 + a1x through those points. Then make a measurement of the yu-value of an unknown, and use the calibration curve to find the best estimate for the x-value xu corresponding to the observed response yu. This is therefore a problem of inverse interpolation: given a set of measurements y(x), find the most likely xu for a given yu. We will here consider the simplest calibration curves, namely those that consist of proportionalities or straight lines. As our example we will use a synthetic data set from table 5.1 in D. Harvey, Modern Analytical Chemistry, McGraw-Hill 2000. As our calibration curve we will first use a proportionality, then a straight line.

Exercise 2.12.1:
(1) In a new spreadsheet enter the data shown in cells A3:B8 of Fig. 2.12.1, and plot these data.
(2) Use LS0 (or either LinEst or Regression) to find the slope a1 and its standard deviation, s1.
(3) Assume that we measure the response of an unknown three times, with the results shown in cells C3:C5, and in C9 and C10 compute the corresponding average and standard deviation respectively.
(4) In cell C12 compute the value of xu = yu / a1.
(5) Finally, use Propagation to calculate the precision of this result, based on the input parameters (in Fig. 2.12.1 these are located in B9:C9), their standard deviations (in B10:C10), and the function (in C12); we find xu = 0.2418 ± 0.0010.

Exercise 2.12.2:
(1) Use another part of the same spreadsheet, or open a new one, and copy the same data into it.
(2) With LS1 find the intercept a0 and slope a1, their standard deviations s0 and s1, and the corresponding covariance matrix.
(3) Again assume that we measure the response of an unknown three times, so that we can compute the average and standard deviation of yu in cells D9 and D10 respectively.
(4) Compute the value of xu = (yu − a0) / a1, as done in Fig. 2.12.2, as = (D9−B9)/C9.
(5) Using the Propagation macro is now somewhat more complicated, because the precision information is now a mixed bag: for a0 and a1 we clearly need to use the covariance matrix (since they are correlated), whereas for yu we only have the standard deviation. Propagation can handle either format, but not its mixture.
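For readers who want a quick cross-check of the proportionality route of exercise 2.12.1 outside Excel, here is an illustrative Python sketch (not the LS0 or Propagation macros; the triplicate unknown readings are taken here as 29.32, 29.16, and 29.51). Since a1 and yu are mutually independent, su follows from adding the relative variances:

```python
import numpy as np

# Calibration standards (Harvey's synthetic data set)
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
y = np.array([0.0, 12.36, 24.83, 35.91, 48.79, 60.42])
yu = np.array([29.32, 29.16, 29.51])   # triplicate response of the unknown

# Proportionality y = a1*x, forced through the origin
a1 = np.sum(x * y) / np.sum(x * x)
sy = np.sqrt(np.sum((y - a1 * x) ** 2) / (len(x) - 1))  # N-1 dof: one parameter
s1 = sy / np.sqrt(np.sum(x * x))                        # std. dev. of the slope

yu_mean = yu.mean()
syu = yu.std(ddof=1)
xu = yu_mean / a1
# Propagation for xu = yu/a1 with independent yu and a1:
su = xu * np.sqrt((syu / yu_mean) ** 2 + (s1 / a1) ** 2)
print(xu, su)
```

This reproduces the slope near 121.3 and xu ≈ 0.2418 quoted in the text.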
(6) As shown in Fig. 2.12.2, without loss of information we cannot reduce the imprecision in a0 and a1 to just two standard deviations, but we can add the variance of yu to the covariance matrix. Since the imprecision in yu is clearly unrelated to that in a0 and a1, we simply add the corresponding variance in yu to the main diagonal (from top left to bottom right) of the covariance matrix. So here is the procedure.
(7) In cell D15 deposit the square of the standard deviation su of yu, i.e., the square of the value stored in D10. The covariance terms in cells B15:C15 and D13:D14 are zero, and need not be entered: leave cells B15, C15, D13, and D14 blank, or fill them with zeroes.
(8) Call Propagation and, in reply to its queries, enter the locations of the input parameters (B9:D9), of the covariance matrix (B13:D15), and of the function (in C16). This will yield xu = 0.2413 ± 0.0014, not significantly different from the value xu = 0.2418 ± 0.0010 obtained earlier.
(9) Verify that, by ignoring the correlation between a0 and a1, and therefore entering in Propagation the standard deviations in B10:D10 instead of the covariance matrix in B13:D15, you would have found the significantly larger (but incorrect) value of 0.003 for the standard deviation in xu.

Fig. 2.12.1: A spreadsheet for reading data from a calibration line through the origin, using the LS0 and Propagation macros.

Incidentally, the above result shows that there is no good reason to use a straight line rather than a proportionality in this case, because the absolute value of the intercept, |a0| = 0.209, is smaller than its standard deviation s0. The result obtained in exercise 2.12.1 is therefore preferable.

Alternatively, if we insist on using a straight-line calibration curve, we can center the calibration data, i.e., fit y = a0' + a1 (x − xav) with a0' = a0 + a1 xav, and then compute any unknown xu from xu = (yu − a0) / a1 = (yu − a0' + a1 xav) / a1, where a0' and a1 are now mutually independent. This is illustrated in exercise 2.12.3 and Fig. 2.12.3, using LS1 and Propagation, and of course yields the same result as that obtained in exercise 2.12.2. Or we can use nonlinear least squares to accomplish the same, as described in chapter 4. Again, there are several ways of doing it right, although, as usual, by applying standard formulas thoughtlessly, there are still more ways of getting it wrong.

Fig. 2.12.2: A spreadsheet for reading data from a straight-line calibration curve. The box around B13:D15 shows the enlarged covariance matrix. Cell instructions: D9 = AVERAGE(D3:D5); D10 = STDEV(D3:D5); D15 = D10^2; C16 = (D9−B9)/C9.
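The claim that centering renders a0' and a1 uncorrelated is easy to check numerically. This Python sketch (illustrative, not the book's macros) fits the same calibration data both ways and compares the off-diagonal covariance element and the resulting xu:

```python
import numpy as np

x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])
y = np.array([0.0, 12.36, 24.83, 35.91, 48.79, 60.42])
yu_mean = 29.33                      # average response of the unknown

def straight_line_fit(xdata, ydata):
    """Return the coefficients (a0, a1) and their covariance matrix."""
    X = np.column_stack([np.ones_like(xdata), xdata])
    coef, *_ = np.linalg.lstsq(X, ydata, rcond=None)
    resid = ydata - X @ coef
    s2 = np.sum(resid ** 2) / (len(xdata) - 2)
    return coef, s2 * np.linalg.inv(X.T @ X)

(a0, a1), cov = straight_line_fit(x, y)
xu_plain = (yu_mean - a0) / a1

xav = x.mean()
(a0c, a1c), covc = straight_line_fit(x - xav, y)   # centered fit
xu_centered = (yu_mean - a0c + a1c * xav) / a1c

print(cov[0, 1], covc[0, 1], xu_plain, xu_centered)
```

The uncentered fit has a distinctly negative covariance between a0 and a1, the centered fit has essentially none, and both routes return the same xu.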
Exercise 2.12.3:
(1) Fig. 2.12.3 illustrates a possible layout of the spreadsheet for a centered calibration line. In column C we plot x − xav, where xav = Σx / N = 0.25.
(2) Compute the average value of yu, then copy its value to cell D9. And in cell D12 calculate xu as xu = (yu − a0' + a1 xav) / a1, with =(D9−B9+C9*0.25)/C9.
(3) The standard deviation in cell D13 is now obtained with Propagation, using the standard deviations in B10:D10, which are mutually independent, as they should be anyway. With only a single measurement of yu we do not have a value for its standard deviation or variance. In that case the best we can do is to use sy, the standard deviation in the fit of the standard curve, since this quantity approximates the standard deviation of a single y-measurement.

Fig. 2.12.3: A spreadsheet for reading data from a centered straight-line calibration curve. Cell instructions: D9 = AVERAGE(D3:D5); D10 = STDEV(D3:D5); D12 = (D9−B9+C9*0.25)/C9.

2.13 Standard addition

The standard addition method is sometimes used in, e.g., chemical analysis. This method presumes a proportionality between the concentration of the sample and the resulting, measured response, but does not presume a priori knowledge of that proportionality constant k, as long as the methods used in making the standard curve and in determining the unknown are the same.

In its simplest form, a sample of known volume Vu and unknown concentration Cu yields a measured signal yu = kCu. One adds a known volume Va of known concentration Ca to that sample, and measures the corresponding signal of that mixture, which should now be ya = k (CaVa + CuVu) / (Va + Vu). Eliminating k then yields an explicit expression for the unknown concentration, Cu = CaVa yu / [(Va + Vu) ya − Vu yu], in terms of measured quantities.

In a more sophisticated form, one prepares a series of solutions of constant total volume Vt by using a fixed, known volume Vu of a sample solution of unknown concentration Cu, adding to it known volumes Vi of a standard solution of known concentration Ca, and adding solvent (e.g., water) to give all solutions to be measured the same total volume Vt. One then measures the responses of these solutions which, for an added volume Vi, will be yi = k (CuVu + CaVi) / Vt. Least squares fitting of yi vs. Vi then should yield a straight line of intercept a0 = kCuVu/Vt and slope a1 = kCa/Vt, from which we obtain Cu = a0Ca / (a1Vu). This calculation involves the ratio a0 / a1, which are both derived from the same set of measurements and are therefore, in general, mutually dependent. By centering the independent variable, we can make a0 and a1 mutually independent parameters, and thus avoid having to use the covariance.

As our experimental data we will use those reported by G. R. Bruce & P. S. Gill, J. Chem. Educ. 76 (1999) 805, reproduced in table 2.13. These data are for measurements on an aqueous sample containing an unknown lead concentration. To a 25.0 mL sample was added 25.0 mL of an electrolyte solution (in order to give the sample sufficient conductivity for the electrochemical experiment) and 1.0 mL of 10.0 mg/L cadmium as an internal standard. The standard additive contained 10.0 mg/L lead, and presumably the same electrolyte as the electrolyte solution, and apparently was added instead of an equal volume of electrolyte solution in order to keep the final volumes of the solutions to be measured at 51 mL. The ratio R of the peak currents due to lead and cadmium was then measured by stripping voltammetry.

Vi / mL:  0     0.5   1.0   1.5   2.0   2.5
R:        0.86  1.11  1.44  1.74  2.04  2.33
Table 2.13: The data of Bruce & Gill, J. Chem. Educ. 76 (1999) 805.

Exercise 2.13.1:
(1) Enter the data from table 2.13 in a spreadsheet, then calculate the average value of Vi, and make a column for the centered volumes Vc = Vi − Vav.
(2) Treating R as the dependent variable, and Vc as the independent one, use LS1 to compute the slope and intercept of a straight line through these points.
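The fit itself can be cross-checked numerically with a short Python sketch (illustrative, not the LS1 macro). One assumption is made here: computing Cu = (a0/a1)(Ca/Vu) with Vu taken as the 25.0 mL sample volume reproduces the Cu ≈ 0.564 µg/mL quoted in the book's figures:

```python
import numpy as np

# Table 2.13: standard-addition volumes (mL) and peak-current ratios
Vi = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
R = np.array([0.86, 1.11, 1.44, 1.74, 2.04, 2.33])

# Straight-line fit R = a0 + a1*Vi
Vav = Vi.mean()
Sxx = np.sum((Vi - Vav) ** 2)
a1 = np.sum((Vi - Vav) * (R - R.mean())) / Sxx   # slope
a0 = R.mean() - a1 * Vav                         # intercept

Ca = 10.0   # mg/L lead in the standard additions
Vu = 25.0   # mL of sample (an assumption, see the lead-in above)
Cu = (a0 / a1) * Ca / Vu
print(a0, a1, Cu)
```

The slope and intercept agree with the spreadsheet values (0.597 and 0.841).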
(3) Calculate the unknown concentration as Cu = a0Ca / (a1Vu) = (a0'/a1 − Vav) × (Ca/Vu), where Ca = 10 µg/mL of lead and Vu = 51 mL, so that Cu is in µg/mL. Display the covariance matrix, and note that the covariance terms are indeed zero, see Fig. 2.13.1.

Fig. 2.13.1: The spreadsheet analyzing the data of Bruce & Gill by centering them; it shows Cu = 0.5639 and su = 0.0160. Cell instructions: B13 = AVERAGE(B19:B24); C19 = B19−$B$13, copied to C20:C24; E13 = (B25/C25−B13)*10/51.
(4) Use Propagation (with input parameters in B25:C25, standard deviations in B26:C26, and the function in E13) to calculate the standard deviation in Cu. (5) Copy the input data to a different location on the same spreadsheet, or start a new spreadsheet, and analyze the data from table 2.13 without centering. (6) Note that, in this case, the covariance terms are not zero, and the covariance matrix must therefore be used as input to Propagation. Verify that, if this is not done, a slightly different (and incorrect) answer (0.012 instead of 0.016) is found for su. Figure 2.13.2 shows the computational part of the new spreadsheet.
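The 0.016-versus-0.012 comparison can be reproduced with a small propagation sketch in Python (illustrative, not the Propagation macro). With g the gradient of Cu with respect to (a0, a1) and V their covariance matrix, su² = gᵀVg; keeping only the diagonal of V yields the incorrect value. As before, Vu is taken here as the 25.0 mL sample volume, an assumption that matches the quoted numbers:

```python
import numpy as np

Vi = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
R = np.array([0.86, 1.11, 1.44, 1.74, 2.04, 2.33])

# Uncentered straight-line fit and its covariance matrix
X = np.column_stack([np.ones_like(Vi), Vi])
coef, *_ = np.linalg.lstsq(X, R, rcond=None)
a0, a1 = coef
resid = R - X @ coef
s2 = np.sum(resid ** 2) / (len(Vi) - 2)
V = s2 * np.linalg.inv(X.T @ X)        # covariance matrix of (a0, a1)

Ca, Vu = 10.0, 25.0                    # Vu = sample volume (assumed, see text)
Cu = (a0 / a1) * Ca / Vu
g = np.array([Ca / (Vu * a1),          # dCu/da0
              -Cu / a1])               # dCu/da1
su_with = np.sqrt(g @ V @ g)                    # full covariance matrix
su_without = np.sqrt(g**2 @ np.diag(V))         # variances only (incorrect)
print(round(su_with, 3), round(su_without, 3))  # prints 0.016 0.012
```

Because the covariance of a0 and a1 is negative and the two gradient components have opposite signs, ignoring the cross term here underestimates the imprecision.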
Fig. 2.13.2: The spreadsheet directly analyzing the uncentered data of Bruce & Gill; it again yields Cu = 0.5639 and su = 0.0160, but now via the full covariance matrix, CM = [1.7E−04, −9.2E−05; −9.2E−05, 7.3E−05]. Cell instruction: D13 = (B22/C22)*10/51.
2.14 The intersection of two straight lines
Say that we need to determine the coordinates xx and yx of the intersection x of two straight lines, y = a0 + a1x and z = b0 + b1x. The value of xx follows from setting yx equal to zx as xx = (b0 − a0) / (a1 − b1), and that of yx as yx = a0 + a1xx. For the imprecision in xx we realize that the least squares fits to the two line segments are mutually independent. We can therefore generate their combined covariance matrix by merely adding the two along a shared diagonal, as shown below. Then we can use Propagation to compute the standard deviation in xx. For the imprecision in yx we cannot use yx = a0 + a1xx, because the imprecision in xx is not independent of those in a0 and a1. We circumvent
this problem by expressing yx explicitly in terms of the four coefficients a0 through b1 as yx = a0 + a1xx = (a1b0 − a0b1) / (a1 − b1), and then use Propagation to calculate the standard deviation in yx.
Exercise 2.14.1:
(1) Arrange a spreadsheet for the calculation of the intersection between the two straight lines, y = a0 + a1x and z = b0 + b1x, by entering labels and values for the coefficients a0, a1, b0, b1, and the noise 'amplitudes' sny and snz. Also make column headings for y, x, z, x, and for two columns of random noise, ny and nz. Then deposit values for x, and let the spreadsheet compute corresponding values for noisy straight lines y and z.
(2) Use LS1 to find approximate values for the coefficients a0 and a1 and the corresponding standard deviations and covariance matrix, then do the same for b0, b1, etc.
(3) Use these coefficients to let the spreadsheet compute values of xx = (b0 − a0) / (a1 − b1) and yx = (a1b0 − a0b1) / (a1 − b1).
(4) If necessary, rearrange the coefficients so that a0, a1, b0, and b1 are aligned in a single, contiguous row. Arrange the two covariance matrices such that the right-bottom corner of one touches the left-top corner of the other, so that they will form a single 4 × 4 matrix with a shared main diagonal. Then call Propagation to find the standard deviations in xx and yx. Figure 2.14.1 illustrates what you might get.
In the example shown in Fig. 2.14.1 we have used a small angle between the two lines, and rather generous noise, to emphasize the difficulties one may face. We find xx = 8.29 ± 0.68 and yx = 8.20 ± 0.70 whereas, in the absence of noise (i.e., for sny = snz = 0), the intersection would be at the point (10, 10). By using the standard deviations of a0 through b1 instead of the combined covariance matrix, we would have obtained a standard deviation in xx of 1.68 instead of 0.68, and an (equally incorrect) standard deviation in yx of 1.72 instead of 0.70.

Alternatively we can center the x-values to avoid having to deal with covariances. In this case we first compute the averages xav,y and xav,z of the two data sets y(x) and z(x), fit y = a0' + a1 (x − xav,y) and z = b0' + b1 (x − xav,z) where a0' = a0 + a1 xav,y and b0' = b0 + b1 xav,z, and then calculate xx = (b0 − a0) / (a1 − b1) = (b0' − b1 xav,z − a0' + a1 xav,y) / (a1 − b1), and yx = (a1b0 − a0b1) / (a1 − b1) = [a1 (b0' − b1 xav,z) − (a0' − a1 xav,y) b1] / (a1 − b1). The standard deviations in xx and yx are then calculated with Propagation, using the mutually independent standard deviations of a0', a1, b0', and b1. This alternative method is neither faster nor simpler, given the custom macros that generate and use covariance matrices. Nonetheless the exercise is useful, because it confirms the earlier result obtained with the covariance matrix.
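The same recipe can be mirrored in a Python sketch (illustrative, not the spreadsheet of Fig. 2.14.1; the line coefficients and noise level are invented): fit each line separately, place the two 2 × 2 covariance matrices along the shared diagonal of a 4 × 4 matrix, and propagate xx = (b0 − a0) / (a1 − b1):

```python
import numpy as np

def line_fit(x, y):
    """Straight-line fit returning coefficients and covariance matrix."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    s2 = np.sum((y - X @ coef) ** 2) / (len(x) - 2)
    return coef, s2 * np.linalg.inv(X.T @ X)

x = np.arange(18.0)
rng = np.random.default_rng(7)
y = 1.0 + 0.9 * x + rng.normal(0, 0.05, x.size)  # line 1; true intersection (10, 10)
z = 4.0 + 0.6 * x + rng.normal(0, 0.05, x.size)  # line 2

(a0, a1), Va = line_fit(x, y)
(b0, b1), Vb = line_fit(x, z)

xx = (b0 - a0) / (a1 - b1)
yx = (a1 * b0 - a0 * b1) / (a1 - b1)

# Block-diagonal covariance of (a0, a1, b0, b1): the two fits are independent
V = np.zeros((4, 4))
V[:2, :2], V[2:, 2:] = Va, Vb

d = a1 - b1
g_xx = np.array([-1 / d, -(b0 - a0) / d**2, 1 / d, (b0 - a0) / d**2])
s_xx = np.sqrt(g_xx @ V @ g_xx)
print(xx, yx, s_xx)
```

The off-diagonal 2 × 2 blocks of V are zero precisely because the two fits share no data, which is what lets the spreadsheet simply butt the two covariance matrices together corner to corner.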
Fig. 2.14.1: A spreadsheet to calculate the intersection between two straight lines.
Fig. 2.14.1 continued: A spreadsheet to calculate the intersection between two straight lines. The data in C40:F42 were obtained with LS1, and those in A43 and A47 with Propagation. The noise in columns G and H is shown in Fig. 2.14.2. The lines drawn through the points were calculated from the coefficients in C40:F40. Cell instructions: C19 = $B$19+$B$20*D19+$B$24*G19, copied to C20:C37; D20 = D19+$B$23, copied to D21:D37; E24 = $B$21+$B$22*F24+$B$25*H24, copied to E25:E39; F25 = F24+$B$23, copied to F26:F39; A42 = (E40−C40)/(D40−F40); A46 = (D40*E40−C40*F40)/(D40−F40); Gaussian noise in G19:H39.
Finally we note that the computed coordinates are about three standard deviations away from their 'correct' values (xx = 10, yx = 10), again illustrating that standard deviations should not be considered as outer bounds, just as a first-order rate constant is not the time within which a first-order reaction runs to completion. Both are characteristic parameters, not delimiting ones.
Fig. 2.14.2 (next page): Continuation of the spreadsheet of Fig. 2.14.1 showing the centered calculation as well as the noise columns G and H. The output in H40:L42 has again been aligned for subsequent use of the Propagation macro. The covariance matrices show zeros for the covariances. The data in H40:L42 were obtained with LSI, those in G43 and G47 with Propagation.
Fig. 2.14.2: (see caption on the preceding page). Here xav,y = 9 and xav,z = 12.5, and the centered calculation again yields xx = 8.2913 ± 0.68269 and yx = 8.1994 ± 0.69553; the covariance matrices show zeros for the covariances. Cell instructions: J16 = AVERAGE(J19:J39); L16 = AVERAGE(L24:L39); J19 = C19−$J$16, copied to J20:J37; L24 = F24−$L$16, copied to L25:L39; G42 = (K40−F40*L16−I40+D40*J16)/(J40−L40); G46 = (J40*(K40−F40*L16)−(I40−D40*J16)*L40)/(J40−L40).
2.15 Computing the boiling point of water
A rather dramatic example of the need to consider the covariance was reported by Meyer (J. Chem. Educ. 74 (1997) 1339), who described the determination of the boiling point of water from measurements of its vapor pressure as a function of temperature.
Fig. 2.15.1: The spreadsheet for fitting data to (2.15.1). The fit yields a0 = 20.3871 ± 0.0547 and a1 = −5130.0 ± 19.7, with covariance matrix CM = [0.0030, −1.0740; −1.0740, 386.2979] and a linear correlation coefficient between a0 and a1 of −0.9995.
In this example the pressure p (in torr) was determined with a mercury manometer, and the temperature t (in °C) with a thermistor. A set of such data, kindly provided by Prof. Meyer, is shown in Fig. 2.15.1, which also shows its analysis. The latter consists of converting p into ln p, and t into 1/T, where T = t + 273.16 is the absolute temperature. The quantity ln p is used as the dependent parameter in fitting the data to the straight line

ln p = a0 + a1 / T     (2.15.1)

From the resulting values for a0 and a1 the boiling point tb of water is then computed as

tb = −273.16 + a1 / [ln(760) − a0]     (2.15.2)
where we have substituted p = 760 torr. More complicated, for sure, than directly measuring the water or steam temperature at 760 torr, but standard fare in undergraduate physical chemistry labs, because it demonstrates adherence to the Clausius-Clapeyron equation. We use (2.15.2) to compute tb, and estimate the corresponding precision with the custom macro Propagation, using the covariance matrix as input, and so obtain tb = 99.83 ± 0.07 °C, which is within 3 standard deviations of the accepted value of 100 °C. However, had we instead used the standard deviations s0 and s1 (shown in Fig. 2.15.1 in cells C29 and D29 respectively) for the computation of the standard deviation in tb, the result would have been tb = 99.83 ± 2.06 °C, a considerably larger (and, more importantly, incorrect) imprecision estimate. In this example, the linear correlation coefficient between a0 and a1 is −0.9995.

The above computation would be improved by using weighting, see section 3.20.1. Introduction of the appropriate weighting factors w = p² indeed modifies the specific results to a0 = 20.37, a1 = −5125, and tb = 99.84 ± 0.06 °C. And if this were 'malculated' without the covariance, we would get tb = 99.84 ± 3.43 °C, more than 50 times too large, because the linear correlation coefficient between a0 and a1 is now −0.9998. Clearly, weighting does not change our main conclusion regarding the need to use the covariance, but merely accentuates it.
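Using the rounded fit results quoted in Fig. 2.15.1, this comparison can be reproduced approximately in Python (a sketch, not the spreadsheet). One caveat: because the two variance terms nearly cancel, the with-covariance result is very sensitive to rounding of the covariance matrix; with the rounded entries below it comes out near 0.11 °C rather than the 0.07 °C obtained from the full-precision spreadsheet, while the no-covariance value reproduces 2.06 °C:

```python
import numpy as np

a0, a1 = 20.3871, -5130.0            # from fitting ln p = a0 + a1/T
V = np.array([[0.0030, -1.0740],     # covariance matrix (rounded, Fig. 2.15.1)
              [-1.0740, 386.2979]])

d = np.log(760.0) - a0
tb = -273.16 + a1 / d                # boiling point, eq. (2.15.2)

g = np.array([a1 / d**2,             # dtb/da0
              1 / d])                # dtb/da1
s_with = np.sqrt(g @ V @ g)                # using the covariance
s_without = np.sqrt(g**2 @ np.diag(V))     # ignoring it (incorrect)
print(tb, s_with, s_without)
```

The near-total cancellation between the variance terms and the covariance term is exactly why a correlation coefficient of −0.9995 cannot be ignored here.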
2.16 Phantom relations
In using least squares it is tacitly assumed that the input data represent independent measurements. If that is not the case, quite misleading results may be obtained, as illustrated by the following problem (#9 on p.
383) of K. Connors, Chemical Kinetics, the Study of Reaction Rates in Solution (VCH 1990):
"From the last four digits from the office telephone numbers of the faculty in your department, systematically construct pairs of "rate constants" as two-digit numbers times 10⁻⁵ s⁻¹ at temperatures 300 K and 315 K (obviously the larger rate constant of each pair to be associated with the higher temperature). Make a two-point Arrhenius plot for each faculty member, evaluating ΔH‡ and ΔS‡. Examine the plot of ΔH‡ against ΔS‡ for evidence of an isokinetic relationship."
Essentially, the reader is asked to take two arbitrary two-digit y-values y1 and y2, assign them to preselected x-values x1 and x2 respectively, compute the resulting slope a1 and intercept a0, repeat this for a number of arbitrary input parameter pairs y (for the same x-values), and then plot the resulting a1-values versus a0, or vice versa. The actual procedure is somewhat less transparent, since it also involves sorting the input data, a logarithmic transformation, and giving the slopes and intercepts thermodynamic names, all steps that tend to obscure the true nature of the problem. Moreover, the above assignment uses only positive input numbers. Below we will simply take pairs of random two-digit integer values for y, associate them with two fixed x-values such as x1 = 300 and x2 = 320, compute the resulting slopes and intercepts, and then plot these against each other.
Exercise 2.16.1:
(1) In cells B2 and C2 place the labels y1 and y2 respectively. Do the same in cells E2:F2, and in cells H2:I2 deposit the labels a0 and a1 respectively.
(2) In cells B4 and C4 deposit the instruction =INT(200*(RAND()-0.5)), which will generate random two-digit integers between −100 and +100. Copy these instructions down to row 23.
(3) The numbers in B4:C23 will change every time you change something on the spreadsheet. In order to have a fixed set of random numbers, highlight B4:C23, copy it with Ctrl+c, highlight cell E4, and use Edit => Paste Special => Values to copy the values of y1 and y2 so obtained. After that, use the data in block E4:F23 as your random input data, while ignoring those in B4:C23 that keep changing while you work the spreadsheet.
(4) Based on the data in E4:F23, compute in column H the slope of each pair of data points (x1,y1), (x2,y2) as (y2 − y1) / (x2 − x1), and in column I the corresponding intercepts as (x2y1 − x1y2) / (x2 − x1).
(5) Make a plot of a0 (in column H) versus a1 (in column I), or vice versa, see Fig. 2.16.1.
The data in Fig. 2.16.1 seem to fall on or near a straight line, for which Trendline yields the formula y = −311.18x − 0.8877, with R² = 0.9983. Is this what you would have expected for having used random input numbers for y? If not, what happened?
Fig. 2.16.1: An example of a phantom line you might find with x1 = 300 and x2 = 320.
Because each pair of input numbers y of this graph is completely determined by the calculated slope and intercept for given input values of x, the graph uses highly correlated pairs of input data. We already encountered the formula for that correlation, (2.10.1). The sign of (2.10.1) explains the negative correlation, and the effect is the more pronounced the larger is Σx, i.e., the more eccentric are the x-values used. Plotting such slopes and intercepts against each other will then lead to a convincingly linear but physically meaningless near-linear relationship, approximating the proportionality y = −xav x. Instead, you are merely verifying the correlation between slope and intercept, see (2.10.1), as is perhaps more evident after we rewrite y = −xav x using more appropriate symbols as a0 = −xav a1. This is the origin of the isokinetic relationship (J. E. Leffler, J. Org. Chem. 20 (1955) 1202), and illustrates what the covariance can do for you if you don't watch it. An extensive discussion of this problem, as well as a suggested solution, was given by Krug et al., J. Phys. Chem. 80 (1976) 2335, 2341. For an interesting (and only seemingly alternative) explanation of this phantom relationship see G. C. McBane, J. Chem. Educ. 75 (1998) 919.
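The phantom line is easy to reproduce numerically. The following Python sketch (mirroring the exercise, not part of the book) draws random integer pairs y1, y2, converts each pair into a slope a1 and intercept a0 for x1 = 300 and x2 = 320, and then regresses a0 on a1; the regression slope comes out close to −xav = −310, and the correlation coefficient close to −1:

```python
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = 300.0, 320.0
n = 1000
y1 = rng.integers(-99, 100, n).astype(float)   # random "rate constants"
y2 = rng.integers(-99, 100, n).astype(float)

a1 = (y2 - y1) / (x2 - x1)             # slopes
a0 = (x2 * y1 - x1 * y2) / (x2 - x1)   # intercepts

# Regress a0 on a1: the slope should be close to -(x1 + x2)/2 = -310
slope = (np.sum((a1 - a1.mean()) * (a0 - a0.mean()))
         / np.sum((a1 - a1.mean()) ** 2))
r = np.corrcoef(a0, a1)[0, 1]
print(slope, r)
```

Algebraically this is no surprise: for each pair, a0 = −xav a1 + (y1 + y2)/2, so the scatter about the line a0 = −xav a1 comes only from the second, uncorrelated term.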
Exercise 2.16.1 (continued):
(6) Use the same y-values collected in columns H and I, but now analyze them for a pair of x-values centered around the average xav = 310, so that x1 = −10 and x2 = +10. Does this support the above explanation?
Fig. 2.16.2: The same y-values as in Fig. 2.16.1 analyzed with x1 = −10 and x2 = +10.
Fig. 2.16.3: The data from Fig. 2.16.1 (open circles) and, for comparison, those computed as a0 = −xav a1 (filled circles connected by a thin line).
Given that the input data were random, which are the parameters that determine the 'line' in Fig. 2.16.1? There is no significant intercept, just a slope, and the latter is simply −(Σx)/N, i.e., minus the average value of x. In the above example we have −(Σx)/N = −(300+320)/2 = −310, so that we would expect y = −310x, which compares well with the result of Trendline, y = −311.18x − 0.8877, as illustrated in Fig. 2.16.3. Indeed, as already noticed by Leffler, in many cases the reported slopes of isokinetic plots were close to the average temperatures of the data sets considered. In such cases the isokinetic effect is nothing more than an artifact of incorrectly applied statistics.
2.17 Summary
Typical experimental data are occasional samples of some underlying continuous feature, corrupted by scatter. Linear least squares methods often allow the experimenter to recover the underlying trend from the sporadic, noisy data. Note that this underlying, noise-free trend can contain systematic errors, and that the standard deviations, variances and covariances generated by least squares methods only deal with precision, not with accuracy.

Because least squares methods have become so easy to apply, they have become ubiquitous in many fields of science and technology. Keep in mind, however, that the method and the results obtained with it presume that the noise is random and can be described adequately by a single Gaussian distribution. We seldom have (or take) the time to verify those assumptions, and therefore should take the results with the proverbial grain of salt.

In this chapter we have focused our attention on fitting data to a straight line, because this problem is so common in applied science. It is often assumed that the parameters obtained by least squares fitting to a straight line are mutually independent, but this is usually not the case. Consequently, quite misleading results may be obtained, as illustrated in section 2.16, where the culprit was the (easily overlooked) covariance of those input data. Working backwards, it shows how to convert perfectly random data from the 'scatter' plot of Fig. 2.16.2 into the convincingly linear relationship of Fig. 2.16.1 (with an R² factor of more than 0.998), an object lesson in 'how to lie with statistics'. Beware, it is all too easy to fool oneself!

With appropriate software, getting the correct result is not particularly difficult, but attention must be paid. Fitting data to a proportionality generates only one coefficient, the slope, and therefore seldom involves problems of covariance. Fitting a straight line yields two adjustable parameters, a0 and a1, which in general will be mutually dependent.
The macro LS provides the corresponding covariance (and, as we will see in chapter 3, does the same for polynomial and multivariate fits as well) and the macro Propagation can subsequently take their covariance(s) into account. Centering will avoid this problem, because it leads to mutually independent coefficients ao and aJ, i.e., it renders their covariance zero. In section 3.11 we will encounter the equivalent of centering for polynomial and multivariate fits.
The problem of data analysis starts with data acquisition. Then, if one does not want to hold on to the original data, the covariance v01 should be recorded and preserved, or the data analyzed in their centered form, and the result stored together with the value(s) of xav. If only the fitting parameters a0 and a1 are available, together with their standard deviations s0 and s1, one will in general not be able to compute the correct precision of subsequently derived results, because there is no way to determine, retroactively, the value of the covariance v01. Unfortunately, such careless and misleading use of least squares occurs far more often than one would hope. Of course, one should not get carried away with this: imprecision estimates are just that: estimates. Still, if time and effort are spent on making those estimates, they might as well be done correctly.

In chapter 3 we will extend the least squares analysis to polynomials and multivariate functions. We will see that we can further broaden the application of least squares methods by transforming data that do not fit that mold into a polynomial form. That still leaves many functions out; for those, chapter 4 will describe nonlinear least squares.
2.18 For further reading
Excellent, highly readable starting points for linear least squares methods are An Introduction to Error Analysis by J. R. Taylor (University Science Books, 1982, 1997), the classic Data Reduction and Error Analysis for the Physical Sciences by P. R. Bevington (McGraw-Hill 1969, 1992), as well as S. Chatterjee & B. Price, Regression Analysis by Example, Wiley 1977, 1991. These books (and many others) clearly explain the underlying assumptions, and show many examples of practical applications. And for the lighter side, take a look at L. Gonick & L. Smith, The Cartoon Guide to Statistics (Harper Perennial, 1994), or its predecessor, D. Huff, How to Lie with Statistics (Norton 1982).
Chapter 3

Further linear least squares

In this chapter we apply least squares methods to polynomials in the independent parameter x, and to multiparameter functions. We also describe weighted least squares, and show how least squares methods can be simplified when the x-values are spaced equidistantly.

3.1 Fitting data to a polynomial

Excel makes it easy to extend the procedures discussed in chapter 2 to fitting data to a power series of the general form y = a0 + a1x + a2x^2 + a3x^3 + a4x^4 + ... + amx^m = Σ ajx^j, where j = 0 (1) m, again assuming that y contains all the experimental uncertainties, and that these follow a single Gaussian distribution.

With Trendline you do not need to make any new columns for higher orders of x, as long as the data are plotted in a graph, but merely select the power series and specify its order (between 2 and 6). In order to display its numerical results, click on the Options tab, then select Display Equation on chart. Trendline yields the individual coefficients ai, but neither the corresponding standard deviations si nor the standard deviation sy of the fit. It can only provide the rather uninformative R^2, the square of the multiple correlation coefficient.

With LinEst(yrange, xrange, type, statistics) the highlighted block should now include one or more adjacent columns for, say, x^2, x^3, ..., x^m instead of one for x. For LS0 or LS1, the first two columns should still contain the values of yi and xi; for higher powers of x we merely have to arrange m adjacent, contiguous columns, so that they can be entered as a block. There is neither a requirement that consecutive orders of x be used (you need not include columns for powers you do not need), nor that they be in any particular order (although that is usually easier to work with).

LS1 fits data to a general polynomial, while LS0 sets the value of a0 equal to zero, thereby forcing the fitted curve to go through the origin. With LinEst the equivalent choice is made by setting type to 0 or false in order to force a0 to zero; with Regression the same is achieved by activating Constant is Zero in the dialog box.
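For readers who want the same block-of-columns idea outside Excel, here is a minimal numpy sketch; polyfit_ls1 is a hypothetical stand-in for (not a translation of) the LS1 macro, returning coefficients, their standard deviations, and the covariance matrix:

```python
import numpy as np

def polyfit_ls1(x, y, powers):
    """Fit y = a0 + sum_j a_j x**j over the given powers (cf. the LS1 idea).
    Returns the coefficients, their standard deviations, and the
    covariance matrix."""
    X = np.column_stack([np.ones_like(x)] + [x ** p for p in powers])
    coef, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss = float(rss[0]) if rss.size else float(np.sum((X @ coef - y) ** 2))
    dof = len(y) - X.shape[1]
    cov = (ss / dof) * np.linalg.inv(X.T @ X)   # covariance matrix of the fit
    return coef, np.sqrt(np.diag(cov)), cov

# quick check on noise-free data y = 1 + 2x + 3x^2
x = np.linspace(0.0, 5.0, 11)
y = 1.0 + 2.0 * x + 3.0 * x ** 2
coef, sdev, cov = polyfit_ls1(x, y, [1, 2])
print(coef)   # close to [1, 2, 3]
```

As in the text, the powers need not be consecutive: polyfit_ls1(x, y, [1, 3]) fits a0 + a1x + a3x^3, simply by omitting the x^2 column.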
For either LinEst, Regression, or LS, use columns for y, x, and x^2, in this order. For Trendline the data placement doesn't matter, as long as the data are plotted in a graph. For polynomials of second and higher order, centering no longer suffices, and orthogonal polynomials are needed to make the covariance(s) zero; we will discuss a particular set of such orthogonal polynomials in section 3.11. It is often easier, however (except for equidistant data), to use LS0 or LS1 to find a straightforward polynomial fit and, if the results require further mathematical manipulation, to use the covariance matrix with Propagation to compute the precision of the resulting answers. Sections 3.2 through 3.4 illustrate fitting data to a polynomial.

3.2 Fitting data to a parabola

Exercise 3.2.1:
(1) Start a new spreadsheet, make up a data set from a quadratic expression plus some noise, and fit it with each of the above methods.
(2) Figure 3.2.1 shows an example, in which we have used the instruction =LINEST(A15:A27,B15:C27,1,1) to find the coefficients ai used in computing ycalc = a0 + a1x + a2x^2 in column D. (The second range is now a block, B15:C27, for both x and x^2. Since there are three coefficients to be calculated, use three columns for the result, and enter the array instruction with Ctrl∪Shift∪Enter.)
(3) The LinEst output occupies a three-column, five-row block: the top row contains the coefficients a2, a1, and a0 (in that order), the second row their standard deviations, and the remaining rows R^2 and sy, the F-statistic and the number of degrees of freedom, and the regression and residual sums of squares, with #N/A filling the unused cells.
(4) The results in the top two lines can be compared with the values used in generating the synthetic data. While the fitted quadratic is noise-free, the presence of noise in the parent data results in rather uncertain coefficients: the relative standard deviation |s0/a0| is more than 18%.
(5) For a1 and a2 the correct results lie beyond one standard deviation. As we already saw in chapter 2, standard deviations should not be interpreted as indicating the likely range of deviations; for that purpose, confidence intervals (which depend on the standard deviation s, the number N of data points used, and a selected probability, such as 95% or 99%) are more appropriate.
(6) The coefficients ai obtained from a single least squares operation are of course mutually dependent. If they are to be used in subsequent computations, you will need the corresponding covariance matrix, which can be displayed by the custom macros LS0 or LS1.
(7) Save the data for later use.

Trendline can display its results on the chart (select Display Equation on chart and Display R-squared value on chart under Options), but without any indications of the resulting uncertainties. A convenient Forecast feature of Trendline Options allows you to extrapolate the curve on the basis of the calculated parameters, which is then plotted as a line through the points.
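The gist of exercise 3.2.1 can also be mimicked in numpy, using the parameter values of Fig. 3.2.1 (np.polyfit plays the role of LINEST here; this is an illustrative sketch, not the spreadsheet itself):

```python
import numpy as np

rng = np.random.default_rng(1)
a0, a1, a2, na = 10.0, 8.0, 0.7, 2.0     # parameters as in Fig. 3.2.1
x = np.arange(1.0, 14.0)                  # x = 1 .. 13
y = a0 + a1 * x + a2 * x**2 + na * rng.standard_normal(x.size)

# equivalent of =LINEST(y, x:x^2, 1, 1): fit to a0 + a1 x + a2 x^2
(c2, c1, c0), cov = np.polyfit(x, y, 2, cov=True)
s2, s1, s0 = np.sqrt(np.diag(cov))

print(f"a0 = {c0:.1f} +/- {s0:.1f}")
print(f"a1 = {c1:.2f} +/- {s1:.2f}")
print(f"a2 = {c2:.3f} +/- {s2:.3f}")
```

Rerunning with a different random seed shows how strongly the recovered coefficients scatter around the input values when the noise amplitude is this large.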
Fig. 3.2.1: A spreadsheet to fit noisy data to a parabola, generated with a0 = 10, a1 = 8, a2 = 0.7, and noise amplitude na = 2. Cells N15:N27 contain Gaussian noise with zero mean and unit standard deviation. Cell A15 contains the instruction =$B$11+$B$12*B15+$D$11*C15+$D$12*N15, and is copied to A16:A27, while the instruction =B15^2 in cell C15 is copied to C16:C27.

3.3 The iodine vapor spectrum

As our next example we consider the visible absorption spectrum of iodine vapor, I2, a homonuclear diatomic molecule. Because of its symmetry, I2 has no dipole moment in any of its vibrational or rotational
modes. It therefore does not absorb light in the infrared and microwave regions of the spectrum, where one usually observes vibrational and rotational transitions respectively. However, I2 can be excited from its electronic ground state to an (at room temperature essentially unoccupied) electronically excited state. The electronic transition is associated with a change in dipole moment, and can therefore lead to light absorption.

In both the ground state (here indicated with 0) and the electronically excited state (labeled with '), the molecules exist in discrete vibrational and rotational states, defined by the quantum numbers v and J respectively. There are no quantum-mechanical restrictions (selection rules) that govern the changes in vibrational levels during this electronic transition. One can therefore observe several series of spectral absorption lines, each originating from a different vibrational level (such as v0 = 0, 1, 2, etc.) in the electronic ground state, and leading to various vibrational levels (with their quantum numbers v') in the electronically excited state. When, as in the experiments used here, the vapor phase absorption spectrum is observed with relatively low resolution (Δλ ≈ 0.5 nm), the rotational states are not resolved. The data we will analyze here reflect transitions between the vibrational and electronic ground state (v0 = 0) and different vibrational states (with vibrational quantum numbers v') in the electronically excited state.

In the simplest model, that of the harmonic oscillator, the energy E(v) of a particular vibrational state can be described as E(v)/hc = ωe(v + 1/2), where h is Planck's constant, c the vacuum speed of light, and ωe the fundamental vibrational 'frequency' in units of wavenumbers (cm^-1), as indicated in print by the bar over such symbols. The latter can be expressed in terms of the force constant k and the reduced mass μ (here half the atomic mass of iodine) as ωe = [1/(2πc)] √(k/μ). A more realistic description includes a second-order term, as in

E(v)/hc = ωe(v + 1/2) - ωexe(v + 1/2)^2      (3.3.1)

where ωexe is called the anharmonicity constant.

As our experimental data we will use a set of measurements discussed in C. J. Pursell & L. Doezema, J. Chem. Educ. 76 (1999) 839; the actual data used were kindly provided by Dr. Pursell, and are listed in Fig. 3.3.1. This same data set has also been reproduced in table 1 of Ogren, Davis & Guy, J. Chem. Educ. 78 (2001) 827.
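Equation (3.3.1) makes the observed wavenumbers a plain quadratic in (v' + 1/2), which is what the analysis below exploits. The following numpy sketch (with rough, made-up constants of plausible magnitude, not fitted iodine values) generates noise-free transition energies from that model and recovers ωe' and ωe'xe' by the same quadratic fit the exercise performs with LS1:

```python
import numpy as np

# rough constants of plausible magnitude (cm^-1); made up for illustration,
# not the fitted values for I2
E_el, we_x, wexe_x = 15800.0, 125.0, 0.75   # excited state (primed)
we_g, wexe_g = 214.5, 0.61                   # ground state

vp = np.arange(10.0, 45.0)                   # excited-state quantum numbers v'
u = vp + 0.5
# cf. eq (3.3.3): nu is a quadratic in (v' + 1/2)
nu = (E_el - we_g / 2 + wexe_g / 4) + we_x * u - wexe_x * u**2

# a quadratic least squares fit in u recovers the three parameters
c2, c1, c0 = np.polyfit(u, nu, 2)
print(c1, -c2)   # we' and we'xe'
```

With noise-free input the fit is exact to numerical precision; with real data the same fit returns the parameters plus their uncertainty estimates.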
Fig. 3.3.1: The spectroscopic data for I2 vapor from Pursell & Doezema, listing the assigned vibrational quantum numbers v' together with the observed and calculated wavenumbers (in cm^-1).
The optically observable transition energy ν from the vibrational and electronic ground state v0 = 0 to the electronically excited state at various vibrational quantum states v' is then given by

ν = [E(v') - E(v0)]/hc = Eel + ωe'(v' + 1/2) - ωe'xe'(v' + 1/2)^2 - ωe(v0 + 1/2) + ωexe(v0 + 1/2)^2      (3.3.2)

which is a quadratic function of (v' + 1/2), as is more readily seen after rewriting it as

ν = (Eel - ωe/2 + ωexe/4) + ωe'(v' + 1/2) - ωe'xe'(v' + 1/2)^2      (3.3.3)

where Eel is the (theoretical) energy difference between the minima for the electronic ground state and the excited state in the diagrams of potential energy vs. bond length. Figures 3.3.1 and 3.3.2 illustrate how well a quadratic fits these experimental data.

Exercise 3.3.1:
(1) In a new spreadsheet, enter the vibrational quantum numbers v' and the corresponding wavenumbers in columns A and B respectively.
(2) In column C calculate v' + 1/2, and in column D the quantity (v' + 1/2)^2.
(3) Highlight the data in columns B through D, call LS1, and find the corresponding values for (Eel - ωe/2 + ωexe/4), ωe', and ωe'xe', and the associated uncertainty estimates.

Fig. 3.3.2: A quadratic fit of the observed spectral lines, in wavenumbers, as a function of the assigned vibrational quantum numbers v' of the electronically excited state, for the iodine vapor spectral data of Pursell & Doezema. Open circles: experimental data; line and small solid points: fitted data.
Exercise 3.3.1 (continued):
(4) Compute the quantity xe' from ωe' and ωe'xe', using the covariance matrix and the Propagation macro. When the data are organized as in Fig. 3.3.1, xe' is computed in cell B41 as ωe'xe'/ωe', or =D35/C35; for Propagation, specify the location of the input parameters as B35:D35, that of the covariance matrix as C38:E40, and the address of the function as B41. This will yield xe' = 0.007727 ± 0.000021 cm^-1, whereas specifying the standard deviations in B36:D36 instead would have led to 0.007727 ± 0.000048 cm^-1, i.e., to a standard deviation more than twice as large.
(5) Apart from the calculation of the standard deviation of xe', you could also have obtained the same results with LinEst or Regression. The difference in the standard deviation of xe' should not be surprising, since ωe' and ωe'xe' are clearly mutually dependent quantities, as is also clear from their linear correlation coefficients, which are close to ±1. When such numbers are used for further calculations, the covariance matrix may again have to be taken into account, as illustrated by Pursell & Doezema and also by, e.g., Long et al., J. Chem. Educ. 76 (1999) 841.

3.4 The intersection of two parabolas

Say that we need to determine the coordinates of an intersection of two parabolas, or of curves that, in the neighborhood of their intersection, can be approximated as parabolas. We fit the first curve (or curve fragment) to y = a0 + a1x + a2x^2, and the second to z = b0 + b1x + b2x^2. We then compute the x-value at their intersection by setting y equal to z, so that (a2 - b2)x^2 + (a1 - b1)x + (a0 - b0) = 0 or xx = {-(a1 - b1) ± √[(a1 - b1)^2 - 4(a0 - b0)(a2 - b2)]} / {2(a2 - b2)}, where the choice of sign before the square root depends on the numerical values used. The corresponding values yx and zx then follow from substitution of xx into y = a0 + a1x + a2x^2 or z = b0 + b1x + b2x^2. However, since the coefficients within each fit are mutually dependent, the covariance matrix may again have to be taken into account.

Exercise 3.4.1:
(1) In cells A1:A3 of a new spreadsheet enter labels for a0 through a2, and in cells A4:A6 labels for b0 through b2. In cells B1:B6 deposit their numerical values. Likewise, in cells E7 and E8 place labels for the noise standard deviations sny and snz, and in cells F7 and F8 deposit their numerical values.
(2) In cells A10:H10 place column headings for yn, x, xx (or x^2), zn, x, xx, ny, and nz respectively.
(3) In B12 and E12 start duplicate columns with x-values, and do the same in C12 and F12 for x^2.
(4) In columns G and H, starting from row 12 down, deposit Gaussian ('normal') noise of zero mean and unit standard deviation. Extend all columns down to the same row as the x-values in columns B and E.
(5) In A12 compute yn as =$B$1+$B$2*$B12+$B$3*$C12+$F$7*G12, and copy this down. Likewise, starting with cell D12, compute zn as =$B$4+$B$5*$E12+$B$6*$F12+$F$8*H12.
(6) Plot the resulting parabolas, and adjust the parameters in B1:B6 to yield intersecting curves.
Exercise 3.4.1 (continued):
(7) Call LS1 and fit the data in columns A, B, and C. Copy the resulting coefficients a0 through a2 to cells C1:C3, and their standard deviations to D1:D3. Using Edit => Paste Special => Values, place the covariance matrix in E1:G3.
(8) Now call LS1 for the data in columns D through F. Copy the resulting coefficients b0 through b2 to C4:C6, and the corresponding standard deviations to D4:D6. Again using Edit => Paste Special => Values, place the covariance matrix in (no, this is no misprint) H4:J6.
(9) For the coordinates of the intersection, compute the value of xx in cell C7 from the formula given above, and in cell C9 calculate the associated value of yx = zx as zx = b0 + b1xx + b2xx^2.

So far this is not much different from exercise 3.2.1. What is new here is the estimate of the resulting standard deviations in xx and yx. The value of xx depends on six parameters, a0 through a2 and b0 through b2, of which the ai are a mutually dependent set, as are the bi. As long as the two parabolas are mutually independent, we can handle this by combining the covariance matrices of their fits, just as we did in exercise 2.14, i.e., by placing the two matrices such that their main diagonals are joined. This is why we put the second matrix in H4:J6, because the first was located in E1:G3. The resulting 6 x 6 matrix E1:J6, with the six variances on its main diagonal, can then serve as the input for the custom macro Propagation.

Fig. 3.4.1: Determining the coordinates of the intersection of two parabolas. The solid lines show the least squares parabolas used in the computation.
Fig. 3.4.2: The left top corner of the spreadsheet of exercise 3.4.1.

Fig. 3.4.3: The section of the spreadsheet showing the combined covariance matrix.

Exercise 3.4.1 (continued):
(10) Call Propagation to calculate the standard deviation in xx. For the input parameters use the data in C1:C6, for their uncertainties the combined covariance matrix in E1:J6, and for the function cell C7.
(11) If we use the values in D1:D6 instead of those in E1:J6 as the uncertainty estimates, we will obtain a quite different (but incorrect) result, just as we saw earlier in sections 2.12 and 2.14. For the data in Fig. 3.4.1 the resulting standard deviation in xx is then found as 3.1, whereas the correct answer is only 0.61.
(12) In cell C8 calculate yx as just indicated, then use Propagation to find its standard deviation. For the standard deviation of yx the procedure is similar, except that we must first express yx explicitly in terms of the coefficients a0 through a2 and b0 through b2, as yx = a0 + a1 {-(a1 - b1) ± √[(a1 - b1)^2 - 4(a0 - b0)(a2 - b2)]} / {2(a2 - b2)} + a2 ({-(a1 - b1) ± √[(a1 - b1)^2 - 4(a0 - b0)(a2 - b2)]} / {2(a2 - b2)})^2.
(13) Again check the difference between using the covariance matrix or merely the individual standard deviations of the coefficients. In our numerical example we find yx = 691.5 ± 6.4 using the covariance matrix, and yx = 691.5 ± 105.7 when the covariances are ignored.

3.5 Multiparameter fitting

We can also use LinEst, Regression, and LS (but not Trendline) to fit data as a linear function of a variety of parameters, say x, x^3, √z, and log t. In fact, one can consider fitting to a polynomial as a special case of a multiparameter (or multivariate) fit. LinEst, Regression, and LS all will accept multiparameter fits in the same way as polynomial ones, as long as the experimental uncertainties are all concentrated in y, and can be assumed to follow a single Gaussian distribution. Just make columns for the parameters to which y is to be fitted. The various parameters need not be independent. Trendline cannot handle multiparameter fits, since the graph has only one horizontal axis, and it therefore has no way to determine what the independent parameters might be. As in all applications of least squares methods, the number of adjustable parameters should preferably be much smaller than the number of data points. Sections 3.6 and 3.7 will illustrate such multiparameter data fitting.

3.6 The infrared spectrum of H35Cl

As our first example we will fit a set of frequencies of the infrared absorption spectrum of H35Cl vapor between 2500 and 3100 cm^-1 (for the fundamental) and between 5400 and about 5800 cm^-1 (for the first harmonic or overtone), as published by R. W. Schwenz & W. F. Polik in J. Chem. Educ. 76 (1999) 1302. The actual spectrum of HCl vapor consists of a set of doublets, due to the presence of about 24.5% naturally occurring 37Cl; the corresponding lines are easily recognized by their lower intensities. These data can be fitted at two levels: that of the usual first-order approximations of a rigid rotor and a harmonic oscillator, and that of a more complete theory. We will here use the second approach, since there is no good reason to oversimplify the mathematics when the spreadsheet can just as easily apply the more complete theoretical model. Rotational-vibrational transitions are usually observable in the infrared part of the spectrum. The corresponding frequencies E(v, J)/hc of a heteronuclear diatomic molecule such as HCl, which has a permanent dipole moment, are given by

E(v, J)/hc = ωe(v + 1/2) + BeJ(J + 1) - ωexe(v + 1/2)^2 - DeJ^2(J + 1)^2 - αe(v + 1/2)J(J + 1)      (3.6.1)
where E(v, J) is the energy, v is the vibrational quantum number, J the rotational quantum number, h Planck's constant, and c the speed of light in vacuum; the bar used over these symbols in print indicates that the quantities involved are energies expressed in wavenumbers. The harmonic vibrational frequency ωe can again be expressed in terms of the force constant k and the reduced mass μ = 1/(1/m1 + 1/m2) as ωe = (1/2πc) √(k/μ), where m1 and m2 are the atomic masses of H and 35Cl respectively. The rotational constant is Be = h/(8π^2 c Ie), where Ie = μr^2 is the moment of inertia and r the bond distance. Furthermore, ωexe is the anharmonicity constant, De the centrifugal distortion constant, while αe describes rotational-vibrational interactions.

In this case the observation does not involve an electronic transition (as in the preceding example of iodine vapor), and light absorption is now restricted by the selection rule ΔJ = ±1 (except for diatomic molecules with an odd number of electrons, such as NO). Moreover, room temperature observations are mostly restricted to transitions from the vibrational ground state v = 0 either to v = 1 (producing the fundamental) or to v = 2 (yielding the overtone). The lines in the absorption spectrum correspond to transitions between these energy states, i.e., to differences between the energy levels given by (3.6.1).

The experimentally observed frequencies are listed in table 1 of the paper by Schwenz & Polik, and can also be downloaded from their web site at http://www.chem.hope.edu/~polik/doc/hcl.xls. We will fit them to the expression

ν = [E(v', J') - E(v0, J0)]/hc
= ωe [(v' + 1/2) - (v0 + 1/2)] + Be [J'(J' + 1) - J0(J0 + 1)] - ωexe [(v' + 1/2)^2 - (v0 + 1/2)^2] - De [J'^2(J' + 1)^2 - J0^2(J0 + 1)^2] - αe [(v' + 1/2)J'(J' + 1) - (v0 + 1/2)J0(J0 + 1)]      (3.6.2)

where the zero and prime denote the lower-energy and higher-energy states of the particular transition respectively. Equation (3.6.2) can be rewritten in a compact form suitable for multiparameter least squares as

ν = [E(v', J') - E(v0, J0)]/hc = ωe z1 + Be z2 + ωexe z3 + De z4 + αe z5      (3.6.3)

where

z1 = (v' + 1/2) - (v0 + 1/2)      (3.6.4)
z2 = J'(J' + 1) - J0(J0 + 1)      (3.6.5)
z3 = -(v' + 1/2)^2 + (v0 + 1/2)^2      (3.6.6)
z4 = -J'^2(J' + 1)^2 + J0^2(J0 + 1)^2      (3.6.7)
z5 = -(v' + 1/2)J'(J' + 1) + (v0 + 1/2)J0(J0 + 1)      (3.6.8)

The experimental data for y = [E(v', J') - E(v0, J0)]/hc are given as a function of the (readily assigned) quantum numbers v0, J0, v', and J', so that the functions zi are all known. The problem therefore reduces to finding the five unknown parameters ωe, Be, ωexe, De, and αe, a situation tailor-made for fitting with a multiparameter linear least squares routine. Note that we must treat the product ωexe as an independent parameter; otherwise the problem is no longer linear in the fitting parameters.

Exercise 3.6.1:
(1) In a spreadsheet, enter the data as provided in the above-mentioned paper or web site, in five adjacent columns for v0, J0, v', J', and (E' - E0)/hc respectively.
(2) Enter five more columns, for z1 through z5 respectively, in which you calculate these functions using the relations given in eqs. (3.6.4) through (3.6.8).
(3) Highlight the columns for (E' - E0)/hc and z1 through z5, and call LS0. Also display the covariance matrix. This will provide you with the values and standard deviations of ωe, Be, ωexe, De, and αe, which you will need in order to compute xe as ωexe/ωe from ωe and ωexe, two quantities that are of course strongly correlated (with, in this case, a linear correlation coefficient of 0.982).
(4) Your spreadsheet should resemble Figs. 3.6.1 and 3.6.2.

These results can of course be used to compute the bond distance r = √[h/(8π^2 c μ Be)] and to fit a potential-energy surface. They can also be compared, e.g., with similar data (from the same source) for D35Cl, as reported in table 1 of P. Ogren, B. Davis & N. Guy, J. Chem. Educ. 78 (2001) 827. By relegating the mechanics of curve fitting to the spreadsheet, the researcher can focus on the interpretation and further uses of the extracted information.
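The five-parameter fit of eq (3.6.3) is easy to mimic in numpy. The sketch below uses made-up molecular constants of roughly the right magnitude for HCl (not the fitted Schwenz & Polik values) to generate synthetic fundamental and overtone lines, and then recovers the constants with a through-the-origin least squares fit, the analogue of LS0:

```python
import numpy as np

# made-up constants of plausible magnitude for H35Cl (cm^-1);
# for illustration only, not the fitted values of Schwenz & Polik
we, Be, wexe, De, ae = 2990.0, 10.6, 52.8, 5.3e-4, 0.31

def zcols(v0, J0, v1, J1):
    """The functions z1..z5 of eqs. (3.6.4)-(3.6.8)."""
    return np.column_stack([
        (v1 + 0.5) - (v0 + 0.5),
        J1 * (J1 + 1) - J0 * (J0 + 1),
        -(v1 + 0.5) ** 2 + (v0 + 0.5) ** 2,
        -J1 ** 2 * (J1 + 1) ** 2 + J0 ** 2 * (J0 + 1) ** 2,
        -(v1 + 0.5) * J1 * (J1 + 1) + (v0 + 0.5) * J0 * (J0 + 1),
    ])

# P and R branches (delta J = -1, +1) of the fundamental (v = 0 -> 1)
# and of the first overtone (v = 0 -> 2)
J = np.arange(0.0, 11.0)
blocks = []
for vup in (1.0, 2.0):
    v0, v1 = np.zeros_like(J), np.full_like(J, vup)
    blocks.append(zcols(v0, J, v1, J + 1))   # R branch
    blocks.append(zcols(v0, J + 1, v1, J))   # P branch
Z = np.vstack(blocks)
nu = Z @ np.array([we, Be, wexe, De, ae])    # synthetic line positions

# LS0 analogue: least squares through the origin, no constant term
coef, *_ = np.linalg.lstsq(Z, nu, rcond=None)
print(coef)   # recovers we, Be, wexe, De and ae
```

Note that both bands are needed: within the fundamental alone, z1 and z3 are constants, so ωe and ωexe could not be separated.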
some column widths have been adjusted.110 A 1111 R.2 10 256 8 15 24 35 48 8 2 2 2 2 6 7 0 0 10 12 14 16 18 20 22 24 26 500 .4 2 2 2 % 5 3 2 0 o 2 3 4 5 I 2905.641 2775.234 3072. de Levie. Advanced Excel for scientific data analysis z2 2 4 6 .158 5739.4 18 3129.14 1 56 7.202 3029.499 2751.1: The top part of a spreadsheet with the infrared spectral data on H35Cl vapor of Schwenz & Polik (columns A through E). 3.995 2925581 2944577 2962.965 3059.9 14 3119.494 2 4 3 4 5 0 0 0 2 3 4 5 6 .907 2677.94 1 3044. containing the fundamental band.689 2598.6.79 5767.099 2 64.955 29 0.262 2 2 4 6 14 6 8 6 6 6 27 44 o 37 38 0 0 2 2 256 .00 10 12 65 90 2 6 864 Fig.220 2516.249 279 .109 5753.817 2727.932 2625.86 1 2544. .689 2997.788 3014.7 8 9 10 0 0 0 6 7 8 9 10 II 7 8 9 10 II 12 14 15 16 63 2 SO 99 120 143 168 195 224 2 2 2 2 2 2 II 12 \3 14 0 0 0 0 0 12 13 13 14 15 IS 28 30 32 2 10976 \3500 163 4 4 o o o 19 20 21 0 2 2 2 2 255 ·288 I I 2 0 4 32 108 256 500 64 1372 o 3 0 0 3 4 2 4 5 6 6 10 12 14 16 18 20 22 2 2 22 23 24 25 26 0 5 6 7 2 2 2 15 24 35 48 0 0 0 0 8 7 2 2 2 2 204 29 16 9 10 II 8 9 10 12 63 o o o 4000 5324 6912 8788 10976 80 99 120 1 43 168 195 5 12 I 1 14 13 I 2 2 2 2 2 2 2 3 4 5 15 0 2 3 4 5 6 14 24 26 28 30 2 2 2 2 o o o o 2 6 13"00 4 32 108 o o 5705.697 2651.979 2571.315 2821.77 1 30 5.834 2 43. In order to display the calculation.64 1372 ·2048 2916 4000 5324 69 12 .600 3097550 3108.624 2702.926 5723.
Fig. 3.6.2: The bottom part of a spreadsheet with the infrared spectral data on H35Cl vapor of Schwenz & Polik (columns A through E), containing the overtone data as well as the results obtained with the macro LS0.

3.7 Spectral mixture analysis

Figure 3.7.1 illustrates the absorption spectra of four fantasy species, made up of one or more Gaussian peaks, and of an imaginary mixture made of these species. In exercise 3.7.1 we simulate such spectra, compute the spectrum of a mixture of these components (assuming the applicability of Beer's law, and the absence of chemical interactions), add noise to all components, then use multivariate analysis to reconstruct the composition of that mixture. Such peaks can be calculated as a exp[-(x - c)^2/(2b^2)]; instead of the exponential part you can also use the instruction =NormDist(x, mean, stdev, false) to generate Gaussian curves [1/(σ√(2π))] exp[-(x - x̄)^2/(2σ^2)], where x̄ is the mean (locating the position of the peak center), σ the standard deviation (defining its width), and where 'false' specifies the Gaussian curve rather than its integral.
Exercise 3.7.1:
(1) In column A deposit wavelengths, and in columns B through E calculate four fantasy spectra, each with one or more Gaussian peaks. Each Gaussian peak requires three constants: an amplitude a, a standard deviation b or σ, and a center frequency c or mean x̄.
(2) In columns M through Q generate random Gaussian ('normal') noise.
(3) Near the top of the spreadsheet enter four concentrations, and use these in column G to make a synthetic 'mixture spectrum' of the four single-component spectra, each multiplied by its assigned concentration, plus added noise from column M. In columns H through K make somewhat noisy single-component spectra by adding some noise from column N to the spectrum of column B, etc. Noise in the single-component spectra and in the spectrum of the simulated mixture should of course be independent. (You could do without columns B through E by adding noise directly to the data in columns B through E, and then subtracting that same noise from the mixture spectrum, in order to create more realistic single-species spectra.)
(4) Plot the spectra of columns G through K, which might now look like those in Fig. 3.7.1. Note that the resulting curve does not show distinct features easily identifiable with any of its constituent spectra.
(5) Highlight the data block in columns G through K, and call LS0 for a multivariate analysis of the mixture spectrum in terms of the spectra of its four components.

Fig. 3.7.1: The simulated single-component spectra (thin lines) and the spectrum of their mixture (heavy line). Independent Gaussian noise (mean 0, st. dev. 0.005) has been added to all curves.

In this particular example we have used the data of table 3.7.1, together with noise standard deviations of 0.005 for all components as well as for the synthetic mixture. The simulation parameters used, as well as the composition of the mixture and the results of its analysis, are listed in tables 3.7.1 and 3.7.2. Despite the added noise, the absence of stark features, and considerable overlap between the various single-component spectra, the composition of the mixture is recovered quite well. You should of course use your own data to convince yourself that this is no stacked deck.
Table 3.7.1: The constants for the Gaussian peaks used in generating Fig. 3.7.1 with the function NormDist(): for each of the four curves, the amplitude, mean, and standard deviation of its constituent peaks.

Table 3.7.2: The assumed and recovered composition of the synthetic mixture:

curve 1: assumed 0.650, recovered 0.648 ± 0.003
curve 2: assumed 0.500, recovered 0.496 ± 0.003
curve 3: assumed 0.300, recovered 0.305 ± 0.011
curve 4: assumed 0.200, recovered 0.207 ± 0.007

The above method is simple and quite general, as long as spectra of all mixture constituents are available. In the analysis you can include spectra of species that do not participate in the mixture: for those species, the calculation will simply yield near-zero contributions. However, a missing constituent spectrum will cause the method to fail if its contribution to the mixture spectrum is significant. A final note: the numbers obtained for the recovered composition are mutually dependent; therefore the covariance matrix should be used in subsequent computations, rather than the standard deviations.

3.8 How many adjustable parameters?

Now that it has become so easy to fit data to a polynomial, one might be tempted to throw in more terms, on the (mistaken) assumption that 'the more terms, the better the fit'. Because the least squares method is about data reduction while simultaneously filtering out as much noise as possible, the answer cannot be based solely on the exactitude of the fit: we can fit N data points exactly (including all noise) to a polynomial of order N - 1 (with its N coefficients a0 through aN-1), but we would then have replaced N experimental y-values with the same number N of fitting parameters, defeating the data-reduction purpose of least squares as well as its noise-reducing purpose. Long before we reach such an extreme situation, we may already include statistically meaningless terms. We therefore ask whether there is an optimal polynomial degree, and if so, how it can be determined.
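That claim is easy to check numerically; here is a throwaway numpy illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.arange(6.0)                            # N = 6 data points
y = 2.0 + 0.5 * x + rng.standard_normal(6)    # a line plus substantial noise

coef = np.polyfit(x, y, 5)                    # polynomial of order N - 1
resid = y - np.polyval(coef, x)
print(np.max(np.abs(resid)))  # essentially zero: the 'fit' has kept all the noise
```

The residuals vanish, yet the six fitted coefficients carry no more information than the six noisy data points they replaced.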
For instance. however. we lack sufficient theoretical guidance. (2) Absent theoretical guidance. even when a reliable theory is available. none infallible. we must first consider two caveats. such as sine waves. the information on the optimal order of the polynomial must be extracted from the data themselves. individual judgment is still called for. Ideally.114 R. Advanced Excel for scientific data analysis based solely on the exactitude of the fit. with enough highorder terms. While such criteria can guide us. such a power series would be totally inadequate. Such possible sources of systematic error are best avoided at the time of measurement. Often. However. (It is sometimes claimed that. Consequently. an expansion in terms of a power series in x would be rather inefficient.9 through 3. When the signaltonoise ratio is too small. we can only hope for reasonable success when the signaltonoise ratio is sufficiently large.) There are several such criteria. for others. because (short of making a perpetual motion machine) we cannot have a current I without a driving force V. of which we will here illustrate three. In many cases we do have theoretical models to guide us in deciding what function to use in fitting experimental data. de Levie. but then we will need some criterion to help us decide what the order m of that polynomial should be.. In sections 3.g. Before we use a power series in x without theoretical guidance. rather than corrected later. + arWCm as a rather generalpurpose fitting function. in which case we might indeed want to fit the data to the line V = Voffiet + I R. one could draw an elephant. such as . For many functions. but that is bull.6). Ohm's law requires that we use V = I R without a constant. But the measurement might still show a nonzero intercept if the meter had not been zeroed properly. for points that lie approximately on a circle. 
in which case decisions such as the optimal length of a power series must be made on the basis of statistical information inherent in the data. e. (There are methods available to pull a small signal from a large amount of noise. through careful calibration. they do not always yield identical answers.13 we will address that problem. theory should be our guide in selecting the model to which to fit the data. but that is of little help once the experiment has been performed.3) and H 35 CI (section 3. Since random noise can be of no help in this respect.. such methods must fail.. Back to the main question of this section: what do we do when we have no theory to guide us? In that case we may want to use a polynomial such as the power series y = ao + alX + a2x2 + a)X3 + . (1) The overall trend of the experimental data should be representable in terms of the model used. We did just that with the vaporphase spectra ofIz (section 3. matters are not always that simple.
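The multicomponent spectral analysis of section 3.7, to which table 3.7.2 refers, amounts to a single linear least-squares step: the mixture spectrum is fitted as a linear combination of the single-component spectra. A minimal sketch outside Excel (Python with NumPy; the Gaussian "spectra" and all names are illustrative, not the book's data):

```python
import numpy as np

# Two synthetic single-component "spectra" (Gaussian peaks), sampled
# at 101 wavelengths; these stand in for measured constituent spectra.
x = np.linspace(0.0, 10.0, 101)

def gauss(mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

A = np.column_stack([gauss(3.0, 1.0), gauss(7.0, 1.5)])

# A synthetic mixture: 0.65 of component 1 plus 0.35 of component 2.
y = 0.65 * A[:, 0] + 0.35 * A[:, 1]

# Least squares recovers the mixture composition; a species that is
# absent from the mixture would simply get a near-zero coefficient.
frac, *_ = np.linalg.lstsq(A, y, rcond=None)
print(frac)  # close to [0.65, 0.35]
```

The same matrix formulation underlies the spreadsheet solution; the fit degrades, as the text warns, when the constituent spectra are too similar or a significant constituent is missing.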
3.9 The standard deviation of the fit

The simplest approach we will illustrate here is based on the standard deviation of the fit, sy, or its square, the variance vyy. In the power series y = a0 + a1x + a2x² + a3x³ + ... + a_p x^p the number of parameters used to define the polynomial is p + 1, because there are p + 1 terms, from a0 x^0 through a_p x^p, so that the number of degrees of freedom is N - p - 1. (If we were to leave out the constant term a0, we would instead have N - p degrees of freedom.)

In order to find the minimal value of sy we can use LinEst, Regression, LS0, or LS1 repeatedly, gradually increasing the order of the polynomial, and each time extracting the corresponding value of sy. Collecting those values, we can then make our choice of polynomial if sy indeed goes through a minimum, by selecting that polynomial for which sy (or vyy) is minimal. There usually is at least one minimum in a plot of sy vs. polynomial order; one often stops at the first local minimum. This would be a somewhat tedious process but, since we have a computer at our fingertips, and the above protocol is eminently suitable for automation, we will use a custom macro for that purpose. Depending on whether or not we want to force the polynomial through zero, we use the custom macro LSPoly0 or LSPoly1. Both macros yield sy-values together with the individual fitting coefficients ai and their standard deviations si. (They were not designed to stop automatically whenever sy encounters a local minimum, or even to display only that polynomial, but can easily be modified to do so, in effect setting the termination requirement equal to sy,m+1/sy,m > 1, where the added index on sy denotes the highest term in the polynomial used.) Exercise 3.12.1 illustrates the use of LSPoly0.

3.10 The F-test

A second, closely related approach is based on the F-test, named after one of its originators, R. A. Fisher. It likewise considers the variance vyy of the entire fit. Because vyy = (Σ Δi²)/(N - P), where the residual Δ is given by y - ymodel, and P = m + 1 denotes the number of parameters used to describe ymodel by the polynomial, sy and vyy will decrease with increasing polynomial order even if Σ Δ² remains constant. The F-test, however, requires the user to select a criterion for the acceptable probability α (expressed as a decimal fraction, such as 0.05 for 5%). The most commonly used α-values are 0.05 (i.e., 5%) and 0.01 (1%).

An application of the F-test to evaluate the usefulness of extending a polynomial fit of N data points from order p to order q (with q > p) involves comparing the ratio

   F_q,p = [(ΣΔp² - ΣΔq²)/(q - p)] / [ΣΔq²/(N - q - 1)]      (3.10.1)

with the test value F(α, q-p, N-q-1), which in Excel can be computed with the function =FINV(criterion, df1, df2), where df1 and df2 denote the degrees of freedom of the two fitting functions that are being compared. Note that q - p is the difference between the degrees of freedom of the two fits, and that this method can only be applied for q < N - 1. This comparison yields the ratio

   FRα = F_q,p / F(α, q-p, N-q-1)
       = [(N-p-1)(vpp/vqq) - (N-q-1)] / [(q-p) F(α, q-p, N-q-1)]      (3.10.2)

where vpp = ΣΔp²/(N-p-1) and vqq = ΣΔq²/(N-q-1). If FRα is substantially larger than one, the additional terms up to q can be considered to be statistically significant at the chosen value of α. The above result applies to the general polynomial y = a0 + a1x + a2x² + a3x³ + ... + a_q x^q. For the polynomial y = a1x + a2x² + a3x³ + ... + a_q x^q through the origin we have instead

   FRα = [(N-p)(vpp/vqq) - (N-q)] / [(q-p) F(α, q-p, N-q)]      (3.10.3)

where vpp = ΣΔp²/(N-p) and vqq = ΣΔq²/(N-q). We typically apply (3.10.2) with q = p + 1 although, especially when dealing with symmetrical functions, it may be useful to consider q = p + 2 as well. The custom macro LSPoly1 displays both sy and two values of FRα as defined in (3.10.2), for α = 0.05 and 0.01 respectively.
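The F statistic of (3.10.1) is easy to compute directly from the two residual sums of squares. A short sketch (Python; function names are mine), which would then be compared against the critical value F(α, q-p, N-q-1) supplied by Excel's FINV or any statistics library:

```python
import numpy as np

def ss_resid(x, y, order):
    """Sum of squared residuals for a least-squares polynomial fit
    of the given order (constant term included, as in LSPoly1)."""
    X = np.column_stack([x ** k for k in range(order + 1)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

def f_statistic(x, y, p, q):
    """F statistic for extending the fit from order p to order q > p,
    following eq. (3.10.1): numerator with q - p degrees of freedom,
    denominator with N - q - 1."""
    N = len(x)
    sp, sq = ss_resid(x, y, p), ss_resid(x, y, q)
    return ((sp - sq) / (q - p)) / (sq / (N - q - 1))

# Synthetic data: a cubic plus noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x - 0.3 * x ** 3 + rng.normal(0.0, 0.5, x.size)

F23 = f_statistic(x, y, 2, 3)  # large: the cubic term is significant
F34 = f_statistic(x, y, 3, 4)  # modest: a quartic term adds little
print(F23, F34)
```

For a through-origin fit the denominators become N - p and N - q, matching eq. (3.10.3) and LSPoly0.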
Likewise, LSPoly0 incorporates sy and two FRα-values based on (3.10.3). If desired, FRα-values for q = p + 2 can also be incorporated in those custom macros. For an added term to be included within a given probability (5% for FR5, 1% for FR1), its F-ratio should be larger than one, preferably by at least a factor of 3.

3.11 Orthogonal polynomials

In fitting data to a straight line, one can make the covariance disappear by fitting to y = a0' + a1(x - xav) instead of to y = a0 + a1x, i.e., by using a function of x (in this case x - xav) rather than x itself. This principle can be extended to least squares fitting to a power series such as y = a0 + a1x + a2x² + a3x³ + ... + amx^m, or to multiparameter fitting as in y = a0 + axx + azz + ..., and requires that we use orthogonal polynomials. Making the covariances zero makes the fitting coefficients mutually independent, so that they can be computed one at a time, and can then be evaluated individually for statistical significance. In principle this would seem to be the optimal way to determine at what term to terminate a power series. In practice there are several constraints.

(1) In general, finding orthogonal polynomials can be rather laborious, because such polynomials must be constructed anew for every data set, on the basis of its individual x-values. We already encountered this in centering: the quantity (x - xav) indeed depends on all x-values in the data set. For equidistant data, i.e., data for which the increment Δx is constant, this complication is relatively minor, because the sought polynomials then are the readily computed Gram polynomials, which depend only on the number N of data points in the set, on the order j of the polynomial, and on the average x-value xav. At any rate, being laborious is no valid excuse when we can use macros to take care of the busywork. Two custom macros, Ortho0 and Ortho1, are therefore provided to compute orthogonal polynomials for finite sets of x-values; the x-values need not be equidistant. Ortho0 should be used for curves that go through the origin, whereas the intercept of curves fitted by Ortho1 is not constrained.

(2) A particular set of orthogonal polynomials corresponds to a particular power series or other parametric expression; e.g., the orthogonal polynomials of order j considered below will be linear combinations of x^0, x^1, x^2, x^3, ..., x^j. Note that the data fit obtained with orthogonal polynomials is no better than that obtained with the corresponding power series or parametric expression, and in fact is entirely equivalent. The advantage of using orthogonal polynomials is that they provide mutually independent fitting coefficients, which may be convenient to simplify the propagation of experimental uncertainty through subsequent calculations.

(3) The method indeed works beautifully for noise-free data but, as with any other statistical method, can be overwhelmed by too much noise. This is, of course, nothing new: statistical data analysis always labors under the constraint that the validity of the analysis results is questionable when the quality of the input data is insufficient. This is summarized succinctly in the expression "garbage in, garbage out".

But enough provisos already. Orthogonal polynomials are defined as polynomials that have mutually independent (i.e., uncorrelated) coefficients; their covariances are zero. Specifically, a set of polynomials Pj(x) is orthogonal if

   Σi Pj(xi) Pk(xi) = 0 for j ≠ k      (3.11.1)
   Σi Pj(xi) Pk(xi) ≠ 0 for j = k      (3.11.2)

Expressing an unknown function in terms of orthogonal polynomials, y = p0 + p1P1(x) + p2P2(x) + p3P3(x) + ... + pmPm(x), instead of as a power series y = a0 + a1x + a2x² + a3x³ + ... + amx^m, involves the same number m + 1 of terms. Fitting a set of data to the function y = p0 + p1P1(x) + ... + pmPm(x) yields the various coefficients pj. Because these pj are mutually independent, we can increase the order of the polynomial by 1 without recalculating the earlier coefficients: in going from, say, a quadratic to a cubic, the coefficients p0 through p2 remain the same, so that we only have to compute the added coefficient p3, which can then be tested for statistical relevance. We can test whether p3 is statistically significant by comparing its absolute magnitude with its standard deviation and/or the numerical uncertainty of the computation; if p3 passes that test, we repeat the procedure by testing p4, etc. In other words, orthogonal polynomials allow us to approach data fitting as a problem of successive approximation, including a termination criterion. This method therefore satisfies the intuitive notion of testing each added term for its statistical significance.

As already indicated, orthogonal polynomials can be defined with arbitrary scale factors Aj: multiplying a given term Pj(x), with the polynomial Pj(x) containing terms in x up to and including x^j, by a constant Aj will only affect the corresponding coefficient pj, because (pj/Aj) × AjPj(x) = pjPj(x). This has led to various normalizing schemes; typically, A0 and A1 are set equal to 1, but no dominant convention has yet emerged for the Aj with j > 1. The resulting multitude of equivalent expressions, differing only by order-dependent constants Aj, can be quite confusing to the novice. The custom macro Ortho works for all situations to which LS can be applied, since it does not rely on a fixed formula for the orthogonal polynomials but, instead, computes these for any data set by Gram-Schmidt orthogonalization.

In order to illustrate how orthogonal polynomials work we will here apply them first to data that are equidistant in the independent parameter x, in which case the resulting Gram polynomials (J. P. Gram, J. reine angew. Math. 94 (1883) 21) take on a rather simple form. As our example we will consider fitting data to the power series y = a0 + a1x + a2x² + a3x³ + ... + amx^m, which can be represented by the corresponding set of Gram polynomials Gj(x) as y = g0 + g1G1(x) + g2G2(x) + g3G3(x) + ... + gmGm(x), with coefficients gj. The first two Gram polynomials are

   G0(x)/A0 = 1      (3.11.3)
   G1(x)/A1 = ξ      (3.11.4)

with the compact notation ξ = (x - xav)/d, where d = Δx is the distance between adjacent x-values, and Aj is an arbitrary constant for the Gram polynomial Gj. For a set of N data points, additional polynomials Gj can then be computed with the recursion formula

   Gj+1(x) = ξ Gj(x) - [j² (N² - j²) / (4 (4j² - 1))] Gj-1(x)      (3.11.5)

so that (for A0 = A1 = 1) the next few terms are

   G2(x)/A2 = ξ² - (N² - 1)/12      (3.11.6)
   G3(x)/A3 = ξ³ - (3N² - 7) ξ / 20      (3.11.7)
   G4(x)/A4 = ξ⁴ - (3N² - 13) ξ² / 14 + 3 (N² - 1)(N² - 9) / 560      (3.11.8)
   G5(x)/A5 = ξ⁵ - 5 (N² - 7) ξ³ / 18 + (15N⁴ - 230N² + 407) ξ / 1008      (3.11.9)
   G6(x)/A6 = ξ⁶ - 5 (3N² - 31) ξ⁴ / 44 + (5N⁴ - 110N² + 329) ξ² / 176
            - 5 (N² - 1)(N² - 9)(N² - 25) / 14784      (3.11.10)

We note that G1(x) = ξ = (x - xav)/d, so that centering is indeed equivalent to fitting data to a Gram polynomial of order 1. The above polynomials pertain to a power series with arbitrary intercept; for a curve through the origin a different set of Gram functions is obtained, as is readily verified with the custom macro Ortho0.

*****

In section 2.10 we saw how to draw imprecision contours around a straight line. With the tools now at hand we can extend this approach to power series of the type y = a0 + a1x + a2x² + a3x³ + ... + amx^m. Upon their transformation into the orthogonal power series y = p0 + p1P1(x) + p2P2(x) + p3P3(x) + ... + pmPm(x), these can be fitted by least squares to find the coefficients pj and the corresponding standard deviations sj. We now generalize the earlier result to

   s = √(sy² + s0²P0² + s1²P1² + ... + sm²Pm²) = √(sy² + Σj sj²Pj²)      (3.11.11)

where the sum runs from j = 0 to j = m. For a straight line, P0 = 1 and P1 = x - xav, so that (3.11.11) indeed reduces to the expression of section 2.10; for curves through the origin, the sum in (3.11.11) should instead run from 1 to m. Since the custom macro Ortho1 provides both pj and sj, application of (3.11.11) is rather straightforward, as illustrated in exercise 3.11.1.

Exercise 3.11.1:
(1) Return to exercise 3.1, or make a new data set along similar lines. The instructions below will assume the format of Fig. 3.2.2, and may need modification if you make your own data set.
(2) Highlight the data block A15:C27, and call Ortho1, which skips one column and will therefore leave the data in D15:D27 intact.
(3) Because the increments in x are constant, the input data are equidistant, so that Ortho1 generates Gram polynomials. Verify that column F indeed shows ξ = (x - xav)/d, with d = 1 for the increment Δx, and that column G contains ξ² - (N² - 1)/12 where N = 13, the number of data points used; see (3.11.6). Label the data in columns E through G with headings for Y, P1, and P2.
(4) In column H, under a second label Ycalc, verify that you obtain the very same answers as in column D when using the coefficients p0, p1, and p2 shown in E28:G28; e.g., the instruction =$E$28+$F$28*F15+$G$28*G15 in cell H15 will reproduce the value in cell D15.
(5) In column I calculate s according to (3.11.11); the standard deviations s0, s1, and s2 can be found in cells E29 through G29 respectively, and the value of sy in cell E31. In columns J and K compute Ycalc - s and Ycalc + s respectively.
(6) Plot the data in columns H, J, and K vs. x, together with the individual data points, i.e., those in column B, providing the graph with some visual uncertainty estimates; see Fig. 3.11.1.
(7) To keep matters in perspective (and keep you humble), also calculate and plot (in a different color) the function y = 10 + 8x - 0.7x² that was the starting point for the data in Fig. 3.2.2.

Fig. 3.11.1: The data of Fig. 3.2.2 with imprecision contours at ± one standard deviation.

We can also find the imprecision contours from the usual power series expression for y, as long as we have access to the corresponding covariance matrix. In that case we must use, instead of (3.11.11),

   si = √(sy² + Σj Σk vjk xi^j xi^k)      (3.11.12)

or, for a multivariate analysis,

   si = √(sy² + Σj Σk vjk Xji Xki)      (3.11.13)

where both sums run from 0 to m, and where Xji denotes the value of Xj for observation i. Again, for curves where a0 ≡ 0, the indices j and k should start at 1. For confidence contours, multiply s in expressions such as (3.11.11) through (3.11.13) by √{F(α, 1, N-P)}. For N-P ≳ 10, F(⅓, 1, N-P) ≈ 1.12, i.e., ± one standard deviation corresponds roughly with a confidence level 1-α of 2/3.
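The covariance-based imprecision contour of eq. (3.11.12) is easy to evaluate once the covariance matrix v of the fit is in hand. A sketch in Python (synthetic quadratic data in the format of the exercise; function and variable names are mine):

```python
import numpy as np

def fit_with_band(x, y, m):
    """Least-squares fit to a power series of order m; returns the fitted
    y-values and the imprecision estimate of eq. (3.11.12),
    s_i = sqrt(s_y^2 + sum_jk v_jk x_i^j x_i^k)."""
    X = np.vander(x, m + 1, increasing=True)         # columns x^0 .. x^m
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    vyy = float(resid @ resid) / (len(x) - (m + 1))  # variance of the fit
    V = vyy * np.linalg.inv(X.T @ X)                 # covariance matrix
    s = np.sqrt(vyy + np.einsum('ij,jk,ik->i', X, V, X))
    return X @ coef, s

rng = np.random.default_rng(0)
x = np.arange(13, dtype=float)                       # 13 equidistant points
y = 10.0 + 8.0 * x - 0.7 * x ** 2 + rng.normal(0.0, 1.0, 13)
ycalc, s = fit_with_band(x, y, 2)
# The band ycalc - s .. ycalc + s is widest at the ends of the x-range.
```

Plotting ycalc, ycalc - s, and ycalc + s against x reproduces the kind of imprecision contours shown in Fig. 3.11.1.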
3.12 Gas-chromatographic analysis of ethanol

As our first example we will use a small data set provided by Leary & Messick, Anal. Chem. 57 (1985) 956, who reported the observations listed in table 3.12.1 for the relative peak area in a gas chromatogram due to ethanol, as a function of the volume fraction x of the injected sample.

x / vol%:    10    20    30    40    50    60    70    80    90
y / area%:  8.16  15.6  22.8  31.7  39.4  49.5  59.7  70.6  83.9

Table 3.12.1: The relative peak area y due to ethanol, in area%, as a function of the ethanol content x (in volume %) of an injected ethanol-water mixture.

Theoretically these data should fit a straight line that, if extended, would pass through the points (0,0) and (100,100). (How to force the least squares program to do that will be illustrated in section 4.) Unfortunately, a plot of y vs. x will show that the data exhibit a clearly nonlinear trend. We do not know what causes this trend, nor do we know whether a power series would be an appropriate model. We merely use this example because it contains equidistant data points, so that we can illustrate the application of orthogonal polynomials at its simplest level; we will use Gram polynomials to check the data for nonlinearity. The following exercise is simply that: an exercise in finding the optimal fit of these data assuming that they can be represented meaningfully by a power series.

Exercise 3.12.1:
(1) In a spreadsheet deposit the values of x and y from table 3.12.1: in column A deposit x, and in column B the corresponding values of y. In column C calculate G1 = ξ = (x - xav)/d = -4 (1) 4 (i.e., -4, -3, ..., 4), because xav = 50 and d = 10.
(2) Highlight the data in columns B and C, and call LS1, which will calculate the fit to a first-order Gram polynomial. (Note that this requires interchanging the columns for x and y.)
(3) In columns D through G calculate the polynomials G2, G3, G4, and G5 using (3.11.6) through (3.11.9). Repeat LS1, each time including one more column to the right as input to LS1. For each application of LS1, display the corresponding covariance matrix, and verify that it only contains significant values on its main diagonal; all off-diagonal terms should be zero except for computer round-off errors. Condense the output by overwriting the label St.Dev.: with sy, then save the results by moving them at least three rows down, and arrange the results in tabular form, as in Fig. 3.12.2.

Fig. 3.12.1: The analysis of the data in table 3.12.1 in terms of Gram polynomials.

Fig. 3.12.2: The results of applying LS1 to the data in Fig. 3.12.1. The numbers below the labels sy: are the standard deviations of the fit.

(4) The most significant aspect of Fig. 3.12.2 is that the coefficients gj obtained at lower polynomial order are not changed when higher-order polynomials are included in the data analysis. For G0 through G3 the (absolute magnitudes of the) corresponding coefficients gj clearly exceed the corresponding standard deviations sj, i.e., they are statistically significant; this is not the case for the next coefficients.
(5) Because G1 simply reflects the average value of x, the data in Fig. 3.12.2 do not indicate whether they are best represented with or without a constant term a0. For this we now use LSPoly0 and LSPoly1; see Figs. 3.12.3 and 3.12.4.
(6) We first note that the values for sy in Figs. 3.12.2 and 3.12.4 are the same. Note also that sy, FR5, and FR1 are all non-monotonic functions of the polynomial order. These sy values show a first minimum at order 3, and we can use this as a criterion for determining the optimal polynomial order. For LSPoly1, both F-ratios (FR5 and FR1, representing α = 0.05 and 0.01 respectively) clearly exceed unity for the second and third orders, and drop to well below 1 at order 4. For LSPoly0, the result for the third-order F-ratios is more ambiguous, with FR5 > 1 but FR1 < 1, so that the verdict depends on the chosen value of α. In terms of an integer power series in x, the data in table 3.12.1 are therefore best represented by a third-order polynomial.
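The Gram polynomials of this exercise follow directly from recursion (3.11.5). A minimal sketch outside the spreadsheet (Python; the function name is mine), using N = 9 equidistant points as in table 3.12.1, with a check of the orthogonality condition (3.11.1):

```python
import numpy as np

def gram_polynomials(N, jmax):
    """Evaluate G_0 .. G_jmax at N equidistant points, with A0 = A1 = 1,
    via the recursion of eq. (3.11.5):
    G_{j+1} = xi*G_j - j^2 (N^2 - j^2) / (4 (4 j^2 - 1)) * G_{j-1}."""
    xi = np.arange(N, dtype=float) - (N - 1) / 2.0   # xi = (x - x_av)/d
    G = [np.ones(N), xi]
    for j in range(1, jmax):
        c = j * j * (N * N - j * j) / (4.0 * (4 * j * j - 1))
        G.append(xi * G[-1] - c * G[-2])
    return G

N = 9                        # nine equidistant x-values, as in table 3.12.1
G = gram_polynomials(N, 5)   # G_0 .. G_5
xi = np.arange(N, dtype=float) - (N - 1) / 2.0

# Orthogonality, eq. (3.11.1): all cross sums vanish up to round-off.
for j in range(6):
    for k in range(j + 1, 6):
        assert abs(G[j] @ G[k]) < 1e-8 * (1.0 + abs(G[j] @ G[j]))

# G_2 matches the closed form of eq. (3.11.6): xi^2 - (N^2 - 1)/12.
assert np.allclose(G[2], xi ** 2 - (N * N - 1) / 12.0)
```

Fitting y against these columns one at a time mimics what Ortho1 does on the spreadsheet for equidistant data.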
Fig. 3.12.3: Analysis of the data of table 3.12.1 with LSPoly0.

(7) Where does all this lead us? The orthogonal polynomials suggest that we use terms up to and including x³. Whether that power series should include a constant term a0 is a separate question, which we have here answered using LSPoly0 and LSPoly1.
(8) Representing the data of table 3.12.1 as a cubic power series through the origin, y ≈ a1x + a2x² + a3x³, is clearly supported by all the above criteria, and leads to coefficients aj that are all much larger than the corresponding uncertainties sj.
(9) On the other hand, a fit of the same data in terms of the general cubic expression y = a0 + a1x + a2x² + a3x³ is much less successful, as is best seen when we consider the resulting coefficients: a0 is only marginally significant, a2 is not significant at all, and at higher polynomial order the standard deviations sj are of the same order of magnitude as the corresponding coefficients aj.
(10) Incidentally, this conclusion differs from that reached by L. M. Schwartz, Anal. Chem. 58 (1986) 246, who considered neither the ratios of the individual coefficients and their standard deviations, nor the possibility of a curve through the origin.
(11) For equidistant data, the orthogonal polynomials can be written in terms of simple integers. Verify that this is indeed the case in the spreadsheet of Fig. 3.12.1 by multiplying G2 by 3, G3 by 5 (or, if you want to make these integers as small as possible, by 5/6), G4 by 7 (or 7/12), and G5 by 3 (or 3/20). Another common way to standardize these polynomials is to divide all polynomials of given order j by the value of that polynomial at its last data point, i.e., at i = N. In that case all polynomials will alternately start (for i = 1) with either 1 (for even values of j) or -1 (for odd j).
(12) The above illustrates both the advantages and the limitations of fitting data to a power series. The method does not tell us whether a power series expansion is appropriate, but if it is, we can use orthogonal polynomials to determine how many terms to include.
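The whole protocol of sections 3.9 through 3.12, refitting at successive orders and comparing the standard deviations of the fit, condenses to a few lines outside the spreadsheet. A minimal Python sketch (synthetic cubic-plus-noise data, not the Leary & Messick values; names are mine):

```python
import numpy as np

def s_fit(x, y, m):
    """Standard deviation of the fit, s_y, for a power series of order m
    (constant term included, so N - m - 1 degrees of freedom)."""
    X = np.vander(x, m + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(np.sqrt(resid @ resid / (len(x) - (m + 1))))

rng = np.random.default_rng(3)
x = np.linspace(1.0, 9.0, 25)
y = 0.7 * x + 0.02 * x ** 3 + rng.normal(0.0, 0.1, x.size)

sy = {m: s_fit(x, y, m) for m in range(1, 6)}
# s_y drops sharply up to the true (cubic) order, then levels off;
# pick the first clear minimum rather than blindly increasing m.
print(sy)
```

This is, in effect, what LSPoly1 automates, minus the F-ratio columns.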
Fig. 3.12.4: Analysis of the data of table 3.12.1 with LSPoly1.

3.13 Raman spectrometric analysis of ethanol

In a recent paper, Sanford, Mantooth & Jones (J. Chem. Educ. 78 (2001) 1221) used laser Raman spectrometry to determine the ethanol content of ethanol-water mixtures. They reduced random noise by integrating the signal in the area of the Raman peak, between 2825 and 3096 cm-1, over a one-minute period, and they applied a baseline correction using the average of the signals at 2815 and 3106 cm-1. The resulting low-noise calibration data (kindly provided by Prof. Sanford) are listed in table 3.13.1 and illustrated in Fig. 3.13.1.

Table 3.13.1: The measured Raman intensities y (in arbitrary units, the integrated, baseline-corrected peak areas, ranging from about 0 to 10353) as a function of the percentage of ethanol x (from 0 to 100 vol%) in the ethanol-water mixtures.

These data were reported to fit the straight line y = -181.82 + 101.40x, with a squared linear correlation coefficient R² of 0.9978, i.e., R = ±0.9989. In section 2.10 we already commented on r²xy or rxy, popular feel-good parameters provided by many software packages, including LinEst and Regression in Excel. However, visual inspection of Fig. 3.13.1 suggests some curvature: the reported points at the extremes of the curve tend to lie above the fitted straight line, while those in the middle region lie below it. We therefore
calculate and plot the residuals, as shown in Fig. 3.13.2. Because these integrated data show relatively little random scatter, the residuals clearly exhibit a systematic trend. The magnitudes of those residuals are rather small, considering that the signal range is about 10000.

Fig. 3.13.1: The measured Raman intensities y as a function of the percentage of ethanol x in the ethanol-water mixture (open circles). The line shows y = -181.82 + 101.40x, and the small solid points show the resulting y-values at the x-values of the data.

Fig. 3.13.2: The corresponding residuals clearly show a systematic trend.

Alerted by this residual plot, we use LSPoly0 and LSPoly1 to find out whether a power series in x might improve the fit; Figs. 3.13.3 and 3.13.4 show the resulting outputs. We see that the value of sy exhibits a big drop in going from 1st to 2nd order, and thereafter shows only minor changes. Likewise, the F-ratios are much larger than 1 for first and second order, and just barely exceed 1 for fourth order. Because the data are not equidistant, we use the custom macros Ortho0 and Ortho1 to generate the appropriate orthogonal polynomials; Figs. 3.13.5 and 3.13.6 show the results obtained with them.

Fig. 3.13.3: The output of LSPoly0 for the first 6 orders.

Fig. 3.13.4: The output of LSPoly1 for the first 6 orders.

Fits to LSPoly1 (see Fig. 3.13.4) yield rather large relative standard deviations for a0: for all but the first order, the absolute magnitudes of the corresponding coefficients a0 are either smaller than their standard deviations s0, or only marginally larger, instilling little confidence in their statistical significance. This is not surprising, because the data had already been baseline-corrected. The corresponding ratios obtained with the orthogonal polynomials are above 20 for the linear and quadratic terms, and at or below 3 for terms of order 3 or higher, for both Ortho0 and Ortho1.
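Once a quadratic calibration curve of the form y = a1x + a2x² has been adopted, as is done below, using it for analysis means solving that quadratic for the physically meaningful (non-negative) root. A sketch in Python; the coefficient values here are hypothetical round numbers, not the fitted Raman coefficients:

```python
import math

def invert_calibration(a1, a2, y):
    """Solve y = a1*x + a2*x**2 for the root x >= 0
    (assumes a1 > 0 and a2 != 0, as for an upward-curving calibration)."""
    disc = a1 * a1 + 4.0 * a2 * y
    return (-a1 + math.sqrt(disc)) / (2.0 * a2)

# Hypothetical calibration coefficients and a simulated measurement:
a1, a2 = 100.0, 0.05
x_true = 40.0
y_meas = a1 * x_true + a2 * x_true ** 2   # 4080.0
print(invert_calibration(a1, a2, y_meas))  # recovers x_true = 40.0
```

The same formula can be entered as a single spreadsheet expression once a1 and a2 have been obtained from LS0 or LSPoly0.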
Fig. 3.13.5: The output of Ortho0 for the first five orders.

Fig. 3.13.6: The output of Ortho1 for the first five orders.

The above example illustrates the benefits of plotting residuals, and of subjecting the data to statistical tests that can indicate whether higher-order polynomials would yield a statistically better fit. Still, such tests do not reveal what specific form, if any, would best fit the data, nor do they clarify the reason for the nonlinearity in the above results. The latter may be an artifact of the data treatment, including the integration procedure used, or reflect a more basic nonlinear feature in the data. Guided by the parsimony principle, we therefore opt for the simplest of the second-order fits, the function y = a1x + a2x^2, which has no more parameters than the linear function y = a0 + a1x it replaces. This fit, and the corresponding residuals, are shown in Figs. 3.13.7 and 3.13.8 respectively.

Exercise 3.13.1:
(1) Make a spreadsheet with the data from table 3.13.1, plot them, and calculate and plot their residuals.
(2) Apply LSPoly0 and LSPoly1 to those data, as well as Ortho0 and Ortho1, and verify that you indeed obtain the results shown here.
(3) Fit the data of table 3.13.1 to the equation y = a1x + a2x^2, and also compute and plot the corresponding residuals. Note that the residuals now indicate only a very slight amplitude increase at larger x-values, which perhaps could be corrected by the inclusion of a fourth-order term.
(4) Do the same using the equation y = a1x + a2x^2 + a4x^4.

For the purpose of chemical analysis, and assuming of course that all other experimental parameters are kept constant, the quadratic calibration curve should yield more accurate results than a straight line. We will now illustrate how to use such a quadratic calibration curve for that purpose. Say that we make three replicate
measurements on an ethanol-water mixture of unknown composition that, on the same equipment and under the same experimental conditions, yield Raman intensities of 4000, 4050, and 3980. We now wish to determine the corresponding ethanol percentage xu, and its uncertainty. With the expression y = a1x + a2x^2 we have xu = [-a1 + sqrt(a1^2 + 4 a2 yu)] / (2 a2), where yu is the average Raman intensity of the unknown.

Fig. 3.13.7: The same data analyzed as a quadratic through the origin, y = a1x + a2x^2.

Exercise 3.13.1 (continued):
(5) Use LS0 to fit the data of table 3.13.1 to y = a1x + a2x^2. Display the covariance matrix.
(6) In another column, place the three observations of the Raman intensity of the sample of unknown composition (4000, 4050, and 3980), and compute their average yu with =AVERAGE() and its variance vu with =VAR().
(7) Use Copy => Paste Special to copy the average yu-value so that it is aligned with the values for a1 and a2, i.e., so that they are in one contiguous row, or in one contiguous column. Likewise align the variance vuu so that the covariance matrix and the variance of yu share a diagonal, i.e., with the variances v11, v22, and vuu in the same order as a1, a2, and yu.
(8) Call the custom macro Propagation and find the standard deviation su. For this example, the correct answer for the unknown ethanol concentration is xu = 42.68 ± 0.37 vol%, a quite respectable result, with a relative standard deviation of less than 1%. If you had used the straight-line fit y = a0 + a1x you would instead have found xu = 41.44 vol%.
(9) If you had used the standard deviations s1, s2, and su instead of the covariance matrix, the answer would have been quite close to the answer under (8). Unfortunately, there are many ways to get incorrect answers even from quite good data.
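The inversion of the calibration parabola, and the check that the chosen root is the meaningful one, can be sketched outside the spreadsheet as follows. This is only an illustration: the coefficients a1 and a2 below are placeholders, not the values fitted in the exercise.

```python
from math import sqrt

def invert_quadratic(a1, a2, yu):
    """Solve yu = a1*x + a2*x^2 for the positive root,
    xu = [-a1 + sqrt(a1^2 + 4*a2*yu)] / (2*a2)."""
    return (-a1 + sqrt(a1**2 + 4 * a2 * yu)) / (2 * a2)

# Hypothetical calibration coefficients, and the average of the
# three replicate intensities quoted in the text:
a1, a2 = 87.0, 0.15
yu = (4000 + 4050 + 3980) / 3
xu = invert_quadratic(a1, a2, yu)

# Plugging xu back into the calibration curve must reproduce yu:
assert abs(a1 * xu + a2 * xu**2 - yu) < 1e-9
```

The other root of the quadratic is negative (the product of the two roots is -yu/a2 < 0), and therefore has no meaning as a concentration.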
Fig. 3.13.8: The residuals of the quadratic fit in Fig. 3.13.7. The scale is the same as that in Fig. 3.13.2.

The residuals of the quadratic fit exhibit only a slight trend towards larger-amplitude residuals at larger x-values. We now ask a practical question: given these calibration data, is it worthwhile to make more repeat measurements of yu to obtain a result with higher precision? For the answer we go to the part of the spreadsheet shown in Fig. 3.13.9, with the calculation of a1, a2, and yu, and the subsequent computation of xu. The variance vuu = su^2 is entered in the covariance matrix. To see how much of the uncertainty comes from the measurement of the unknown, substitute 0 (instead of 1300) for the variance of that measurement, and use Propagation, which now yields a much smaller standard deviation. In other words, the major part of the uncertainty in the answer indeed comes from the uncertainty in yu, and we therefore ask what the result would be if we quadrupled the number of observations, from 3 to 12, assuming that the average would stay the same, and that their individual deviations would be similar. In that case the sum of squares of the deviations would increase 4 times, while N - 1 would increase from 3 - 1 = 2 to 12 - 1 = 11, so that the variance vuu would be reduced by a factor 4 / (11/2) = 8/11, to 8 x 1300 / 11 = 945. Substituting this into the covariance matrix and again using Propagation, we find that the standard deviation in xu would go down from 0.37 to 0.32, a rather minor improvement for four times as many measurements. Depending on the subsequent uses of the result of the analysis, it may or may not be worth your while to spend so much extra effort on a relatively minor improvement in the precision of the answer. Establishing why the calibration data do not quite fit a straight line might be a wiser use of your time. In any case, an uncertainty-free measurement is not a realistic goal.

Fig. 3.13.9: Spreadsheet detail showing the bottom of the columns for the calculation of a1, a2, and yu, and the subsequent computation of xu and su.

We note that the method used here and illustrated in Fig. 3.13.9 is similar to that of section 3.2, except that we here deal with the intersection of a parabola and a horizontal line, y = yu. The method relies on having a closed-form solution for xu. If the curvature of the calibration curve is so severe that terms of higher than second order are required, the above method cannot be used; in that case it may still be possible to approximate the calibration curve in the region around xu by a parabola.

3.14 Heat evolution during cement hardening

A much-studied example of a multiparameter fit can be found in the measurements of H. Woods, H. H. Steinour & H. R. Starke (Ind. Eng. Chem. 24 (1932) 1207) on the heat evolved during the first 180 days of hardening of Portland cement, studied as a function of the amounts (in weight percentage) of its dry ingredients: tricalcium aluminate, tricalcium silicate, tricalcium aluminoferrate, and β-dicalcium silicate. Because the sum of the four weight percentages must add to 100%, the four weight percentages are mutually dependent. We now consider the question: which parameters should we include in the analysis as significant, and which (if any) would better be left out? The experimental data are shown in Fig. 3.14.1.

Fig. 3.14.1: The experimental data of Woods, Steinour & Starke, Ind. Eng. Chem. 24 (1932) 1207, for the heat evolution y (in calories per gram of cement) as a function of the weight percentages x of the clinkers used in making the cement:

y:   78.5  74.3 104.3  87.6  95.9 109.2 102.7  72.5  93.1 115.9  83.8 113.3 109.4
x1:     7     1    11    11     7    11     3     1     2    21     1    11    10
x2:    26    29    56    31    52    55    71    31    54    47    40    66    68
x3:     6    15     8     8     6     9    17    22    18     4    23     9     8
x4:    60    52    20    47    33    22     6    44    22    26    34    12    12

In order to answer this question, we highlight block B3:F15 and call the macro Ortho1. The result is shown in Fig. 3.14.2. That is encouraging, and speaks for itself: the parameters a0, a1, and a2 are significant, a3 is only marginally so, and a4 is not at all. Armed with this knowledge we now use LS1 on block B3:D15 in Fig. 3.14.1, restricting the analysis to x1 and x2, and obtain the results shown in Fig. 3.14.3. (These figures are adjacent parts of a single, larger spreadsheet; the reader will have no difficulty seeing how they fit together.) The answer we obtain is y = (52.5 ± 2.3) + (1.47 ± 0.12) x1 + (0.66 ± 0.046) x2, where x1 refers to the chemical composition Al2Ca3O6, x2 to Ca3SiO5, x3 to Al2Ca4Fe2O10, and x4 to β-Ca2SiO4. It agrees with the result obtained via a much more laborious route by N. R. Draper & H. Smith, Applied Regression Analysis, 2nd ed., Wiley 1981, who devote most of their chapter 6 to this problem. It had earlier been discussed in A. Hald, Statistical Theory with Engineering Applications, Wiley 1952, pp. 635-649.
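This two-variable fit is easy to cross-check outside the spreadsheet. In the sketch below, NumPy's lstsq plays the role that LS1 plays in the exercise; it is a cross-check on the published coefficients, not the macro itself, and the data are those of Fig. 3.14.1.

```python
import numpy as np

# Heat evolved (cal/g) and weight percentages of the first two clinkers
y  = np.array([78.5, 74.3, 104.3, 87.6, 95.9, 109.2, 102.7,
               72.5, 93.1, 115.9, 83.8, 113.3, 109.4])
x1 = np.array([7, 1, 11, 11, 7, 11, 3, 1, 2, 21, 1, 11, 10], dtype=float)
x2 = np.array([26, 29, 56, 31, 52, 55, 71, 31, 54, 47, 40, 66, 68], dtype=float)

# Design matrix with a column of ones for the intercept a0
A = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a0, a1, a2 = coef
print(np.round(coef, 3))    # approximately 52.5, 1.47, 0.66
```

The coefficients agree with the values quoted above; the standard deviations and the covariance matrix require the additional bookkeeping that LS1 performs.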
Is this the only possible answer? No: we have yet to try Ortho0. Note that Ortho0 is not a special case of Ortho1, because the orthogonal polynomials for forcing a curve through the origin are quite different. This is most readily seen when we compare the orthogonal polynomials for equidistant data of y vs. x, in which case Ortho1 produces P0 = G0 = 1 and P1 = G1 = x - x_av, whereas Ortho0 uses P0 = G0 = 0 and P1 = G1 = x.

Fig. 3.14.2: The results produced by Ortho1. The columns for x1 through x4 display the orthogonal polynomials for the data in C3:F15, while the three bottom lines show the corresponding coefficients a, their standard deviations s, and the ratios a/s.

Fig. 3.14.3: The results produced with LS1 on the original data set, restricting the analysis to x1 and x2.

Figure 3.14.4 shows what we find with Ortho0 instead of Ortho1: all four parameters are now statistically significant. Using LS0 we then find the coefficients a1 through a4 displayed in Fig. 3.14.5. We therefore have a choice; here the parsimony principle favors the three-parameter solution found in Fig. 3.14.3 over the four-parameter version of Fig. 3.14.5.

Fig. 3.14.4: The results produced by Ortho0. The orthogonal polynomials are quite different from those shown in Fig. 3.14.2.
Fig. 3.14.5: The results produced with LS0 on the original data set.

But wait a minute: what does a0 mean in Fig. 3.14.3? Could there be heat evolution independent of any of the chemical constituents of cement? And could that be a significant part of the observed heat evolution? Given the care the authors have shown in obtaining their data, a0 is unlikely to be due to calibration error, or to some unexpected offset. Consequently it is possible, even probable, that a0 is an artifact of the data fitting. As long as we restrict our analysis to linear terms, as in the above example, fitting the heat data to y = a0 + a1x1 + a2x2 provides the most compact description, and is therefore 'best' if data compression is the goal. But as far as an interpretation of these results is concerned, i.e., for a chemical understanding of the hardening of cement, a description as y = a1x1 + a2x2 + a3x3 + a4x4 in terms of the four constituents would seem to be more appropriate, and our conclusion must therefore be a little more ambiguous. What constitutes the 'best' description is here seen to depend on the intended purpose of the data analysis.

3.15 Least squares for equidistant data

Equidistant data sets are collections of data for which the independent variable is equidistant, i.e., for which all successive increments Δx in the independent variable x are equal. Such data sets are quite common in science, because many instruments produce data at constant increments in time, voltage, wavelength, wavenumber, magnetic field, etc. Least squares analysis can be simplified significantly when the data are equidistant, in which case they can be represented conveniently in terms of Gram polynomials, which are readily implemented on a spreadsheet. This has led to new applications of least squares methods. The use of
Gram polynomials for moving polynomial fits was developed by Sheppard (Proc. London Math. Soc. (2) 13 (1914) 97), Gram (Mitt. Ver. Schweiz. Versicherungsmath. (1915) 3), and Sherriff (Proc. Roy. Soc. Edinburgh 40 (1920) 112). This approach was further popularized by Whittaker & Robinson in The Calculus of Observations, their well-known 1924 treatise on numerical analysis that was still reprinted in the 1960's, as well as in other textbooks, such as Milne's 1949 Numerical Calculus and Wylie's Advanced Engineering Mathematics (1951, 1960). In chemistry it is often associated with the names of Savitzky & Golay (Anal. Chem. 36 (1964) 1627), who reminded analytical chemists of this method when computers became more generally available, and who, like the authors listed above, provided tables of so-called convoluting integers. Unfortunately, the latter authors confused their readers with tables containing an unusually large number of errors, subsequently corrected by Steinier et al. (Anal. Chem. 44 (1972) 1906).

Here we will explore how the moving polynomial method can be used with a relatively low-order polynomial to fit a small, contiguous sample section of the data, typically containing a small, odd number of data points. From this fit we compute and store the resulting, smoothed value at the midpoint of the sample. The polynomial is then moved up by one point along the data set (by dropping an extreme point on one side, and adding a new data point on the other side), whereupon the process is repeated. By doing this until the moving polynomial has slithered along the entire data set, we can compute a smoothed replica thereof, except near the ends of the data set, where a slightly modified algorithm is needed.

An advantage of this method is that we need not know the precise mathematical description of the curve: as long as the data density is sufficiently large with respect to the shape-defining features of that curve, any small subset of the data can usually be fitted reasonably well to a low-order polynomial. As illustrated in exercise 3.15.1, the moving polynomial method is very easy to implement on a spreadsheet. The macro ELSfixed makes it even simpler, because it computes those convoluting integers and subsequently uses them in the analysis. You specify the data set, and the length and order of the moving polynomial. For an odd number of equidistant data points, the length of the moving data sample should not exceed the characteristic width of the smallest features you want to be resolved without appreciable distortion, while the order of the polynomial should be as low as possible for maximum smoothing, and as high as possible for minimal data distortion. This
choice therefore depends on the signal-to-noise ratio.

It is sometimes necessary to compute the first or higher derivative of a function represented by experimental data points. It is obviously impossible to determine the derivative of a set of individual data points. (One can compute differences between adjacent points, but that is not what is meant here.) Instead, we desire the derivative of the reconstructed, continuous curve on which the individual, experimental data points are assumed to lie in the absence of noise. Smoothing is usually necessary in this case, because differentiation is highly sensitive to noise. The moving polynomial method fits the bill, because it provides smoothing to an algebraic expression, which can then be differentiated. For example, when we smooth a data segment by fitting it to the quadratic expression y = a0 + a1x + a2x^2, its first derivative is immediately available as dy/dx = a1 + 2a2x. ELS can therefore be used on equidistant data to determine not only the smoothed values of the sample, but also the corresponding first and second derivatives, and since it uses Gram polynomials, you can modify ELS to provide higher derivatives. A variant, ELSauto, automatically selects the order each time the polynomial is moved, using an F-test, so that the order varies throughout the curve; it can readily be modified to use the ratio of the highest-order orthogonal coefficient and its standard deviation as an alternative criterion.

Exercise 3.15.1:
(1) As test set we will use a set of four Lorentzian peaks, such as occur in, e.g., nuclear magnetic resonance spectra. In order to illustrate the effects of distortion, we will use peaks of unit height but of gradually diminishing half-widths. In A4:A1403 deposit the numbers 1 (1) 1400, and in cell B4 the instruction =10/(0.001*(A4-550)^2+10)+10/(0.01*(A4-900)^2+10)+10/(0.1*(A4-1100)^2+10)+10/((A4-1200)^2+10), then copy this instruction down (e.g., by clicking on its handle). These values will represent our idealized (i.e., noise-free) data set. Plot these data; you should see a graph such as Fig. 3.15.1.
(2) First we will do the analysis manually, to get an idea of how it works. We use a sliding 5-point parabola, y = a0 + a1x + a2x^2, for which the convoluting integers are -3, 12, 17, 12, and -3, with as common divisor their algebraic sum, 35.
(3) In cell G6 place the instruction =(-3*B4+12*B5+17*B6+12*B7-3*B8)/35 and copy this instruction down through cell G1401. Column G will now contain the smoothed values, except for the first and last two points. There are ways to fill in those missing ends, but we will not worry about such details here.
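The convoluting integers quoted in step (2) can be verified numerically: for a least squares parabola through 5 equidistant points, the smoothed midpoint is a fixed linear combination of the five y-values, namely the center row of the least squares projection ('hat') matrix. A short Python check follows; this is an illustration of where the integers come from, not the code of ELSfixed.

```python
import numpy as np

# Quadratic y = a0 + a1*x + a2*x^2 fitted to 5 equidistant points, x = -2..2
x = np.arange(-2, 3)
A = np.column_stack([x**0, x, x**2]).astype(float)

# The smoothed midpoint is the center row of the hat matrix A (A'A)^-1 A'
H = A @ np.linalg.inv(A.T @ A) @ A.T
w = H[2]
print(np.round(w * 35))    # -3, 12, 17, 12, -3

# Any parabola passes through such a filter unchanged:
y = 1.0 + 2.0 * x + 3.0 * x**2
assert abs(w @ y - y[2]) < 1e-9
```

These are exactly the weights that the instruction placed in cell G6 applies, with their sum (35) as the common divisor.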
(4) Having to look up the convoluting integers for a given polynomial length and order can be a bother. The custom macro ELSfixed will compute these integers, and then apply them to the data. Call Tools => Macro => Macros, select ELSfixed, and click on Run or Enter. In the input boxes enter the location of the data set (B4:B1403), the length of the polynomial (say, 35), its order (e.g., 3), and the order of the derivative (0 for smoothing). You should get a result such as in Fig. 3.15.2, where the moving cubic is too long to represent the sharpest peaks in the graph. To quantify the distortion, compute the sum of squares of the residuals with =SUMXMY2(B4:B1403,C4:C1403), and place the resulting value in the graph. Verify that you find less distortion with a shorter moving polynomial of, e.g., only 7 data points.
(5) Now make the data set more realistic by adding Gaussian noise, using Tools => Data Analysis => Random Number Generation, see Fig. 3.15.4, and repeat these analyses. Figures 3.15.5 and 3.15.6 illustrate what you might obtain with ELSfixed, while Fig. 3.15.7 shows results obtained with ELSauto.

Fig. 3.15.1: A set of four Lorentzian peaks of diminishing widths.

Fig. 3.15.2: The noise-free data set of Fig. 3.15.1 after smoothing with a 35-point cubic. The few data points around the fourth peak contribute virtually all of the sum of squares of the residuals.

Fig. 3.15.3: The first derivative of the data of Fig. 3.15.1, obtained with a 7-point moving quadratic (thin line) and, for comparison, the separately computed true first derivative of the function.

Fig. 3.15.4: The data of Fig. 3.15.1 with added Gaussian noise, sn = 0.05. The signal amplitude is 1, so that the resulting signal-to-noise ratio is 1 / 0.05 = 20.

Fig. 3.15.5: The data of Fig. 3.15.4 smoothed with a 7-point moving line.

Fig. 3.15.6: The data of Fig. 3.15.4 smoothed with a 7-point moving quadratic.

Fig. 3.15.7: The data of Fig. 3.15.4 smoothed with a 7-point moving polynomial of variable order, between 1 and 5.

The above examples demonstrate several points. (1) When the sample length exceeds the characteristic width of the smallest feature you want to resolve, distortion will result, even in the absence of any noise, as illustrated in Fig. 3.15.2. (2) Comparison of, e.g., Figs. 3.15.5 and 3.15.6 illustrates that the lower the order of the smoothing polynomial, the more noise is removed, but the more the signal is distorted (see the line at x = 1200). The self-optimizing ELSauto can negotiate a useful compromise between these two, as illustrated in Fig. 3.15.7, but you still must tell it what polynomial length to use. (3) As can be seen in Fig. 3.15.8, you can also use ELSauto (or ELSfixed) for differentiation.
Fig. 3.15.8: The first derivative of the data of Fig. 3.15.4 computed with a 7-point moving polynomial of variable order, between 1 and 5.

3.16 Weighted least squares

Sometimes we know that an instrument is noisier in one region of its range than in another. In that case we can put more weight on some of the data than on data obtained in a different region. This is not unlike listening to witnesses in a court of law, and giving more credence to those that appear to be more trustworthy. The difference is, of course, that courts supposedly deal with accuracy, but least squares with precision, i.e., measurement repeatability. Assigning weights to data requires that we know how much (relative) weight to allot to each measurement, and how to handle such individual weights or weighting factors wi in an analysis. If we have sufficient replicates of each observation, we might of course assign each measurement its proper individual weight, equal to the reciprocal of its variance, wi = 1/vii = 1/si^2. Unfortunately, such information is seldom available.

A second need for weighting arises when we use a transformation to make experimental data suitable for least squares analysis. For instance, data involving radioactive decay, first-order chemical kinetics, or the electrical current following a stepwise voltage change in a resistor-capacitor circuit, all follow an exponential decay of the type y = a e^(-bt), where t denotes time. It is usual to 'rectify' such an expression by taking (natural) logarithms, so that ln y = ln a - bt, which is the expression for a straight line of ln y vs. t. If we then fit the transformed data using least squares, we minimize the sum of the residuals in ln y rather than those in y. In some cases that may well be correct, namely when the errors in y are relative ones, i.e., proportional to the magnitude of the signal y. But when the experimental errors are absolute, the resulting fit will overemphasize the tail end of the data set.
The first of these problems can be avoided by using weighting. For the logarithmic transformation we can use

wi = 1 / (Δ ln yi / Δyi)^2    (3.16.1)

which, when Δy is sufficiently small, can be evaluated from the derivative, because by definition

d ln y / dy = lim(Δy→0) [ln(y + Δy) - ln y] / Δy = lim(Δy→0) Δ ln y / Δy    (3.16.2)

Consequently we can assign each point of the transformed data set a global weight

wi = 1 / (d ln y / dy)^2 = yi^2    (3.16.3)

or, in general, upon transforming the dependent parameter y into Y,

wi = 1 / (dY/dy)^2    (3.16.4)

so that the total weight will be the product of the individual and global weights,

wi = 1 / [(dY/dy)^2 si^2]    (3.16.5)

There may be additional problems with a transformation. For example, when an exponential decays to zero, the experimental data will be scattered around zero, because the experimental noise will make a number of observations negative, in which case Excel will not know how to take the corresponding logarithms. We would bias the result if we merely left out the logarithms of the negative data; instead, the data set must be truncated before the data become negative.

Excel has no explicit, built-in facility to handle weighted least squares. We therefore provide a weighted least squares macro, WLS, that is similar to LS but has an additional column (between the columns for the dependent and independent parameters) for the weights wi. WLS can also be used for unweighted least squares, e.g., by assigning all points the same weight.
16. 3.1: (1) As a first illustration of a weighted least squares analysis we again use the data set from D.y>O y>O #O. WLSO for curves that should go through the origin.2 list several functions that can be transformed into linear or quadratic form respectively. Exercise 3. Tables 3. y>O Iny Iny Inx In a + b21c In a+ b21c . Harvey. .1. . 3: Further linear least squares y= ax P + b y=alnx+b y=ax P y y yxq Iny Iny Iny xP a a a b b b p must be known x>O for noninteger p x>O 143 + bx q Inx xpq x lIx x lIx Inx xlnx (In x)lx x lIx I . . the associated sample standard deviations.. / / y y= lIy lIy x x I/a lIa 2bla 2bla c + b21a c + b 21a Table 3. .y>O x>O.16.16. the xvalues in column C (leav . As with LS.*O Table 3.y>O I y=ax+b x y=ax+b }'*O #O.16.Ch. as well as the corresponding covariance matrix. with the associated global weights. / / p and q must be known x>O for noninteger p or q y>O x>O.y>O x>O.1 and 3.16.2.y>O x>O. WLS comes in two flavors. y>O Cauchy distribution. Some equations that can be transformed into the quadratic form Y = ao + a1X + a2X2.16. equation y= a+ blx+ex y=ae(Xb)'/c y = ae(1nxb)'/c a (xb)2 +e a (x+b)2+e Y xy X x x ao e lie lie a1 a 2ble 2blc a] b w x 2 comments van Deemter eqn. .}. #0 Gauss distribution lognormal distribution x>O. and the associated global weights. WLS provides the bestfitting coefficients ao through am. .y>O Lorentz distribution. . Enter the yvalues in column A. and WLS 1 for curves that need not do so. with the added standard deviations Sj shown in column D of Fig.1: Some equations that can be transformed into the linear form Y = ao + a1X.?q y=bd y= ba l/x y= be ax Y = be a/x y=bx a y = bx ax y = bx a/x Ina Ina a a a a a a a Inb Inb Inb In b In b In b In b b b Iny Iny Iny Iny lIy lIy .
3.5 0. Then copy your data and now call WLS1. (3) Highlight the data in columns A through C and call WLSO. 59. (5) As in exercise 2.15) without a global transfonnation.42 Coeff: 2500 0 0.02 0. The results are shown in Fig. 3.9 1 59..1 24.044459 122. (2) In column B calculate the weights lis? according to (3.1716 0.07 0.2 O.42 9. and absent any contraindication.3 4 ..e. because in the latter case the standard deviation in the intercept is larger than the absolute value of the intercept. 13 0.33 t1 A y 0.1. and the standard deviations Si of the individual Yivalues in column D.9652 IDeII: 0.13 0. 3 35.33 6 7 8 9 10 StDev: 11 12 0.661 16 0. 144r8 III X 2 3 4 5 0 12.4 9.12.22 0.0 16 0.0 16 0.935897 0.36 24. . r 13 I. these data are best fitted to a proportionality rather than to a general line.6457_9 0. 1 2500 204.641 J 0.0..22 0.02 0. A Y 2 3 4 5 III X 6 7 8 9 \0 2500 0 0 12.007296 .144 R..66116 0. 1 2736 0.07 0.156195 CM: . _0. 17 16 20.79 20.02 0. Advanced Excel for scientific data analysis ing the B column free for the weights w).36 2500 0.1: Fitting a set of data with standard deviations Si of the individual Yivalues to a proportionality with WLSO (top panel) or to a straight line with WLSI (bottom panel).2 35.02 0.4 60.16. with Y = y so that dYldy = I. 0. 3 204. de Levie.:.1.~590_4J Fig. ~:053!2.5 oeff: 122. i.053/91.79 60.0 5417 0..9 1 48 .16. 1 2736 0.16.
contains error messages. Gaussian) noise in column N. and interfere with taking logarithms. Save the spreadsheet for later use. (2) fitting enzyme kinetic data. but usually associated with the names of Michaelis and Menten. However. In column B calculate the transfonned quantity Y= lny. the added noise can bring the signal below zero. 135 (1902) 916. If the dependent variable.17. The Excel function LogEst (for logarithmic estimator) is the logarithmic analog of LinEst.5 0 ± 0. first derived by Henri. and (3) fitting data to a Lorentzian curve.Ch.16. and in column C the weights/.250975 ± 0. For our example we find In ao = 2.e. Z.5 respectively.415494 ± 0. either by overwriting cells B21 :D22. Try again. 3. which can be compared with the assumed values of 10 and 0. (2) Highlight the data in columns B through D.023354.17 through 3. It automatically takes the logarithm of y.17 An exponential decay Exercise 3.17. using Iools => Qata Analysis => Random Number Generation. Vm is the maximum rate. No resounding success. but what can you expect from only 7 data points in the presence of considerable noise? (4) By changing the standard deviation s n ofthe noise.1. 49 (1913) 333. using noise from column N multiplied by some noise 'amplitude' sn. i. and in column A compute y = ao exp[ajx] + noise. 3. LogEst does not include any weighting.3 0 and aj = 0. .1. in the above example by highlighting range Bl2:D20 instead. 3: Further linear least squares 145 Sections 3..1 in cells B21 and B25. 3.e. with their standard deviations. and K is a constant. is S vm V=(3. Deposit 'nonnal' (i.1) K+s where v is the initial rate of the enzymatic conversion of the substrate S with concentration s.19 will illustrate the application of WLS to (1) an exponential decay of the formy = aebx . the macro will tell you so. Include a plot of your data. 3.023.1. Compt.17. 3.18. ao = 9. (3) Compare the values of ao = exp [In ao] and aj obtained with those used in the calculation ofy.031581 and aj = 0. 
As the exponential function y = a0 exp[a1x] approaches zero, the added noise can bring the signal below zero, and interfere with taking logarithms. If the range you highlight (B12:D25 in Fig. 3.17.1) contains error messages, the macro will tell you so, or respond with message boxes that need to be acknowledged by pressing OK or Enter, and sign off. Try again, either by overwriting cells B21:D22, or by excluding the offending points, in the above example by highlighting range B12:D20 instead.

The Excel function LogEst (for logarithmic estimator) is the logarithmic analog of LinEst. It automatically takes the logarithm of y, and can therefore be used to fit an exponential; see the entry for y = b e^(ax) in table 3.1. However, LogEst does not include any weighting.

3.18 Enzyme kinetics

The simplest kinetic relation in enzyme kinetics, first derived by Henri, Compt. Rend. 135 (1902) 916, but usually associated with the names of Michaelis and Menten, Biochem. Z. 49 (1913) 333, is

v = vm s / (K + s)    (3.18.1)

where v is the initial rate of the enzymatic conversion of the substrate S with concentration s, vm is the maximum rate, and K is a constant.
[Fig. 3.17.1 occupies most of this page: a plot of y vs. x, and spreadsheet columns for x, Y = ln y, the weights w = y², and y, together with the constants a0 = 10, a1 = -0.5, and the noise amplitude sn; only the caption and the cell instructions are reproduced here.]

Fig. 3.17.1: Fitting a synthetic exponential decay with Gaussian noise using weighted least squares. Cells N14:N27 contain Gaussian ('normal') noise with zero mean and unit standard deviation. The text below the screenshot shows some of the instructions used.

cell:  instruction:                        copied to:
A12    =$F$12*EXP($F$13*D12)+$F$15*N12     A13:A25
B12    =LN(A12)                            B13:B25
C12    =A12^2                              C13:C25

Traditionally, experimental data have been fitted to this equation after linearizing it. Hanes, Biochem. J. 26 (1932) 1406, rewrote (3.18.1) as

s/v = K/vm + s/vm    (3.18.2)

which suggests a linear plot of s/v vs. s, with slope 1/vm and intercept K/vm. In a similar vein, Lineweaver & Burk, J. Am. Chem. Soc. 56 (1934) 658, inverted (3.18.1) to
1/v = 1/vm + (K/vm)(1/s)    (3.18.3)

so that a plot of 1/v vs. 1/s would be linear, with slope K/vm and intercept 1/vm. The usual experimental uncertainties are such that the initial reaction velocity v should be the dependent variable. Both (3.18.2) and (3.18.3) will yield 1/vm and K/vm, so that K must subsequently be computed as K = (K/vm) / (1/vm). Since 1/vm and K/vm are obtained from the same data analysis, they will be correlated, and we will therefore need the covariance matrices to estimate the resulting precision of K.

One final comment before we are ready for this exercise. For the Lineweaver-Burk method, which uses 1/v as its dependent variable, d(1/v)/dv = -1/v², so that the expression for the weights leads to w = v⁴; for the Hanes method we have instead w = v⁴/s². The two methods therefore yield somewhat different results for the very same experimental input data, because the transformations from v to either s/v or 1/v introduce a different bias to the residuals. We will use the data in table 3.18.1 to illustrate this in exercise 3.18.1. These data, from Atkinson et al., Biochem. J. 80 (1961) 318, as reported by Wilkinson, Biochem. J. 80 (1961) 324, are the experimental data:

s/mM    0.138   0.220   0.291   0.560   0.766   1.460
v*      0.148   0.171   0.234   0.324   0.390   0.493

Table 3.18.1: The initial rate v (*: in μM / 3 min / mg enzyme) of formation of nicotinamide adenine dinucleotide at pH = 4.95 as a function of the concentration s of the corresponding mononucleotide.

Exercise 3.18.1: (1) Open a new spreadsheet, with columns for s, v, and w, and enter the data from table 3.18.1 in the first two columns. Leave column w blank, or (which is equivalent) specify the weights as 1. Now use WLS1 to analyze these data according to (3.18.2). Verify with either LinEst or Regression that you indeed obtain the results for a non-weighted linear least squares analysis. (2) Extend the spreadsheet by adding columns for 1/v, w, 1/s, and s/v, then use this to analyze the data according to (3.18.3), which uses 1/v as its dependent variable. Again compute K/vm and 1/vm (which are now slope and intercept respectively) with their standard deviations. (3) Enter the appropriate weights in the columns for w: for the Lineweaver-Burk method the weights are w = v⁴, and thereafter repeat for the Hanes method, where we have instead w = v⁴/s². (4) Compute vm as 1/(1/vm), and K as (K/vm)/(1/vm), then call Propagation to calculate the corresponding precisions. For comparison do this in two ways: first correctly, using the covariance matrix (with results as shown in Fig. 3.18.1), then incorrectly, just using the standard deviations. It makes no difference for the precision estimate of vm, which depends on only one coefficient, but does affect the standard deviation of K, which depends on both intercept and slope. These results are included in Fig. 3.18.1. (5) Your spreadsheet might now look like Fig. 3.18.1. Save it, because we will return to it in chapter 4.

[Fig. 3.18.1 shows the spreadsheet itself: data blocks for the unweighted and weighted Lineweaver-Burk and Hanes analyses, with the coefficients, standard deviations, and the derived values of vm and K below each block; the numerical details are not reproduced here.]

Fig. 3.18.1: The spreadsheet analyzing the enzyme kinetic data of Atkinson et al., Biochem. J. 80 (1961) 318, with unweighted and weighted linear least squares.
In this particular example, the unweighted Lineweaver-Burk analysis is significantly off: for the data of table 3.18.1 we find K = 0.44 ± 0.12 (for the unweighted Lineweaver-Burk analysis) vs. 0.582 ± 0.069 (for unweighted Hanes analysis) or 0.585 ± 0.104 (for properly weighted Lineweaver-Burk and Hanes analysis), and the corresponding values vm = 0.571, 0.680 ± 0.035, and 0.685 ± 0.038 respectively. With such weighting, the two methods give identical results for identical input data, as well they should. But the failure of the unweighted Lineweaver-Burk analysis is not their fault: Lineweaver, Burk & Deming, J. Am. Chem. Soc. 56 (1934) 225, strongly emphasized the need for proper weighting. But then, who reads the original papers any more?

In the above discussion we have not considered two other well-known linearizations of (3.18.1), viz. those of Eadie-Hofstee, v = vm - K v/s, and of Scatchard, v/s = vm/K - v/K, because they contain the dependent variable v on both sides of the equal sign, and are therefore unsuitable for linear least squares analysis.

3.19 Fitting data to a Lorentzian

The inherent shape of many optical absorption peaks is Lorentzian; it is described by an equation of the form

y = a / ((x - b)² + c)    (3.19.1)

with a maximum at x = b, and a halfwidth determined by c. For the same peak height, a Lorentzian peak is much broader at its base than a Gaussian. As a three-parameter curve, it cannot be linearized, but (3.19.1) can be converted into a quadratic with the transformation Y = 1/y = x²/a - 2bx/a + (c + b²)/a. This transformation must be given the weight w = y⁴.

Exercise 3.19.1: (1) Open a new spreadsheet, with space at the top for a graph and for four constants, a, b, c, and sn. Below those, enter column labels for Y, w, x, x², and y in columns A through E respectively. Also place a label for noise at the same row in column N. (2) Enter the numerical constants for a, b, and c next to their labels; for sn enter the numerical value 0. The value for b should be well within the range of x-values you will use. (3) In column N enter Gaussian ('normal') noise of unit amplitude and zero mean. (4) Enter x-values in column C, then calculate the corresponding y-values in column E using (3.19.1) plus noise. For noise use sn times the value in column N. (5) Now complete the columns: calculate Y = 1/y in column A, w = y⁴ in column B, and x² in column D.
(6) Highlight the data block in columns A through D, and call WLS1. In column F compute the resulting fitted curve, Yreconstructed = 1/(a0 + a1x + a2x²), with the coefficients a0, a1, and a2 computed by WLS1. (7) Plot y and Yreconstructed as a function of x, and compare with Fig. 3.19.1. Had you used unit weights in column B instead, equivalent to an unweighted fit, you would still have found a correct answer in the absence of noise. (8) Now change the value of sn from 0 to a more realistic value (but still much smaller than 1), and call WLS1 again (because macros do not update automatically). You might now get something like Fig. 3.19.2. (9) Keep pushing the envelope, by increasing the standard deviation sn of the noise. Figure 3.19.3 illustrates what will eventually happen: the fitted function Yreconstructed will broaden and no longer serve any useful purpose. This is because y is maximal near the peak center, x ≈ b, where the signal/noise ratio is maximal, whereas the transform Y = 1/y is maximal away from the peak center, where the signal is much smaller than the noise. (10) Save your spreadsheet.

[Figs. 3.19.1 and 3.19.2 plot y vs. x over the range 0 to 24; only the captions are reproduced here.]

Fig. 3.19.1: Points calculated for a noise-free Lorentzian (filled circles) with a = 1, b = 11, and c = 2, and the curve fitted through them using the coefficients computed by the macro WLS1 with appropriate weights.

Fig. 3.19.2: Points calculated for the same Lorentzian as in Fig. 3.19.1 (gray background band) plus Gaussian noise (filled circles), and the curve fitted through them using the coefficients computed with the least squares macro WLS1.
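With noise-free data (sn = 0) the transformation and weights must return a, b, and c essentially exactly. Below is a pure-Python check, not the spreadsheet itself: a small normal-equations solve stands in for WLS1, and a = 1, b = 11, c = 2 are the illustrative values of Fig. 3.19.1.

```python
def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

a_true, b_true, c_true = 1.0, 11.0, 2.0
xs = [float(i) for i in range(25)]                       # x = 0 .. 24
ys = [a_true / ((xi - b_true) ** 2 + c_true) for xi in xs]

Y = [1.0 / yi for yi in ys]                              # transform Y = 1/y
w = [yi ** 4 for yi in ys]                               # weights w = y^4

# Weighted normal equations for the quadratic Y = c0 + c1*x + c2*x^2.
S = lambda p: sum(wi * xi ** p for wi, xi in zip(w, xs))
T = lambda p: sum(wi * xi ** p * Yi for wi, xi, Yi in zip(w, xs, Y))
c0, c1, c2 = solve([[S(0), S(1), S(2)],
                    [S(1), S(2), S(3)],
                    [S(2), S(3), S(4)]], [T(0), T(1), T(2)])

a = 1.0 / c2                 # since c2 = 1/a
b = -c1 / (2.0 * c2)         # since c1 = -2b/a
c = c0 / c2 - b * b          # since c0 = (c + b^2)/a
print(a, b, c)               # recovers 1, 11, 2
```

Adding noise to ys and rerunning shows the fragility discussed next: the weights are right for the signal but wrong for the noise.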
in Fig.15. 3: Further linear least squares 151 The above illustrates the usefulness of testing realistic (i.l9. 3. By now you will know your way around the spreadsheet in order to solve them. see Fig. That helps for weak noise. Here. 3. since the measured quantity is pressure p while the fitted quantity is its natural logarithm. .15. then. 3. The fitted curve no longer provides a useful representation of the data..15 we indicated that a proper analysis of the data shown in Fig.3. 3. which they emphasize by converting noise that averages to zero into an always positive contribution.1 The bailingpaint a/water In section 2. but are wholly inappropriate for noise. 3. 0.2 0. 3.19. the analysis becomes overwhelmed by noise. 2.e.3.20 Miscellany After so many workedout examples here are a few that are just mentioned. What went wrong in Fig. 3.19. without much handholding.3? The weights w = l are indeed appropriate for the Lorentzian curve. This is why.20.2.2 but with a larger noise amplitude. and hold on to the spreadsheet: a more noiseresistant solution to this problem will be described in section 4.6 n y = 0.05 c • 0.1 requires weighting. and compare your results with those listed at the end of section 2. Use weighted least squares to analyze these data.3: The same as in Fig.Ch.19. because we cannot find weights that would be appropriate for both signal and noise.19. but is still not very robust. noisy) data. we clearly see the limit of usefulness of weighted least squares. but a small amount of added noise disrupts it unless we use the proper weighting.19. as illustrated in Fig.15. 3. The transfonn looks fine on paper.2 Fig. In p. But don't give up.
3.20.2 The vapor pressure of water

In section 2.11 we used an algebraic expression for the temperature dependence of the vapor pressure Pw of water between 0 and 75°C. Since ln Pw is a more linear function of temperature t, the vapor pressure of water can also be represented as a lower-order polynomial of the type Pw = exp[a0 + a1t + a2t² + ...]. Use tabulated values for Pw as a function of temperature t from the 1984 NBS/NRC Steam Tables, as reproduced in, e.g., the CRC Handbook of Chemistry and Physics, D. R. Lide, ed., 81st edition, Chemical Rubber Co. 2000/2001, p. 6-10. These already smoothed data clearly don't fit a straight line, but are readily represented in terms of a power series in t. Compute ln Pw and express it as a power series in t. Use LSPoly1 to find the optimal order of such a series and determine its coefficients, then calculate Pw = exp[ln Pw] and display the resulting residuals.

3.20.3 Fitting data to a higher-order polynomial

While the foregoing examples all involved successful applications of least squares methods, we will now illustrate an unsuccessful one: fitting a data set with relatively little structure to a high-order polynomial. As our specific example we will use a file from the Statistical Reference Datasets, specifically Filip.dat, which is accessible at www.nist.gov/div898/strd/lls/data/LINKS/DATA/Filip.dat. This file contains 82 X,Y data pairs, to be fitted to a tenth-order polynomial in x. Go to the web at the above address, highlight the page, copy it, save it with Notepad, then import the Notepad file into Excel.

Highlight the data set and call LSPoly1, and plot the resulting residuals. As the order increases, the value of sy goes through a fairly deep minimum: from sy = 0.006 for a sixth-order polynomial to a 50 times larger value for a tenth-order polynomial, which indicates that the sum of squares of the residuals increases as the polynomial order becomes higher! That is far more than can be understood by the change in √(N-P) from √(82-6) = 8.72 to √(82-10) = 8.49, and can only be the result of computational errors. And while the parameters of the sixth-order polynomial yield a crude approximation to the data, the coefficients of the tenth-order polynomial trace a curve that does not even come close. Especially with polynomials of order 6 or higher, truncation errors in the numerical manipulations can sometimes hijack the computation, and yield nonsensical results. It is always prudent to check the answer.
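The diagnostic used here, that the sum of squares of the residuals can only decrease (or stay the same) as terms are added to the polynomial, so that an apparent increase must be blamed on computational errors, is easy to confirm on a well-conditioned problem. The following pure-Python sketch uses synthetic data, with x deliberately scaled to [0, 1]; that scaling is precisely what keeps the normal equations benign here, unlike the wide x-range of the Filip set.

```python
import math
import random

def solve(A, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def poly_ssr(x, y, order):
    """Least-squares polynomial fit via normal equations; return SSR."""
    n = order + 1
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    rhs = [sum(xi ** i * yi for xi, yi in zip(x, y)) for i in range(n)]
    c = solve(A, rhs)
    return sum((yi - sum(cj * xi ** j for j, cj in enumerate(c))) ** 2
               for xi, yi in zip(x, y))

random.seed(1)
x = [i / 30 for i in range(31)]                      # x scaled to [0, 1]
y = [math.sin(2 * math.pi * xi) + random.gauss(0, 0.05) for xi in x]

ssr = [poly_ssr(x, y, k) for k in range(1, 7)]       # orders 1 through 6
print(ssr)   # must be non-increasing; an increase would flag numerical trouble
```

On the Filip data, with its large x-values raised to the tenth power, the same normal-equations approach is exactly where the truncation errors described above creep in; that is the point of the Ortho comparison that follows.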
0. 3.8 0. 1 2 r'l~ 8 6 4 .397 1X l + 1.598lx 3  1.6444 1. Of the least squares routines we have encountered so far.0. 3. 4. 3: Further linear least squares 153 You might suspect a custom macro such as LSPoly. The results obtained with Ortho 1 suggest that none of the coefficients is statistically significant.0 .34 7X4 . or 6 respectively. This can be seen in the results obtained with Trendline which..9899 ~ ~UR .8 f:pU!~ 10 ~'I"" 6 R = 0. a con .1 1. but the Excelprovided least squares routines yield equivalent answers.1 I l.0. specifically one that keeps truncation errors to a minimum by avoiding matrix inversions. mercifully.973 2 4!iiifi11'2' + 0.0242x5 .0 0.0492x 3 + 0.8 0.0007X6 .Ch..7 8 4 2 y = 0.e.098 1. see Fig.3744x + 2. clearly reflecting a similar algorithm.1 + 10.1: Polynomial fits obtained with Trendline for the Filip data set for polynomial order 2.1. that the polynomial model used is really no good.7 10 Fig.7 8 6 4 2 y = 0. 1 + 10 1.9 0..0 0.297x .0022x4 + 0.0. only goes up to sixth order. i. only Ortho uses a significantly different algorithm.J .20.9 R2= 0.2.577x 2  22.20.9 0.
The takehome lesson: always check whether the model used and the resulting answers make sense.1.20.2: Tenthorder polynomial fits of the Filip data set (large open circles) as obtained with LSPolyl (gray filled circles) and with Orthol (small black solid circles). Ortho comes closest to fitting the data satisfactorily to a tenthorder polynomial. It provides a convenient way to fit data to a polynomial or a multiparameter function. NIST lists a set of 11 coefficients. more traditional least squares routine does in this case. that are all about 5 times their standard deviations.l 9 8 5 4 3 Fig. The model may be inefficient or unsuitable.20.2. the more such errors can accumulate.20. require a computation with much more than double precision. 3.154 R. and that indeed can be used to fit the data set quite well. however. Because the computer takes care of the mathematical manipulations. 3. which also shows how poorly the other. 3. Since Ortho uses a different. with Sy = 0. The more complicated the calculation. 3. simpler algorithm. as in Fig. the least squares method has become very easy to apply.0033. and has therefore become ubiquitous in many fields of science and technology.21 Summary In this chapter we have encountered applications of linear least squares methods beyond those fitting a proportionality or a straight line. it can serve as a useful first verification tool for critical results. . or to any other function that can be transformed into one of these. ao through alO. Advanced Excel for scientific data analysis clusion you might have reached simply by looking at a graph of the data. Yet. l. in which case weighting may be required to correct for the bias introduced by the transformation. Still. as illustrated in Fig. Such results. de Levie. the algorithm may be mathematically correct yet may introduce enough numerical errors to invalidate the results.
The least squares method furnishes estimates of the precision of its results, based on the (usually tacit) assumptions that the noise is random and can be described adequately by a single Gaussian distribution. The latter is a useful working assumption, which often can only be verified or falsified with large data sets; for small data sets, the likely errors resulting from incorrectly assuming a single Gaussian distribution will usually be rather inconsequential. It is therefore customary to assume a single Gaussian distribution of errors, unless there is evidence to the contrary. Such evidence may be in the form of 'outliers' that may, e.g., indicate the presence of more than one distribution. An example is the weight of US pennies, which typically have a mass of about 2.5 g. But if our sample includes pennies from before 1982, when they were still made of copper rather than of zinc with a copper coating, we will encounter two distinct Gaussian distributions: around 3 g for the older, solid copper pennies, and around 2.5 g for the zinc ones. If our measurements include only one old penny, it might easily be considered an outlier, and disregarded. But in doing so we merely create a semblance of orderliness where none exists. Outliers are often the canaries in the coal mine, warning us of potential problems. And they are often not quite as obvious as those described by Darrell Huff in his delightful book How to Lie with Statistics (Norton, 1954).

The least squares algorithm can be simplified greatly when applied to equidistant data, thereby making it practical for use in 'sliding' polynomials for smoothing, interpolation, and differentiation. Even though we have not illustrated this here, you can readily prove to yourself (by trying it out on a spreadsheet) that use of terms higher than necessary can lead to problems, such as oscillations between the points used for determining the parameters. The probability of such oscillatory behavior increases with higher polynomial order and with larger data spacing, and it can be particularly troublesome for interpolation or differentiation. This, as well as maximal noise reduction, are good reasons to favor the lowest possible polynomial order in fitting experimental data, even if the data set has no known theoretical description and, in its entirety, does not fit a polynomial at all. When theoretical guidance is absent, there are ways to determine whether higher-order terms are statistically significant. The example of enzyme kinetics summarized in Fig. 3.18.1 illustrates but one of the many ways in which incorrect answers can be obtained by thoughtless application of least squares.
There is another. say. Remember that all that glitters . and we seek to answer a different question: not whether a linear relation exists. a quadratic or cubic fit will do. a straight line. but not in terms of the underlying assumptions about the nature of the experimental fluctuations (random or biased. We already commented on this in section 2. That can be a valid question when one needs to answer whether cigarette smoking causes lung cancer (as it does). ***** In this and the previous chapter we encountered the covariance and its companion. The optimal solution to this dilemma is to collect a sufficiently large number of closely spaced data. in the limit of infinitesimally small spacing. which specify the interrelatedness of specific parameters obtained by least squares. everything is linear. but specifically what that relationship is. more prevalent but often incorrect application of the linear correlation coefficient. In the usual application of the linear correlation coefficient as a measure of goodness of fit for. or whether living close to power lines causes leukemia (it does not). that correlation is between a straight line y = ao + a\x and a straight line x = bo + bJ)'. and that. Advanced Excel for scientific data analysis On the other hand. and/or weighting factors to use in fitting the data). in which it is used as a measure of goodness of a least squares fit. For such applications. This assumes that we are not yet concerned with any details of the specific numerical relation between x and y. the linear correlation coefficient. However.20. in the type of data analysis we emphasize in this book. the linear correlation coefficient or its square are poor (and inappropriate) gauges. polynomial order. the existence of a causal relationship can be assumed. for slightly larger spacing. In this case the working assumption is that.10. loworder polynomials can introduce systematic distortion if the underlying signal has sharp features.156 R. 
***** It is sometimes assumed that least squares methods are inherently ob jective. following a single or multiple distribution. but ask the more basic question whether x and yare (linearly) correlated at all. such as the slope and intercept of a straight line. Least squares analysis can be misapplied just as easily as any other method. assumed to be either Gaussian or other) or in terms of the various choices made by the experimenter (which variable to take as the dependent one. and to analyze them with a relatively short. but the point bears repeating. or what equation. and therefore answers the question whether there is a linear correlation between x and y.3). loworder moving polynomial. de Levie. This is true insofar as their numbercrunching aspects are concerned (apart from such numerical errors as encountered in section 3.
1966. Nonlinear least squares may even be preferable in cases where one has a choice between linear and nonlinear least squares.22 For further reading There are many books on linear least squares methods and on multivariate analysis. a nonlinear least squares analysis is possible as long as we have an analytical expression (or find an approximate one) to which the experimental data can be fitted. is Applied Regression Analysis by N. and that fool's gold (FeS2) is far more abundant than the more precious commodity it resembles (Au). and used in most computerbased least squares computer routines. including those provided by Excel and in all but one of the custom macros of the MacroBundle. 1981. Nonlinear least squares will be described in the next chapter.18. R. 3. Draper & H. (Wiley. A very good text. 1998). Smith. although not that many that bridge the gap between the introductory texts and the specialist books written for statisticians. It uses the powerful (but initially perhaps somewhat forbidding) matrix formalism now standard among statisticians. more advanced than the ones already listed in section 2.Ch. For the many functions that cannot be fitted to a polynomial or be transformed into one. . 3: Further linear least squares 157 isn't necessarily gold.
Chapter 4

Nonlinear least squares

In the previous chapters we have encountered the linear least squares method, so called because it fits data to equations that are linear functions of their adjustable coefficients ai. Note that the equations themselves can be highly nonlinear in the dependent and independent parameters, as in y² = a0 + a1x² + a2 log(x) + a3 exp(z⁵), as long as they are linear in the coefficients ai. This is readily seen by substituting Y = y², X1 = x², X2 = log(x), and X3 = exp(z⁵), which converts the above expression into the standard form of multiparameter linear least squares, Y = a0 + a1X1 + a2X2 + a3X3. Unfortunately, there are many problems to which linear least squares analysis cannot be applied, e.g., when the term a3 exp(z⁵) in the above expression for y² is replaced by exp(a3z⁵). The only least squares methods then applicable are nonlinear ones, which will be discussed in the present chapter. We will also encounter a few examples (e.g., in section 4.15) in which it may be preferable to use nonlinear least squares analysis even though a linear least squares analysis is feasible.

In a nonlinear least squares method, one compares a given data set (which we will here call the experimental one, even though in some of our exercises we will again simulate it, for lack of real experimental data) with a model expression that depends on one or more numerical parameters. We compute the sum of squares of the residuals, SSR, or some other appropriate single parameter, then minimize that parameter by adjusting the numerical coefficients used in the model. If the model is appropriate, and the noise is not overwhelmingly large, we can usually find a set of coefficients to provide a reasonably close fit to the experimental data. Where a direct comparison with linear least squares is possible, we usually end up with a similar answer. Unlike the case of linear least squares, which uses algorithms that lead to singular, usually well-defined results, there is no guarantee that we will necessarily find the 'best' fit for the model assumptions made; the sophistication of a nonlinear least squares method lies in how efficiently it adjusts the model parameters, and it tends to succeed as long as the initial (guessed) values of the coefficients are fairly close to their final values. Again, one of the best checks on the reasonableness of our fit is to plot the residuals, because that graph can often reveal the presence of systematic deviations.
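The substitution argument can be made concrete: once Y, X1, X2, and X3 are defined, the fit is ordinary multiparameter linear least squares, and exact data return the coefficients ai exactly. A pure-Python sketch follows; the coefficient values and sample points are illustrative, not taken from the book.

```python
import math

def solve(A, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def regressors(x, z):
    # The substitutions: X1 = x^2, X2 = log(x), X3 = exp(z^5); Y = y^2.
    return [1.0, x * x, math.log(x), math.exp(z ** 5)]

a_true = [1.0, 2.0, 3.0, 0.5]                       # illustrative coefficients
xs = [1.0 + i / 4 for i in range(12)]               # x > 0 so log(x) is defined
zs = [((7 * i) % 12) / 10 for i in range(12)]       # z decorrelated from x

rows = [regressors(x, z) for x, z in zip(xs, zs)]
Y = [sum(c * r for c, r in zip(a_true, row)) for row in rows]   # exact Y = y^2

# Multiparameter linear least squares via the normal equations (X'X) a = X'Y.
n = 4
A = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
rhs = [sum(row[i] * Yi for row, Yi in zip(rows, Y)) for i in range(n)]
a_fit = solve(A, rhs)
print(a_fit)   # recovers [1, 2, 3, 0.5]
```

Replacing the regressor exp(z⁵) by exp(a3 z⁵) destroys this structure: a3 then sits inside the model function, no substitution makes the problem linear in the coefficients, and an iterative minimizer such as Solver is needed.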
As provided by Excel, nonlinear least squares methods use Solver, a powerful and convenient add-in based on the algorithm proposed by K. Levenberg, Q. Appl. Math. 2 (1944) 164 and implemented by D. W. Marquardt, J. SIAM 11 (1963) 431, and further refined by Leon Lasdon, Allan Waren, John Watson, and Dan Fylstra. Solver is provided with Excel, both in its stand-alone version and as part of the Microsoft Office bundle, but its code is provided by Frontline Systems (www.frontsys.com). It can be found in the Tools submenu, and it may still have to be installed in case a minimal version of the software was installed originally. Apart from its basic function it has several useful Options, such as Use Automatic Scaling and Show Iteration Results. These options are explained in the Excel Help files, and some of them are illustrated in exercise 4.1.1.

Solver has one major deficiency: it yields results without any associated uncertainty estimates. This can be remedied by subsequently running a custom macro, SolverAid, which will reconstruct the standard deviations based on the expressions given in chapter 2, and deposit them next to the parameters found by Solver if that space is available. SolverAid can also furnish the covariance matrix or its scaled version, the linear correlation coefficient matrix. The way SolverAid does all this is detailed in chapter 9.

In section 4.1 we will analyze data from the cosmic microwave background radiation detector aboard the COBE (Cosmic Background Explorer) satellite. This satellite measured the so-called black body radiation in interstellar space, radiation believed to be a relic of the big bang. Black body radiation is described by the Planck equation, famous because its derivation heralded the birth of quantum theory. There is no known transformation of the Planck equation to allow a linear least squares analysis of these data, the usual circumstance for using a nonlinear least squares analysis. The first group of examples will illustrate the power of Solver to fit data to expressions for which there are no easy or known ways to transform them into a polynomial with integer powers. Our next, more down-to-earth examples will involve fitting a molecular potential energy-distance profile, acid-base titrations, and phosphorescence decay. Sections 4.5 and 4.6 will deal with curves that contain several peaks, as occur in, e.g., spectra and chromatograms.
As an introduction to using Solver we will first use it in a simple example of reverse-engineering a complex mathematical operation. Imagine that you know how to multiply, and therefore how to compute b = a³ = a × a × a, but that you do not know how to perform its reverse operation, taking the cube root of b to find a. Several approaches come to mind: (1) learn about cube roots and how to calculate them; (2) get a machine or algorithm that will do the trick; or (3) use a trial-and-error method based on Solver, although we hardly advocate this as your method of choice for cube-rooting. Here is how you might do that: take b and some (preferably fairly close) estimate of a, calculate a³, then let Solver adjust the value of a in order to minimize the difference between b and a³.

Exercise 4.1.1: (1) In a new spreadsheet, deposit in cells A1:A4 the labels b=, a=, aaa=, and SR=, where SR stands for Square of the Residual. (2) In cell B1 deposit the number 3, in cell B2 the number 1, in cell B3 the instruction =B2*B2*B2, and in cell B4 the instruction =(B1-B3)*(B1-B3), so that you can more readily see how good the answer is. (3) Call Solver, Set Target Cell to B4, Equal To Min, By Changing Cells B2, and press Solve or Enter. Bingo: the cube root of 3 will appear in cell B2, correct to 8 significant digits. As you can see, it works indeed. Change the value of b to the cube of an integer number, and call Solver again, with or without Assume Non-Negative. You should of course reset the value of a every time, otherwise Solver may just nod and leave well enough alone. See what happens when you start with b = 9; then try it with a = 3, or with the modified instruction =B2*B2 in cell B3.

(4) This gives us a convenient chance to look into the Options of Solver. Max Time and Iterations are self-explanatory and, anyway, inconsequential, because for serious computations with Solver you should always try Solver again, to make sure it gave you its best, final answer. Then play with Precision to establish that it defines how tightly your condition must be met. (5) Tolerance only works when you use integer constraints; it might better be called 'Integer tolerance'. (6) Convergence determines the amount of relative change in the last five iterations; if that is less than the set amount, Solver will stop and consider its job done. (7) Assume Linear Model is seldom useful, but Assume Non-Negative can be. (8) Use Automatic Scaling is useful when the adjustable parameters are of different orders of magnitude; this is equivalent to taking their logarithms, as will be done in sections 4.9 and 4.11. (9) Show Iteration Results can be instructive, because it shows snapshots of what happens during the iterations. It may remind you of the need to have good initial estimates. (10) Estimates, Derivatives, and Search are best left alone. Try various values to see how it works.
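What Solver does in this exercise, shrinking SR = (b - a³)² by trial and error, can be mimicked with any one-dimensional minimizer. Below is a pure-Python sketch using a golden-section search; the bracket [0, max(1, b)] is an assumption of this sketch that works for non-negative b, and the routine is of course not Solver's actual algorithm.

```python
import math

def solver_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - invphi * (hi - lo)
        d = lo + invphi * (hi - lo)
        if f(c) < f(d):
            hi = d
        else:
            lo = c
    return (lo + hi) / 2.0

def cube_root(b):
    """Solver-style cube root: minimize SR = (b - a^3)^2 over a."""
    return solver_min(lambda a: (b - a * a * a) ** 2, 0.0, max(1.0, b))

print(cube_root(27.0))   # approximately 3
print(cube_root(9.0))    # approximately 2.0801
```

The squared residual plays exactly the role of cell B4: it is always non-negative, and zero only at the answer, which is what makes minimization a usable stand-in for the missing inverse operation.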
Ch. 4: Nonlinear least squares 161

4.1 Cosmic microwave background radiation

We will use the spectral data given by S. Bluestone, J. Chem. Educ. 78 (2001) 215, as reproduced in table 4.1.1, and fit these data to Planck's expression for black body radiation,

B = (2hν³/c²) / (e^(hν/kT) − 1)     (4.1.1)

where B denotes the optical brightness, ν is the frequency of the light, h is Planck's constant, c is the speed of light in vacuum, k is Boltzmann's constant, and T is the absolute temperature. In terms of fitting experimental brightness data B as a function of wavenumber ν̄ = ν/c we can rewrite (4.1.1) as

B = a ν̄³ / (e^(b ν̄) − 1)     (4.1.2)

where a and b are adjustable parameters.

Exercise 4.1.1:
(1) Open a spreadsheet, leave room at its top for a graph, and enter the wavenumber ν̄ and experimental brightness Bexp in columns A and B, starting in row 18.
(2) In column C compute Bmodel according to (4.1.2), using assumed values for a and b located in, e.g., cells G3 and G4 respectively, such as a = 1 and b = 1.
(3) Plot both Bexp and Bmodel vs. ν̄, and adjust the values of a and b so that both curves are visible in the same plot.
(4) In cell G5 compute the sum of the squares of the residuals, SSR (for the Sum of Squares of the Residuals) = Σ R² = Σ (Bexp − Bmodel)². This is most readily done with the command =SUMXMY2(), where the argument contains the address ranges for Bexp and Bmodel respectively.
(5) Call Solver with Tools => Solver. In the window Set target cell enter the address of SSR, specify Equal to Min, and in the window By changing cells enter the addresses of the parameters a and b, then press Solve or Enter. Look at the fit in the graph. If it is poor, repeat Solver with the new values for a and b. If that doesn't work, use different starting values for a and b to get Solver past what may be a 'false' (i.e., local rather than global) minimum. Accept the answer by checking Keep Solver solution, and press OK.
(6) Now call the macro SolverAid to get estimates of the precision of your results. In the example of Fig. 4.1.1, the parameters determined by Solver are located in G3:G4, the sum of squares of the residuals in G5, and the column containing ycalc is C18:C60. Do request to see the covariance matrix, and specify its space as G6:H7.
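For readers who want to check this fit outside Excel, here is a hedged sketch of the same least-squares problem in Python. Since table 4.1.1 is not retyped here, the "experimental" brightnesses are synthetic, generated from the fitted values quoted in Fig. 4.1.1 (a = 0.39730, b = 0.52741); scipy.optimize.curve_fit then plays the combined role of Solver and SolverAid, its covariance matrix corresponding to the one SolverAid writes into G6:H7.

```python
# Fit B = a*v^3/(exp(b*v) - 1), eq. (4.1.2), by nonlinear least squares.
import numpy as np
from scipy.optimize import curve_fit

def planck(v, a, b):
    # eq. (4.1.2): brightness as a function of wavenumber v
    return a * v**3 / (np.exp(b * v) - 1.0)

a_true, b_true = 0.39730, 0.52741      # values quoted in Fig. 4.1.1
v = np.linspace(2.0, 22.0, 43)         # wavenumbers in 1/cm (synthetic grid)
B_exp = planck(v, a_true, b_true)      # synthetic, noise-free "data"

# Starting guesses roughly adjusted "by eye", as in step (3)
popt, pcov = curve_fit(planck, v, B_exp, p0=[0.5, 0.6])
SSR = np.sum((B_exp - planck(v, *popt))**2)   # the quantity in cell G5
```

With noise-free synthetic data the fit recovers the generating parameters essentially exactly; with the real COBE data SSR would of course remain finite.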
Table 4.1.1: The cosmic background radiation data from COBE as summarized by S. Bluestone in J. Chem. Educ. 78 (2001) 215, listing for each of the 43 data points the wavenumber ν̄ /cm⁻¹, the experimental brightness Bexp /10⁻18, and the relative weight wrel /10⁻3.

In this initial attempt to fit these data we have only considered the first two columns in table 4.1.1, but the information in the third column should not be neglected. This column lists the relative weights wi that should be assigned to each individual brightness Bi. We now modify the spreadsheet in order to accommodate these weights.

(7) Your spreadsheet may now look like that shown in Fig. 4.1.1. Notice that SolverAid places the standard deviations in a and b to the right of these parameters (e.g., sa in cell H3), and the standard deviation of the overall fit to the right of the sum of squares of the residuals, here labeled SSR.
(8) The covariance matrix in G6:H7 contains the variances vaa and vbb on its (top-left to bottom-right) diagonal, and the covariances vab = vba in the off-diagonal locations. Verify that, indeed, vaa = sa² (vaa is found in cell G6, sa in cell H3) and, likewise, vbb = sb², but that vab (in cell G7) is not given by the simple expression that applies specifically to a straight line (see chapter 2).
(9) Also verify that vab² has almost the same value as the product vaa·vbb, indicating that a and b are highly correlated. What this means is readily illustrated by setting a to a different value, and then using Solver to adjust only b, showing that an error in a causes a corresponding error in b, and vice versa.
Fig. 4.1.1: The top of the spreadsheet of exercise 4.1.1 after using Solver and SolverAid for an unweighted nonlinear least squares analysis of the COBE data (unweighted: a = 0.39730, b = 0.52741), showing the graph of Bexp and Bmodel vs. wavenumber, SSR, and the covariance matrix CM. Cell C18 contains the instruction =$G$3*(A18^3)/(EXP($G$4*A18)-1), copied to C19:C60, and cell G5 contains =SUMXMY2(B18:B60,C18:C60). Note that the column heading v should read ν̄, a symbol Excel cannot display in a cell, which is why it is shown separately on the spreadsheet.

The appropriate criterion to be minimized is now Σ w (Bexp − Bmodel)². Because Excel does not have a conveniently compact function (analogous to SUMXMY2(range1,range2)) to compute such a weighted sum of squares of residuals, we use an additional column to compute it. Moreover, for the sake of comparing the results obtained with unweighted and weighted nonlinear least squares, we will normalize the relative weights used in Σ wΔ² (denoted on the spreadsheet by SwSR) through division by the normalizing factor (Σw)/N, i.e., as N Σw(Bexp − Bmodel)²/Σw. This normalization could have been included directly in the formula used for SwSR, but is here displayed separately on the spreadsheet.

Exercise 4.1.1 (continued):
(10) Enter the relative weights w listed in table 4.1.1 in column D, and in column E calculate w times the square of the residuals, wΔ² = w (Bexp − Bmodel)². In cell G14 calculate (Σw)/N, and in cell G11 compute the weighted sum of squares of the residuals, SwSR, so that cells G14 and G11 will contain the instructions =AVERAGE(D18:D60) and =SUM(E18:E60)/G14 respectively.
(11) Let the expressions for Bmodel now refer to a and b in cells G9 and G10 respectively, with slightly different parameter values. Engage Solver again (now using G11 as the target cell, and G9:G10 as the adjustable parameters), and recalculate the values for a and b. In this particular case, where the data clearly fit the model very well and are virtually noise-free, weighting makes only a relatively minor improvement, mostly on the uncertainty.

Fig. 4.1.2: The top of the final form of the spreadsheet of exercise 4.1.1, after using Solver and SolverAid for a weighted nonlinear least squares analysis of the COBE data. Again please read ν̄ for the column heading v.
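The weighted criterion of steps (10) and (11) can be sketched the same way. The weights below are an arbitrary stand-in for the wrel column of table 4.1.1; the point is only the shape of the target function SwSR, including its normalization by (Σw)/N, which rescales SwSR but does not move the position of its minimum.

```python
# Minimize SwSR = sum(w * r^2) / (sum(w)/N), the normalized weighted
# sum of squares of the residuals, instead of the unweighted SSR.
import numpy as np
from scipy.optimize import minimize

def planck(v, a, b):
    return a * v**3 / (np.exp(b * v) - 1.0)

v = np.linspace(2.0, 22.0, 43)
B_exp = planck(v, 0.39730, 0.52741)   # synthetic, noise-free data
w = v**2 / 100.0                      # stand-in relative weights

def SwSR(p):
    r = B_exp - planck(v, p[0], p[1])
    return np.sum(w * r**2) / (np.sum(w) / len(v))

fit = minimize(SwSR, x0=[0.5, 0.6], method='Nelder-Mead',
               options={'xatol': 1e-10, 'fatol': 1e-14, 'maxiter': 5000})
```

With noise-free data the weighted and unweighted fits coincide; with real, noisy data weighting mostly changes the parameter uncertainties, as noted in the text.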
(12) Comparison of equations (4.1.1) and (4.1.2) shows that b = hc/kT, after division of b by 100 to convert it from measuring centimeters (i.e., inverse wave numbers) to meters. You can use this to calculate the background temperature T of the universe! With the numerical values h = (6.62606876 ± 0.00000052) × 10⁻³⁴ J·s, c = 2.99792458 × 10⁸ m·s⁻¹, and k = (1.3806503 ± 0.0000024) × 10⁻²³ J·°K⁻¹, we find T = hc/kb = (2.7283 ± 0.000032) °K for the temperature of interstellar space. Here we have used the Propagation macro to estimate the propagated imprecision in T = hc/kb. Note that this computation only requires b, so that there is no need to use the covariance matrix. Because such measurements allow determinations of the temperature of interstellar space with a precision of the order of 10⁻⁴ °K, the fine structure of the cosmic background radiation can now be observed. This will be done by the Microwave Anisotropy Probe that was launched by NASA in 2001.

4.2 The I2 potential energy vs. distance profile

Spectroscopic measurements such as those described in sections 3.3 and 3.6 can be used to determine the potential energy vs. distance profile (often called the potential energy curve) of a diatomic molecule. This can then be used to test various model expressions for such a profile, and to estimate the equilibrium distance re and the dissociation energy De. As our example we will use a set of data reported by R. D. Verma, J. Chem. Phys. 32 (1960) 738, for the ground state of iodine, I2. We reproduce these data in Fig. 4.2.1, and analyze them in exercise 4.2.1.

Exercise 4.2.1:
(1) Enter the data of table VII of R. D. Verma, J. Chem. Phys. 32 (1960) 738, in a spreadsheet. Use three columns: one for the vibrational quantum number v, one for the energy U(r), and one for the distance r. Note that the values for v, U(r), and r will appear twice, once for rmin and once for rmax. Plot the data as U(r) vs. r. In Fig. 4.2.1, r has been entered in picometers.
(2) Enter labels and initial guess values for fitting U(r) to the Morse function U(r) = a {1 − exp[−b(r − re)]}², add a column in which you compute UMorse on the basis of these data, and add the calculated values to the plot. Adjust the guessed values to achieve a crude fit with the experimental data. Such visually adjusted initial guess values might be a = 12000, b = 0.02, and re = 250 when (as in Fig. 4.2.1) r is in pm; for r in Angstrom units, use re = 2.5 instead.
(3) Compute SSR using the function SUMXMY2(), then call Solver to adjust the values of a, b, and re, call SolverAid to compute the precision, and plot the resulting curve.
(4) Another often-used fitting function is that of Lennard-Jones, which has the form U(r) = 4a {(b/r)¹² − (b/r)⁶}. Enter labels and guess values for a and b, compute U(r) according to the Lennard-Jones formula in the next column, and calculate the corresponding sum of squares of the residuals; then call Solver, and SolverAid for uncertainty estimates. Your spreadsheet might now look like Fig. 4.2.1.
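A hedged sketch of the Morse fit of this exercise follows, with synthetic data standing in for Verma's table VII; the generating parameter values below are merely assumptions of the sketch, chosen near the results quoted in Fig. 4.2.1a, and the starting point is the visually adjusted guess of step (2).

```python
# Fit the Morse function U(r) = a*(1 - exp(-b*(r - re)))^2 to
# synthetic potential-energy data by nonlinear least squares.
import numpy as np
from scipy.optimize import curve_fit

def morse(r, a, b, re):
    return a * (1.0 - np.exp(-b * (r - re)))**2

a0, b0, re0 = 12666.0, 0.022, 263.3   # assumed values (cm^-1, pm^-1, pm)
r = np.linspace(230.0, 880.0, 80)     # distances in pm
U = morse(r, a0, b0, re0)             # synthetic, noise-free data

# Visually adjusted initial guesses, as suggested in step (2)
popt, pcov = curve_fit(morse, r, U, p0=[12000.0, 0.02, 250.0])
```

The same pattern, with a two-parameter model function, applies to the Lennard-Jones fit of step (4).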
Fig. 4.2.1a: The top of the spreadsheet of exercise 4.2.1, listing the vibrational quantum numbers v, the energies U(r) in cm⁻¹, the distances r in pm, and the results of Solver and SolverAid: the fitted Morse parameters (a, b, re) and the Lennard-Jones parameters (a, b), each with its standard deviation and SSR. Also shown is the value of the equilibrium distance calculated for the Lennard-Jones model as req = 2^(1/6) b, with the associated precision computed with Propagation.

Fig. 4.2.1b: The remainder of that spreadsheet.
Fig. 4.2.2: The data of Verma (open circles) and two fitting functions: a Morse curve (black line) and a Lennard-Jones curve (gray line). The inset shows the 12 data around the minimum.

Fig. 4.2.3: The residuals R of fitting Verma's data to the Morse equation (open circles) are about three times smaller than those for the Lennard-Jones formula (solid circles), though both clearly show similar, systematic trends. The inset shows the 12 data around the minimum, fitted (as sets of 12) to these two model functions.
(5) For both the Morse equation and the Lennard-Jones expression, the parameter a denotes the dissociation energy De in units of wavenumbers.
(6) The Morse curve fits the data better in Fig. 4.2.2 than the Lennard-Jones curve, as shown in the inset in Fig. 4.2.2, although the residuals in Fig. 4.2.3 show considerable systematic deviations for both. The two analyses yield different results for both the dissociation energy and the equilibrium distance re. Even within a given model, the result can depend strongly on the number of data points used: fitting the Morse function to all data yields re = 263.34 ± 0.044 pm, while fitting only the 12 points nearest to the curve minimum leads to re = 266.10 ± 0.24 pm, a difference that persists if we only analyze, say, the 12 data points surrounding the curve minimum.
(7) The uncertainty in the dissociation energy is even larger, and clearly depends on the model used. It should be realized in this context that re and De are only model parameters, because they both refer to a physically non-realizable, vibrationless "basement" state below the ground state at v = 0.
(8) One might be tempted to use linear least squares to fit the data to a Lennard-Jones function, because an expression such as U(r) = a0 + a1/r¹² − a2/r⁶ is linear in its coefficients, even though it has three adjustable coefficients rather than the two in the Lennard-Jones formula. However, when the data are so analyzed (with LinEst, Regression, or LS1) we obtain U(r) = (1.028 ± 0.016) × 10⁴ + (1.028 ± 0.023) × 10³³/r¹² − (7.3 ± 0.16) × 10¹⁸/r⁶, which visually overlaps with the Lennard-Jones curve in Fig. 4.2.2.

4.3 Titrating an acid with a strong base

As our next examples we will use Solver to find the concentrations and equilibrium constants in the titration of a single weak acid, e.g., acetic or citric acid, with a single strong monoprotic base, e.g., sodium hydroxide, NaOH. For a monoprotic acid HA such as acetic acid this titration can be described by the expression

Vb = Va (Ca α0 − [H+] + Kw/[H+]) / (Cb + [H+] − Kw/[H+])     (4.3.1)

with

α0 = Ka / ([H+] + Ka)     (4.3.2)

where Va is the (known) volume of acid placed in the test cell (typically a beaker or Erlenmeyer flask), Ca is the (unknown) concentration of the acid, Cb is the (known) concentration of the strong base used as titrant, Vb is the (measured) volume of titrant added at any given time, [H+] is the concentration of hydrogen ions at that same time, to be computed from the (measured) pH, α0 is the concentration fraction of the acid anions, Ka = [H+][A⁻]/[HA] is the dissociation constant of the acid, and Kw = [H+][OH⁻] is the ion product of water.
The value of [H+] is taken as the independent parameter, and the titrant volume Vb as the dependent parameter. Below we will use Solver to find the unknowns Ca, Ka, and Kw. The values of Ka and Kw are known in principle, but are best considered as adjustable parameters, since they vary with temperature. The equilibrium constants also vary somewhat with ionic strength, an experimental variable seldom controlled during practical acid-base titrations; if needed, the corresponding activity corrections can also be made on the spreadsheet, but this will not be done here, since it is a detail that has little to do with Solver. For those interested, it is fully described in, e.g., section 4.10 of my book Excel in Analytical Chemistry, Cambridge University Press, 2001, or in my Aqueous Acid-Base Equilibria and Titrations, Oxford University Press, 1999. Both books also provide the general theory of acid-base titrations which, in its most compact form, can be found in R. de Levie, Chem. Educator 6 (2001) 272.

As experimental data we will simulate a titration curve with added Gaussian noise on both the volume and pH axes.

Exercise 4.3.1:
(1) Open a spreadsheet, leave space at its top for a graph, and immediately below it provide spaces and labels for the known parameters Va and Cb; for two values sn (for pH and Vb) for the standard deviation of the noise; for two sets of the adjustable parameters Ca, Ka, and Kw; and for one set of associated pK values. Also deposit numerical values for the constants. Compute Ka from pKa as =10^(-pKa), and do the same for Kw. This is helpful because especially Kw is too small to be adjusted by Solver's steps; taking logarithms levels the playing field. Only a small section of this layout is shown in Figs. 4.3.1 and 4.3.2.
(2) Make columns for pH, pHnoisy, [H+]noisy, Vb,exp, Vb,calc, the squares of the residuals R², and two columns of noise (only one of which is shown in Figs. 4.3.1 and 4.3.2).
(3) For pH use the values 2 (0.2) 12. (To make the set more realistic, a wider pH spacing can be used in the range between pH 6 to pH 10. In practice the pH noise is not quite so uniform either, because it is smaller in the buffer regions than near the equivalence point, but you will get the general idea.) Compute pHnoisy as pH + sn,pH times noise from one of the noise columns, and [H+]noisy as =10^(-pHnoisy).
(4) In the next column calculate Vb,exp using eqs. (4.3.1) and (4.3.2), with an added noise term: sn,Vb times noise from the second noise column. For [H+] in this calculation use [H+]noisy, and for the constants Ka, Kw, and Ca use the first set of these. If you have actual titration data, please substitute these for the make-believe set.
(5) Plot the progress curve Vb,exp vs. pH or, if you are more comfortable with a titration curve, pH vs. Vb,exp. Figure 4.3.1 shows such a layout.
(6) For Vb,calc again use eqs. (4.3.1) and (4.3.2), but now with [H+] from the noise-free columns, and with the slightly different parameter values Ka, Kw, and Ca from the second set. Add the calculated curve to the plot.
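The simulation just described, together with the exclusion of physically unrealizable volumes treated in step (7) below, can be sketched compactly outside Excel. In the hedged Python fragment that follows, the parameter values (Va = 25 mL, Cb = 0.1 M, Ca = 0.01 M, Ka = 10⁻⁵) are merely illustrative, and a deterministic sine term stands in for the Gaussian noise column.

```python
# Simulate Vb(pH) via eqs. (4.3.1)-(4.3.2), then count squared
# residuals only where 0 < Vb,exp < 50 mL (the nested IF of the text).
import numpy as np

Va, Cb = 25.0, 0.1          # sample volume (mL), titrant concentration (M)
Ca, Ka, Kw = 0.01, 1.0e-5, 1.0e-14

pH = np.arange(2.0, 12.01, 0.2)
H = 10.0**(-pH)
alpha0 = Ka / (H + Ka)      # fraction of acid anions, eq. (4.3.2)
Vb = Va * (Ca * alpha0 - H + Kw / H) / (Cb + H - Kw / H)  # eq. (4.3.1)

Vb_exp = Vb + 0.01 * np.sin(7.0 * pH)    # stand-in for the noisy data
mask = (Vb_exp > 0.0) & (Vb_exp < 50.0)  # the nested IF in cell F21
SSR = np.sum(((Vb_exp - Vb)**2)[mask])   # only physically sensible points
```

At low pH the computed Vb is negative, since we cannot generate a pH below that of the acid sample, and those points are excluded from SSR, exactly as the IF statement does on the spreadsheet.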
(7) The computation of the squares of the residuals R² = (Vb,exp − Vb,calc)² in the next column is somewhat more complicated. We cannot generate a pH lower than that of the acid sample, or higher than that of the base solution used to titrate it, and we therefore use IF statements to excise such physically unrealizable values (such as negative volumes or concentrations), because they should not affect the fitting: if we left them in, they might be like the tail that wags the dog. An IF statement in Excel has the syntax =IF(condition, action when condition is true, action when condition is false). For example, =IF(D21<0, 0, (D21-E21)^2) can be read as "if Vb,exp is negative, make the cell output zero; otherwise make it equal to R²." In the present case we want to compute (Vb,exp − Vb,calc)² only for 0 < Vb,exp < 50, and we therefore use a nested IF: =IF(D21<0, 0, IF(D21>50, 0, (D21-E21)^2)) is the instruction to go in cell F21 of Fig. 4.3.1. Note that the only contributions counted will be those for 0 < Vb,exp < 50.
(8) Calculate SSR as the sum of all the terms in the column for SR. The spreadsheet might now resemble Fig. 4.3.1.
(9) Call Solver, and let it minimize the sum of squares of the selected residuals by adjusting the values of pKa, pKw, and Ca. Then call SolverAid to get the associated standard deviations, and the covariance matrix. Figure 4.3.2 illustrates what you might obtain. The covariance matrix indicates that Ca and Kw are only weakly correlated (with a linear correlation coefficient r of 0.79), and Ka and Kw hardly at all (r = 0.15). Note that, in this example, the uncertainty estimate for Ca does not include any experimental uncertainties in Va and Cb, but that these can be included (if they are known) in the same way an additional variance was included in the calibration curve of chapter 2.

A general equation such as (4.3.1) for a single weak monoprotic acid can be linearized (L. M. Schwartz, J. Chem. Educ. 64 (1987) 947) by considering Kw as a known parameter. While (4.3.1) applies regardless of the parameter values, that is certainly no longer the case for a triprotic acid such as citric or phosphoric acid, here abbreviated as H3A. The formal description of the titration of a triprotic acid with NaOH is similar to that of acetic acid, but involves three acid dissociation constants, Ka1 through Ka3. Especially when the Ka values for such a polyprotic acid are of similar orders of magnitude, as is the case for citric acid, we have little choice but to use a nonlinear least squares fit, unless we want to rely on approximations of rather questionable reliability. The relevant parameters are then obtained directly, without the usual two-step approach of first determining the equivalence volume, from which the unknown concentration Ca is calculated. Instead of (4.3.1) and (4.3.2) we must now use
Vb = Va (Ca (α2 + 2α1 + 3α0) − [H+] + Kw/[H+]) / (Cb + [H+] − Kw/[H+])     (4.3.3)

α3 = [H+]³/D,  α2 = Ka1[H+]²/D,  α1 = Ka1Ka2[H+]/D,  α0 = Ka1Ka2Ka3/D,
where D = [H+]³ + Ka1[H+]² + Ka1Ka2[H+] + Ka1Ka2Ka3     (4.3.4)

and αi denotes the concentration fraction of the species still carrying i protons. In all other aspects the approach is the same.

Fig. 4.3.1: Part of the spreadsheet just before the actual data adjustment. Open circles denote the 'experimental' data, which include noise on both the pH and volume axis (a second column of noise data, labeled Vbnoise, is present as column H but not shown here), while the drawn line represents the calculated curve for the assumed parameter values (in cells F13:F17). The guess values for the acid concentration Ca and the various pKa values are shown in bold, as is the as yet unadjusted sum of the squares of the residuals.

Fig. 4.3.2: The same part of the spreadsheet after using Solver and SolverAid. Note that the calculated line now fits the data quite well, as do the individual data points. The standard deviation of the fit is shown in italics to the right of SSR, and the standard deviations of the adjusted parameters are likewise shown in italics to the right of their numerical values. The covariance matrix is not shown.
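The concentration fractions of eq. (4.3.4) are easily sketched as a small function. In the fragment below the pKa values are the citric-acid-like guesses used in Fig. 4.3.3 (3, 5, and 7), assumptions of this sketch rather than fitted results.

```python
# Concentration fractions alpha_i of eq. (4.3.4) for a triprotic acid
# H3A; the index i counts the protons still attached.
Ka1, Ka2, Ka3 = 10.0**-3, 10.0**-5, 10.0**-7

def alphas(H):
    """Return (alpha3, alpha2, alpha1, alpha0) for H3A, H2A-, HA2-, A3-."""
    D = H**3 + H**2 * Ka1 + H * Ka1 * Ka2 + Ka1 * Ka2 * Ka3
    return (H**3 / D, H**2 * Ka1 / D, H * Ka1 * Ka2 / D,
            Ka1 * Ka2 * Ka3 / D)

a3, a2, a1, a0 = alphas(10.0**-4)   # at pH 4, H2A- dominates
```

The four fractions always sum to one, and the combination α2 + 2α1 + 3α0 in (4.3.3) simply counts the average number of protons released per molecule of H3A.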
Exercise 4.3.2:
(1) Open a spreadsheet like that used in exercise 4.3.1, but now with three Ka values instead of one. Likewise, the formulas used for Vb,exp and Vb,calc should now be based on (4.3.3) and (4.3.4). Figure 4.3.3 shows such a spreadsheet, with some initial guess values appropriate for citric acid; Gaussian noise has been added to the simulated 'experimental' data. Note that, in this case, there is only one clear step in the curve, because the three pKa values are too close together to yield resolved equivalence points. Figure 4.3.4 shows its final form after Solver and SolverAid have been used.

Fig. 4.3.3: The top part of a spreadsheet for the titration of a triprotic acid. The guess values for the acid concentration Ca, the various pKa values, and the unadjusted value of SSR are all shown in bold.

Fig. 4.3.4: The same as Fig. 4.3.3 after minimizing SSR with Solver, and estimating the resulting parameter uncertainties with SolverAid.

(2) Try the same spreadsheet for other triprotic acids, such as benzene-1,2,3-tricarboxylic acid (hemimellitic acid), with pKa values of about 2.8, 4.2, and 5.9, or phosphoric acid, with pKa values of about 2.2, 7.2, and 12.3, and observe how the shape of the titration curve changes, including the number of steps in that curve. A hypothetical triprotic acid with pKa values of about 1, 5, and 9 would yield a titration curve with three clearly distinct steps. With phosphoric acid, the three pKa values are quite distinct, yet only two clear steps in the titration curve are observed, because HPO4²⁻ is too weak an acid, i.e., the third pKa is too close to that of water. In the case of hemimellitic acid, only two clear steps are observed because pKa1 and pKa2 are too similar in value.
An example of applying this method to a practical (and rather complicated) sample can be found in section 4.11 of my Excel in Analytical Chemistry.

Fig. 4.3.5: The linear correlation coefficients generated by SolverAid for the data in Fig. 4.3.4 show no strong correlations between any of the computed parameters.

4.4 Conductometric titration of an acid mixture

L. M. Schwartz & R. I. Gelb (Anal. Chem. 56 (1984) 1487) reported data for a conductometric titration of a mixture of a strong (perchloric) acid of concentration Ca1 plus a weak (acetic) acid of concentration Ca2 with the strong base KOH. Such a titration works because, in water, the molar conductances g of H+ and OH− are several times larger than those of any other ions present. As one neutralizes acid with base, the total conductance G = Σ Gi of the solution will first decrease as a result of the neutralization of the protons of the perchloric acid via H+ + OH− → H2O, which effectively replaces H+ with K+. The conductance then slowly rises while K+ is added and, simultaneously, HA is converted into Ac− in the reaction HA + OH− → Ac− + H2O, and subsequently increases faster, as excess base introduces both K+ and OH−.

The traditional analysis of such data is to replot them after correction for dilution, to fit the approximately straight regions in the resulting curve (excluding points in the transition regions) to three straight-line segments, and to compute the intersections between these extrapolated line segments to find the equivalence points, from which one can then calculate the sought concentrations of the two acids. These three near-linear regions can be seen in the data from Schwartz & Gelb listed in table 4.4.1 and shown as open circles in Fig. 4.4.1. This is a rather laborious procedure, well illustrated by Schwartz & Gelb, but it has several problems.

In the case of acid-base titrations, it is not always clear whether Vb or pH carries the larger experimental errors, and therefore which of these (if any) should be taken as the independent parameter. (In an undergraduate laboratory, there is seldom any question that the volume read from a buret is the more error-prone observation, unless the pH meter is far below par or poorly calibrated.) Here we therefore take a rather pragmatic approach: nonlinear least squares fitting requires that we have a closed-form expression for the curve, which is available when we fit Vb as a function of pH, but not the other way around.
(1) There are no clear criteria for including or excluding points in the "linear" sections. (2) Finding the intersections between such straight-line segments can be rather imprecise if the covariances are not taken into account (cf. section 2.14), which complicates the procedure. (3) The computation of intermediary quantities, the equivalence points, can further reduce the precision of the results unless the covariances are used. None of these difficulties are insurmountable, but they can be avoided by fitting the entire titration curve as one single curve, as was done in section 4.3 with a potentiometric titration.

There are two ways to approach this problem with Solver. One is to find an explicit mathematical expression for the conductance as a function of either [H+] or Vb; in this case, such expressions will be relatively messy. The other approach is to make the calculation on the spreadsheet as if the constants involved were all known, fitting all adjustable constants directly with Solver, but without using an overarching closed-form mathematical solution. Below we will illustrate the latter approach, which (when such an algebraic solution is not already available) is often faster and simpler, and leads to the same answers. It not only avoids the derivation step if the theoretical expression is not known to the user, but also makes it easy to display the individual components of the measured conductance.

Table 4.4.1: The titrant volumes Vb, in mL, and the corresponding cell conductances G, in mS, as reported by L. M. Schwartz & R. I. Gelb in Anal. Chem. 56 (1984) 1487, for Vb from 4 to 44 mL, with G passing through a minimum of about 3.28 mS between its starting value of 6.975 mS and its final rise.

Since this is an acid-base titration, we start with the corresponding relation between the titrant volume Vb and the proton concentration [H+],

Vb = Va (Ca1 + Ca2 Ka/([H+] + Ka) − [H+] + Kw/[H+]) / (Cb + [H+] − Kw/[H+])     (4.4.1)

The corresponding ionic contributions to the solution conductance G are then given by
such as 0. in cell HIS the instruction =10"B15 for Ka. Advanced Excel for scientific data analysis G CIO:. or the other way around.1) to compute Vb in column B. (7) Use (4. and gOH= respectively. and in cell H16 the command =10"B16 for Kw. Verify that they have similar shapes. gCl04=. And in cells G13:G17 write Va=.exp. which gives you four points per pH unit. Ca2=. = Cal Va a V + V. since the two scales are different. (2) Deposit guessed values in B13:BI6.4. and pKw=.4. and Gexp. we cannot compute the residuals.4. G(H).4. Vb in separate graphs for columns H vs. G. Vb.Ycalc for common values of x.4.1. then copy this instruction down to row 60. We . starting in cell B21.1 M.1.1 since Cb = 0. plot G vs. G(OH). G(Ac). Now we face a problem: we have two sets of data for G as a function of Vb. and SSR=. enough for the rather smooth conductometric titration curve. Vb. gAc =50.2) for G(H1 in column C. and in columns J and K copy the experimental values for G and Vb from table 4. etc. (6) In cell A21 place the value 0. in cell A22 the instruction =A22/sqrt ( sqrt (10) ) . and 14 for pKw. pKa=. and gOH = 150. (4. b (4. (3) For the ionic conductances use crude literature data. G(Cl04). (8) Leave column I blank. Ka=.. gAc=. Gintpol. B and J vs. de Levie. G(K).4. In cells D13:DI7 deposit gH=.6) where the gi are the molar conductances of the species i. K respectively. in cell Hl4 the value 0. (4.7 for pKa. (9) In Al:EI2 and FI:J12.178 R. (4) In cell H13 place the value 100 (since the sample volume used was 100 mL). gK=.4) (4. g CIO:. The solution is to interpolate the calculated curve to yield xvalues that coincide with those of the experimental data set.4. gCl04 = 50.1: (1) In cells A13:AI6 of a new spreadsheet place the labels Cal=. Kw=.02 for Cal and Ca2 . 4. the differences Yexp .4.5) (4. gK = 50. (5) In cells A19:K19 write the column headings [H+]. i.7) Exercise 4.e. such as gH = 300. Cb=. and the solution conductance G is given by (4.3) for G(K+) in column D.4. However.
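Equation (4.4.1) is easy to check outside the spreadsheet. The following Python sketch is not part of the original exercise: the function name is ours, and the parameter values are merely the guesses suggested above, not fitted results.

```python
def vb_of_h(h, ca1, ca2, ka, kw, cb, va):
    """Titrant volume Vb from the proton concentration h = [H+], eq. (4.4.1)."""
    return va * (ca1 + ca2 * ka / (h + ka) - h + kw / h) / (cb + h - kw / h)

# guessed values from exercise 4.4.1 (illustrative, not the fitted results)
va, cb = 100.0, 0.1          # sample volume (mL), base concentration (M)
ca1, ca2 = 0.02, 0.02        # acid concentrations (M)
ka, kw = 10 ** -4.7, 1e-14

# [H+] from 0.1 downward, four points per pH unit, as in cells A21:A60
h_values = [0.1 / (10 ** 0.25) ** i for i in range(40)]
vb_values = [vb_of_h(h, ca1, ca2, ka, kw, cb, va) for h in h_values]
```

Near [H+] = 1e-7 M the computed Vb approaches (Ca1 + Ca2)Va/Cb = 40 mL, the second equivalence volume, as it should; Vb increases monotonically as [H+] decreases.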
We will here use the conventional choice, by identifying Vb with the independent variable x. (The same method applies if we make the opposite choice, with G as the independent parameter, which in this case might be more rational, because the authors wrote that they "did not actually take care to add exactly 1.00 mL at a time. Rather we added approximately this amount.") For the interpolation we will use the quadratic Lagrange interpolation function described in section 1.11.

Fig. 4.4.1: Open circles: the data of Schwartz & Gelb, plotted as cell conductance G (in mS) vs. titrant volume Vb,exp (in mL). Closed circles and connecting line: data calculated with exercise 4.4.1, with the values found: gH ≈ 349, gClO4 ≈ 167, gK ≈ 59, gAc ≈ 48, and gOH ≈ 208.

Fig. 4.4.2: The components of the cell conductance as a function of the base volume added. The added OH- ions first react with the protons from HClO4, forming water, then with those from HAc, generating both water and Ac-. Perchlorate is merely diluted by the addition of KOH, while the contribution of potassium gradually increases. The contribution of OH- soars after both HClO4 and HAc have been neutralized.
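A Python stand-in for a quadratic Lagrange interpolation, together with an equivalent of Excel's SUMXMY2 worksheet function, might look as follows. The rule used here for selecting the interpolation points (the order+1 table entries nearest to x) is our assumption; the VBA original of section 1.11 may bracket its points differently.

```python
def lagrange(x_table, y_table, x, order=2):
    """Lagrange interpolation through the order+1 tabulated points
    nearest to x (a sketch of a stand-in for the custom VBA function)."""
    idx = sorted(range(len(x_table)), key=lambda i: abs(x_table[i] - x))
    pts = [(x_table[i], y_table[i]) for i in idx[:order + 1]]
    result = 0.0
    for j, (xj, yj) in enumerate(pts):
        term = yj
        for k, (xk, _) in enumerate(pts):
            if k != j:
                term *= (x - xk) / (xj - xk)  # Lagrange basis polynomial
        result += term
    return result

def sumxmy2(xs, ys):
    """Excel's SUMXMY2: the sum of squares of the differences x - y."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys))
```

A quadratic Lagrange interpolation reproduces any quadratic exactly, which is why it is adequate for a smooth titration curve sampled at four points per pH unit.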
Exercise 4.4.1 (continued):
(10) In cell I21 deposit the instruction =Lagrange($B$21:$B$60,$H$21:$H$60,K21,2), and copy this down to row 46. Make sure that the Lagrange function is defined in an open VBA module; otherwise enter it.
(11) Copy the data from column I into the graph of Gexp vs. Vb,exp.
(12) In cell J17 deposit the instruction =SUMXMY2(I21:I46,J21:J46).
(13) Call Solver, and let it minimize SSR in cell J17 by first adjusting the values of Ca1 and Ca2, then of Ca1, Ca2, and Ka.
(14) Copy the results obtained in I21:I46 to the graph in A1:E12; compare Fig. 4.4.1.
(15) In the second plot, in F1:J12, display the values of the individual ionic conductances Gi that make up the total value of G, i.e., G(H), G(ClO4), G(K), G(Ac), and G(OH); cf. Fig. 4.4.2.

This can indeed be made to work, and leads to quite acceptable results: Schwartz & Gelb list Ca1 = 0.01631 mM and Ca2 = 0.01795 mM respectively, whereas we find Ca1 = 0.01637 mM and Ca2 = 0.01789 mM. One can raise several objections to the above procedure. The ratios of the equivalent conductances, especially those involving perchlorate anions, do not fit those of the published values very well (absolute values are meaningless here, since Schwartz & Gelb did not mention their cell constant). The equilibrium constants Ka and Kw might need to be corrected for activity effects, and the dependence of the molar conductances g on ionic strength, which may vary during the titration, might likewise have to be incorporated. Such corrections can indeed be made, but are not illustrated here; for that to be meaningful one should start with much more careful measurements.

The above exercise, then, is neither a detailed examination of conductometry nor advocacy of its use for acid-base titrations, for which purpose potentiometry is almost always far superior, because it is specific for the reagent. Our point is merely to illustrate how even such a rather complicated data analysis problem can be addressed adroitly with Solver.

4.5 Fitting a luminescence decay

Glow-in-the-dark toys based on copper-doped zinc sulfide can be excited with visible light, and then placed in a light-tight box with a photodetector. Prof. Carl Salter of Moravian College lets his students in physical chemistry lab do this experiment, and has kindly provided a set of such data; copy them from here, or download them from the web site. Often, kinetic processes are of first order, i.e., they follow an equation of the form I = I0 e^(-k1 t), where t is time, I0 is the amplitude of the signal I at t = 0, and k1 is the first-order rate constant.
Table 4.5.1: The luminescence decay data used in Figs. 4.5.1 through 4.5.3.

A plot of the experimental data suggests that there may be a significant background signal: the box may not have been light-tight after all, or perhaps it had not been closed properly. We therefore fit the data to I = I0 e^(-k1 t) + Ib, where Ib is a constant representing an unspecified, constant background signal. Figure 4.5.1 shows the top of the corresponding spreadsheet, and Fig. 4.5.2 shows both the resulting fit and the corresponding residuals. The resulting fit is fair, without obvious systematic bias, but the systematic trends in the residuals plotted in Fig. 4.5.2b suggest that the model used is still inadequate.

Since the photoexcitation generates equal numbers of electrons and ions (electron "holes"), which subsequently recombine to cause the observed luminescence, we actually should expect second-order kinetics in this case. We therefore fit the data to the corresponding rate expression I = I0/(1 + I0 k2 t) + Ib, where k2 is a second-order rate constant; see Fig. 4.5.3. It is clear from Fig. 4.5.3b that this simple second-order model satisfactorily represents the data.
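The two candidate models are easy to compare directly. In this Python sketch the parameter values are invented for illustration (they are not the fitted results); the point is that the second-order curve has the long 1/t tail that a first-order exponential cannot reproduce, which is what the residuals of the first-order fit reveal.

```python
import math

def first_order(t, i0, k1, ib):
    """I = I0 exp(-k1 t) + Ib: first-order decay over a constant background."""
    return i0 * math.exp(-k1 * t) + ib

def second_order(t, i0, k2, ib):
    """I = I0 / (1 + I0 k2 t) + Ib: second-order recombination kinetics."""
    return i0 / (1 + i0 * k2 * t) + ib

# both models start at I0 + Ib, but decay very differently at long times
start_1 = first_order(0, 0.14, 0.2, 0.06)
start_2 = second_order(0, 0.14, 1.0, 0.06)
tail_1 = first_order(50, 0.14, 0.2, 0.0)   # exponential tail: tiny
tail_2 = second_order(50, 0.14, 1.0, 0.0)  # 1/t tail: much larger
```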
Fig. 4.5.1: The top of the spreadsheet used for the data in table 4.5.1. The results of Solver are shown in C1:C3 and E1:E3 for the first-order and second-order fits, while D1:D3 and F1:F3 contain the corresponding results of SolverAid.

Fig. 4.5.2: (a) The measured phosphorescent light intensity (solid circles), in arbitrary units, vs. time t in seconds, and its analysis (drawn line) in terms of a first-order decay with offset, and (b) the resulting residuals.

Fig. 4.5.3: (a) The measured phosphorescent light intensity (solid circles), in arbitrary units, vs. time t in seconds, and its analysis (drawn line) in terms of a second-order decay with offset, and (b) the resulting residuals.

4.6 Fitting a curve with multiple peaks

Our next example of Solver as a rather general curve-fitting tool will illustrate the principle of a spectrum analyzer, sometimes used to decompose spectra or chromatograms into their presumed constituent components. For the purpose of our illustration we will assume that all peaks conform to a given general shape, which we will here take to be Gaussian. We will first generate a noisy "experimental" curve, then analyze this curve in terms of a number of Gaussian peaks of the general form y = a exp[-((x - b)/c)^2]. The method is primarily visual, in that it displays the guessed values, as well as the adjustments, specifically for the peak amplitudes, center positions, and widths in the experimental and calculated function. Often, peaks with considerable overlap can be resolved, provided the assumed peak shapes are correct. This requires a sufficiently large data array, to ensure that random noise will average out; one cannot expect to get similar results with small data sets.

Exercise 4.6.1:
(1) Open a spreadsheet; at its top leave space for two graphs, and below that for labels and constants, specifically for the peak amplitudes, center positions, and widths, as well as for the standard deviation sn of the noise and for the sum of the squares SSR of the residuals.
(2) Below these parameter labels, enter a row of column labels for X, noise, Yexp, Ycalc, and R.
(3) The values for X can be arbitrary: they represent sample number, elution time, wavelength or wavenumber, magnetic field, or whatever appropriate independent variable. In the example of Fig. 4.6.1 we use X = 1 (1) 1000.
(4) Fill the noise column with Gaussian noise of unit standard deviation and zero average.
(5) For Yexp assume the sum of, say, three Gaussians, plus added noise. The instruction in cell C22 of Fig. 4.6.1 might then read =$A$15*EXP(-1*((A22-$B$15)/$C$15)^2)+$A$16*EXP(-1*((A22-$B$16)/$C$16)^2)+$A$17*EXP(-1*((A22-$B$17)/$C$17)^2)+$H$15*B22. We use -1* rather than a simple minus sign in view of Excel's precedence of negation over exponentiation.
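Steps (1) through (5) can be mimicked outside Excel. This Python sketch uses the peak parameters suggested in the exercise; the noise amplitude of 0.1 is our assumption, taken from the na value visible in the spreadsheet figures.

```python
import math
import random

def gauss_peak(x, a, b, c):
    """One Gaussian peak, y = a exp(-((x - b) / c) ** 2)."""
    return a * math.exp(-((x - b) / c) ** 2)

random.seed(1)               # reproducible pseudo-data
na = 0.1                     # assumed noise amplitude
xs = list(range(1, 1001))    # X = 1 (1) 1000
y_exp = [gauss_peak(x, 3, 272, 95) + gauss_peak(x, 2, 620, 80)
         + gauss_peak(x, 3, 710, 90) + na * random.gauss(0, 1)
         for x in xs]
```

Note that the overlap of the peaks at 620 and 710 produces a single apparent crest near x = 675 that is higher than either component, which is why a two-Gaussian fit looks plausible at first.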
(6) For Ycalc we will first assume two peaks, so that the instruction in cell D22 might then read =$D$16*EXP(-1*((A22-$E$16)/$F$16)^2)+$D$17*EXP(-1*((A22-$E$17)/$F$17)^2).
(7) The residual R is simply the difference between Yexp and Ycalc.
(8) Plot Yexp and Ycalc vs. X and, in a separate graph, R vs. X.
(9) Try some parameter values for a fit, such as peak centers at about 270 and 675, amplitudes of about 3 and 3, and base widths of about 100 and 115. By playing with the parameters in block D16:F17 you will quickly home in on a visually reasonable fit. Figure 4.6.1 shows an example.
(10) Calculate SSR as =SUMSQ(E22:E1021), then optimize the parameters in D16:F17 with Solver.
(11) Comparison with the initially assumed values is quite good for the peak centered around X = 270, but only fair for the second peak. In fact, the residuals give a (very slight) hint of a systematic deviation. We therefore test whether this peak can be resolved into two parts, by extending the expression for Ycalc in column D with a third Gaussian, and introducing the corresponding parameters in D18:F18, as in Fig. 4.6.2. Run Solver again; by keeping the first peak as is, leaving good enough alone, you can save computer time by only adjusting D17:F18. Figure 4.6.3 shows what you may obtain.

Fig. 4.6.1: The top of the spreadsheet, showing the noisy combination of three Gaussian peaks, and their analysis in terms of two Gaussians (double black line).

Fig. 4.6.2: The top of the spreadsheet just before adjusting the curve with Solver to three Gaussians.

Fig. 4.6.3: The top of the spreadsheet after adjusting the curve with Solver to three Gaussians.

How do we know whether the extra peak is significant? In general this question cannot be answered: the peak in the experimental set might be a Lorentzian, or have another shape sufficiently different from a Gaussian to require more than one Gaussian to represent it. Only if we know that the peaks are all Gaussian, as is the case here, can we use the F-test described in section 3.10 to estimate the likelihood that the third Gaussian is statistically significant. In this case the variance ratio is 11.836/(1000 - 6) divided by 9.8788/(1000 - 9), or about 1.19, where 11.836 and 9.8788 are the values of SSR, N = 1000 is the number of data points, and P = 6 or 9 is the number of variables used to describe them in Figs. 4.6.1 and 4.6.3 respectively. Some playing with the Excel function =FINV(criterion, df1, df2) will show that this variance ratio is approximately equal to FINV(0.003, 3, 991), i.e., it corresponds with a probability of 0.003, or 0.3%, of being a coincidence. By this criterion, the third peak is surely significant, even though the positions, heights, and widths of the second and third peaks are rather inaccurate.

It cannot be emphasized enough that the usefulness of the above fitting procedure depends on the appropriateness of the model: one may instead need to use Lorentzians, asymmetric peak shapes, or any mixture thereof. The above procedure can easily be automated in a custom macro.
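The arithmetic of the variance ratio is easy to verify outside Excel; this Python fragment only reproduces the ratio itself (FINV is an Excel worksheet function, for which no stdlib equivalent is used here).

```python
# SSR values and parameter counts quoted for Figs. 4.6.1 and 4.6.3
n = 1000                 # number of data points
ssr_2, p_2 = 11.836, 6   # two-Gaussian fit
ssr_3, p_3 = 9.8788, 9   # three-Gaussian fit

variance_ratio = (ssr_2 / (n - p_2)) / (ssr_3 / (n - p_3))
# the text compares this ratio, about 1.19, with FINV(0.003, 3, 991)
```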
Choosing the specific model to be used for fitting is the responsibility of the user, who knows the source of the data and the purpose of the curve fitting. Note also that significant noise reduction is achieved when the number of fitting parameters used, P, is much smaller than the number of experimental data points N. On the other hand, fitting to an incorrect model may lead to significant distortion of the underlying curve.

4.7 Fitting a multi-component spectrum with wavenumber-shifted constituents

In section 3.7 we saw how to use linear least squares to resolve the composite spectrum of a mixture when the (additive) constituent spectra are known. Spectra in a mixture can usually be considered as mutually independent, in which case that approach is all that is required. However, matters might be more complicated because the component spectra were taken in a different medium, in which case they may be subject to shifts to longer or shorter wavelengths due to the interaction with the solvent. In that case the simple method illustrated in section 3.7 does not work. We then have a two-dimensional problem, with adjustable parameters for both x (wavenumber shifts) and y (amplitudes), and we will need to parameterize the spectra so that they can be shifted smoothly.

As our specific example we will use a preliminary analysis of the spectrum of CP29, a light-harvesting pigment-protein complex, in terms of its dominant constituent pigments: chlorophyll a, chlorophyll b, and xanthophyll (lutein). The data for this exercise were kindly provided by Prof. Harry Frank of the University of Connecticut. The spectra of the three main constituents shown in Fig. 4.7.1 are sufficiently broad that they can each be represented quite satisfactorily with six Gaussians. For the sake of the exercise we will assume here that the spectra merely shift along the energy (rather than wavelength) axis but otherwise retain their shape, a somewhat idealized situation but often the best we can do without much more specific information. Once that is done, we construct a linear combination of the spectra of these three components, and adjust the resulting six parameters (three for amplitude, three for wavenumber shift) in order to minimize the sum of the squares of the difference between this combination and the spectrum of CP29. The point to be made here lies not in any specific peak shape used, but in the flexibility of Solver as a general curve-fitting tool. Although it is not strictly necessary, we will use a custom spectral function S to give the problem a more compact notation.
Exercise 4.7.1:
(1) Download the spectra from the web site www.oup-usa.org/advancedexcel, where they can be found under SampleData, and enter them into a spreadsheet, with the wavelengths λ (typically in nm = 10^-9 m) placed in A27:A227. Then convert the wavelength scale into one of wavenumbers, since the spectral shifts are in terms of energy (1 nm = 10^-7 cm, so that ν in cm^-1 = 10^7/λ, where λ is in nm), and place the corresponding wavenumbers ν in B27:B227. Deposit labels and values for the 18 constants a1, b1, c1, a2, b2, c2, ..., a6, b6, c6 in, say, P4:P21. Plot the individual spectra.
(2) Use the keystroke combination Alt+F11 (Mac: Opt+F11), or Tools => Macro => Visual Basic Editor, then (in the Visual Basic Editor menu bar) select Insert => Module. This will give you a blank sheet in which you type the instructions for the function.
(3) In that module, type or copy (from SampleMacros) the text shown below. You need not enter the spaces between the symbols, as the Visual Basic editor will insert them. The spacer lines and indents are used for easy readability.

  Function S(x, shift, amplitude, a1, b1, c1, a2, b2, c2, _
    a3, b3, c3, a4, b4, c4, a5, b5, c5, a6, b6, c6)

  Dim T1 As Double, T2 As Double, T3 As Double
  Dim T4 As Double, T5 As Double, T6 As Double

  T1 = a1 / Exp(((x - c1 - shift) / b1) ^ 2)
  T2 = a2 / Exp(((x - c2 - shift) / b2) ^ 2)
  T3 = a3 / Exp(((x - c3 - shift) / b3) ^ 2)
  T4 = a4 / Exp(((x - c4 - shift) / b4) ^ 2)
  T5 = a5 / Exp(((x - c5 - shift) / b5) ^ 2)
  T6 = a6 / Exp(((x - c6 - shift) / b6) ^ 2)

  S = amplitude * (T1 + T2 + T3 + T4 + T5 + T6)

  End Function

(4) The first long line (wrapped around for better visibility on the monitor screen by typing a space followed by an underscore) specifies the name of the function (here called S) and its 21 input parameters (within parentheses).
(5) Then there are two lines that specify the parameters T1 through T6 as having double precision. This may seem (and is) strange, but Visual Basic was developed independently and, unlike Excel, does not automatically use double precision, so we must insist on it with those dimension statements. All other parameters come from the spreadsheet, and are therefore already in double precision.
(6) The lines of code specifying T1 through T6 are the heart of S, because they instruct the function what to do: to calculate S as the sum of Gaussians. We can use more elaborate expressions for Gaussians, or even use the Excel-provided function NORMDIST, but these are not as convenient, because their amplitudes vary with their standard deviations, whereas no such interactions between the parameters occur in the simple-minded definitions of T used here. The last line specifies the end of the function.
(7) Go back to the spreadsheet with Alt+F11 (Mac: Opt+F11), which acts as a toggle switch between the module and the spreadsheet.
(8) Make a column for the calculated spectrum of, say, chlorophyll a, and in its top row deposit the instruction =S($B27,0,1,P$4,P$5,P$6,P$7,P$8,P$9,P$10,P$11,P$12,P$13,P$14,P$15,P$16,P$17,P$18,P$19,P$20,P$21). Here $B27 refers to the top of the column with the wavenumber scale, 0 makes the shift zero, 1 sets the amplitude to one, and the rest refer to the locations of the 18 fitting parameters. The optional dollar sign following the column identifier (P) facilitates copying this expression to another row for the next component. Then copy this instruction down 200 rows, the same length as the spectrum.
(9) The function only changes the contents of the cell in which it resides, just as standard instructions such as sqrt() or exp() would. Each cell in the column must therefore contain the function statement.

Fig. 4.7.1: The experimental spectra of chlorophyll a, chlorophyll b, and xanthophyll (open circles) and their parameterizations in terms of six Gaussian peaks (solid curve, almost entirely hidden by the points). Horizontal scale: energy in wavenumbers; vertical scale: arbitrary absorbance units.
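For readers who want to test the parameterization outside VBA, here is a Python rendering of the same idea. It is a sketch, not the book's code: the 18 constants are passed as a list of (a, b, c) triples rather than as 18 separate arguments, but it computes the same sum of shifted, scaled Gaussian terms of the form a / Exp(z^2).

```python
import math

def spectrum(x, shift, amplitude, peaks):
    """Stand-in for the VBA function S: amplitude times the sum of
    Gaussian terms a / exp(((x - c - shift) / b) ** 2)."""
    total = 0.0
    for a, b, c in peaks:
        z = (x - c - shift) / b
        total += a / math.exp(z * z)
    return amplitude * total
```

The parameter shift translates every peak by the same amount along the wavenumber axis, and amplitude rescales the whole spectrum: exactly the two knobs adjusted in the final CP29 fit.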
(10) Highlight the column containing the calculated data, copy it, and paste it into the graph for chlorophyll a. Select wavenumbers at the centers of visible peaks, such as at 23,000 and 26,000 cm-1, and corresponding amplitudes, and play with the corresponding values of b to get the peak widths approximately right. Then add minor peaks at different wavenumbers to fit the extremes of the curve. Play with these parameters (since they are independent, this is easier than it sounds) to get as tight a visual fit to the experimental data as you can.
(11) Enter a label and formula for the sum of squares of the residuals, SSR.
(12) Then call Solver, and let it minimize SSR by adjusting, say, the three parameters defining the major peak near 23,000 cm-1. Then do the same for a, b, and c for the peak at 24,000 cm-1. Now increase your range and fit, say, the six parameters describing the peaks at both 23,000 and 24,000 cm-1, and repeat for each of the six peaks. Gradually widen your net, until you finally fit all 18 parameters simultaneously. You may have to reduce Solver parameters such as Precision, Tolerance, and Convergence, which you will find under Options, by adding zeros behind their decimal points.
(13) You may encounter trouble with the computed values, showing the error #NUM!. This most likely reflects a numerical underflow, in which a number becomes too small for Excel to represent, while it is too dumb to replace it by 0. That can easily happen with an expression such as 1/exp[x^2], which for x = 27 is already smaller than about 10^-306, the approximate numerical limit for Excel. To prevent this from happening we change the instructions for function S as illustrated below, where the changes are in the If ... Then ... Else constructions.

  Function S(x, shift, amplitude, a1, b1, c1, a2, b2, c2, _
    a3, b3, c3, a4, b4, c4, a5, b5, c5, a6, b6, c6)

  Dim T1 As Double, T2 As Double, T3 As Double
  Dim T4 As Double, T5 As Double, T6 As Double

  If Abs((x - c1 - shift) / b1) < 25 Then _
    T1 = a1 / Exp(((x - c1 - shift) / b1) ^ 2) Else T1 = 0
  If Abs((x - c2 - shift) / b2) < 25 Then _
    T2 = a2 / Exp(((x - c2 - shift) / b2) ^ 2) Else T2 = 0
  If Abs((x - c3 - shift) / b3) < 25 Then _
    T3 = a3 / Exp(((x - c3 - shift) / b3) ^ 2) Else T3 = 0
  If Abs((x - c4 - shift) / b4) < 25 Then _
    T4 = a4 / Exp(((x - c4 - shift) / b4) ^ 2) Else T4 = 0
  If Abs((x - c5 - shift) / b5) < 25 Then _
    T5 = a5 / Exp(((x - c5 - shift) / b5) ^ 2) Else T5 = 0
  If Abs((x - c6 - shift) / b6) < 25 Then _
    T6 = a6 / Exp(((x - c6 - shift) / b6) ^ 2) Else T6 = 0

  S = amplitude * (T1 + T2 + T3 + T4 + T5 + T6)

  End Function

(14) By now you should have a well-fitting curve through the experimental data for chlorophyll a. Repeat the same procedure for the two other pigments, chlorophyll b and xanthophyll. With the three constituent spectra parameterized, you are ready for the final fitting of these spectra to the data for CP29. In order to keep the equations relatively simple, we make three additional columns, one for each constituent.
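The same guard can be expressed in Python. There, underflow of exp() is benign, but 1/exp(z^2) still overflows once |z| exceeds about 27, so the cutoff at |z| < 25 used in the modified VBA code remains a reasonable sketch.

```python
import math

def guarded_gauss_term(x, a, b, c, shift):
    """One peak term with an underflow/overflow guard: once
    abs((x - c - shift)/b) >= 25, the term a / exp(z*z) would be below
    about 1e-271, so it is simply set to zero, as in the modified S."""
    z = (x - c - shift) / b
    if abs(z) < 25:
        return a / math.exp(z * z)
    return 0.0
```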
(15) In each of these three columns we compute the corresponding spectrum with two additional adjustable parameters, amplitude and shift, in which we merely repeat the coding with those two parameters taken from the spreadsheet. For example, when these two parameters are stored in cells P2 and P3 respectively, we would use in the new column for chlorophyll a the command =S($B27,P$2,P$3,P$4,P$5,P$6,P$7,P$8,P$9,P$10,P$11,P$12,P$13,P$14,P$15,P$16,P$17,P$18,P$19,P$20,P$21), where the differences are boldfaced. (Don't worry: the Visual Basic Editor will not recognize boldfacing even if you try.) Do the same for the other two species.
(16) Make a column where you calculate the algebraic sum of the three last-made columns, and display this in the graph for CP29.
(17) Now play with the three amplitudes and the three shifts, without altering any of the other parameters. First try to fit the dominant peak by adjusting the amplitude and shift for chlorophyll a, then do the same for chlorophyll b, and finally for xanthophyll. After you get as close by manual adjustment as you can, call Solver and repeat the same approach, first fitting single components, then fitting all six parameters together. You are now refining the last 6 of a system with a total of 6 + 3 x 18 = 60 adjustable parameters!
(18) The result, displayed in Fig. 4.7.2, shows an imperfect fit, with systematic deviations around 24000 and 26000 cm-1.

Fig. 4.7.2: The observed spectrum of CP29 (open circles) and that for the combination of its three major pigments, each properly scaled and shifted in order to obtain an optimal fit (solid line). Also shown (as thin lines) are the three individual components of this synthesis.

The imperfections should not surprise us. CP29 contains multiple copies of these pigment molecules, which may find themselves in different molecular surroundings; since these surroundings cause the spectral shifts, it is unrealistic to assume, as done here, that all pigments will exhibit the same wavenumber shifts. There are two additional known pigments in CP29, violaxanthine and neoxanthine, with spectra that resemble shifted spectra of xanthophyll. Moreover, the original assumption that the spectra merely shift is only a first approximation. That we nonetheless obtain a reasonable fit with such simplifying assumptions is, like the cup that is either half empty or half full, depending on what you want to see, quite encouraging, considering the limitations inherent in the underlying model, and clearly illustrates what can be achieved with parameterization.
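The structure of that final, 60-parameter fit can be sketched compactly: three parameterized component spectra, each with its own amplitude and shift, summed algebraically. All names and values below are illustrative, not the fitted CP29 parameters.

```python
import math

def spectrum(x, shift, amplitude, peaks):
    """Sum of Gaussian terms a / exp(((x - c - shift)/b) ** 2), scaled."""
    return amplitude * sum(a / math.exp(((x - c - shift) / b) ** 2)
                           for a, b, c in peaks)

def cp29_model(x, components):
    """components: one (shift, amplitude, peaks) triple per pigment;
    the model spectrum is simply their algebraic sum."""
    return sum(spectrum(x, s, amp, pk) for s, amp, pk in components)
```

Solver's job in step (17) is then to adjust the three shifts and three amplitudes (the first two entries of each triple) while the 18 Gaussian constants per pigment stay fixed.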
4.8 Constraints

Constraining the parameter space searched by Solver may be needed when it would otherwise, e.g., find a negative answer for a quantity that, for physical reasons, cannot be negative. Solver has provisions for limiting its search of particular parameter values (in the Solver Parameters dialog box under Subject to the Constraints:) or, as a general constraint (in Solver Options with Assume Non-Negative). Here we will not explore these in detail, although they will occasionally come up in particular examples. Instead we will focus on constraints that involve the functions rather than their specific parameters. Clearly, the possibilities are endless, and we can merely give a sense for what can be done. Several types of constraints will be illustrated in this and the next five sections. Below we will fit data to two separate but parallel lines; in section 4.9 we will fit data to a curve through two fixed points, in section 4.10 to three lines with a common intersection, in section 4.11 we will illustrate how to fit results for chemical kinetics at various temperatures, and in section 4.12 we will fit data to a discontinuous curve, while section 4.13 will demonstrate a piecewise fit.

When data are to be fitted to several related curves, Solver may be used to maintain one or more mutual relations between them. Say that we want to fit two sets of data to straight lines, the first to yp = ap0 + ap1x, the second to yq = aq0 + aq1x. In that case, the slopes ap1 and aq1 as well as the intercepts ap0 and aq0 will in general be different. If there are theoretical or other good reasons to expect the slopes to be the same, then the two sets should be fitted simultaneously, the first to yp = ap0 + a1x, the second to yq = aq0 + a1x, with a common value for a1, thereby constraining the fit. Below we illustrate how to do this.

Exercise 4.8.1:
(1) Open a spreadsheet, and enter two small sets of linear functions of x with different intercepts but identical slopes, such as (in the example given in Fig. 4.8.1) yp = 2 + 1.5x and yq = 13.5 + 1.5x. Then add a generous amount of noise.
(2) Using a linear least squares routine, fit the first and second data sets individually, the first to yp = ap0 + ap1x, the second to yq = aq0 + aq1x, and plot the corresponding lines through these data.
(3) Then deposit guess values for ap0, aq0, and a1, fit the first data set to yp = ap0 + a1x and the second to yq = aq0 + a1x, compute the sum of the squares of the residuals for both, and use Solver to minimize this sum by adjusting ap0, aq0, and a1; then plot the results. You can of course add uncertainty estimates for the coefficients found.
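The shared-slope fit of exercise 4.8.1 also has a closed-form solution, which makes a useful check on what Solver should return. This Python sketch is our own helper, not part of the book's spreadsheet; with noise-free data it recovers the exact coefficients.

```python
def fit_common_slope(xp, yp, xq, yq):
    """Least-squares fit of two data sets to parallel lines
    yp = ap0 + a1 x and yq = aq0 + a1 x with a shared slope a1
    (closed form; Solver reaches the same minimum numerically)."""
    def mean(v):
        return sum(v) / len(v)
    mxp, myp, mxq, myq = mean(xp), mean(yp), mean(xq), mean(yq)
    num = (sum((x - mxp) * (y - myp) for x, y in zip(xp, yp))
           + sum((x - mxq) * (y - myq) for x, y in zip(xq, yq)))
    den = (sum((x - mxp) ** 2 for x in xp)
           + sum((x - mxq) ** 2 for x in xq))
    a1 = num / den                      # pooled slope
    return myp - a1 * mxp, myq - a1 * mxq, a1   # ap0, aq0, a1

# noise-free check with the lines of exercise 4.8.1
xs = [0, 1, 2, 3, 4, 5]
ap0, aq0, a1 = fit_common_slope(xs, [2 + 1.5 * x for x in xs],
                                xs, [13.5 + 1.5 * x for x in xs])
```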
9 Fitting a curve through fixed points We now consider the data of Leary & Messick.553 10.454 12.5 14 15 18 19.1: The spreadsheet for exercise 4.1 93 X Y 6.095 18.46 17.276 13.690 1.744 12.560 23.69 15.244 1. 4. 20 22.8.847 3 4 5 6 7 8 I 12.25 25.0 7 0.0 5 22.~ 193 30 ~~ ~ 20 ~ 10/ 10 ~ 4 6 0 0 2 0 4 6 8 10 0 2 10 aOp = 0.810 15.037 25.1337 alq = 1. Anal.8.5 23 .EI9:E31) copied to: B20:B24 B27:B31 C20:C24.5*A26 B19+F19 $B$13+$B$14*AI9 $D$13+$D$14*A26 $F$13+$F$15*AI9 $F$14+$F$15*A26 SUMXMY2(C19:C31.369 I .0322 aOq = 14.275 Ypq 7.690 1 .1.27 0.448 1 alp = 2.200 6. 153 Yp.2379 ap = 2.622 26.766 19.847 19. 4: Nonlinear least squares A 30 .5 27 3 4 6. C26:C31 D20:D24 D27:D31 E20:E24 E27:E31 Fig. 57 (1985) 956. Chern.300 . displayed in table 3.751122 aq = 13.372 17.12.01747 a 1 = 1.905 14.276 1.413 22.733 14.5 II Y+lloi e 6.722 9.681 9.Ch.810 25.450532 SSR 12.7 13 11. Since in theory the detector response at 0 and 100% ethanol should be 0 and 100% of the measured .095 1 .1. 19 1.777 15.649 7. 180 24.072 noi e 0.355 14.5 8 9 cell: B19 = B26 = C19 = D19 = D26= E19 = E26 = F16 = instruction: 2+1.234 1.1.004 11.5 8 9. 103 8. Yq 5. 4.745 13.733 0.5+1.446 24.5*AI9 13.
But Leary & Messick instead elected to fit their data to a quadratic. Let us first focus on the mechanics of fitting the data of table 3.12.1. We use Solver to fit these data to y = a0 + a1x + a2x² while this expression is forced to go through the points (0,0) and (100,100). The requirement that y = 0 for x = 0 leads directly to a0 = 0, while y = 100 for x = 100 then yields the relation 100 = 100 a1 + 10000 a2, or a2 = (1 − a1)/100. We therefore fit the data of table 3.12.1 to y = a1x + (1 − a1)x²/100, as illustrated in Fig. 4.9.1, which also shows how these constraints skew the fit. Note that this approach can include as many constraints as the fitting equation allows.

Now that the mechanics of fitting these data are out of the way (i.e., relegated to the computer), it is interesting to take a second look at these data, because they illustrate some of the real difficulties involved in data fitting: what fitting function should we use, and what constraints? The question is only answerable within a given context: do we use the experiment (1) to extract model coefficients, (2) to verify or falsify a particular model, or (3) to find a representation of the experimental data for subsequent use in, e.g., a calibration? Leary & Messick did not specify any specific model, but they must have had one in mind in order to justify their constrained quadratic fit. This assumes the absence of any offset or baseline drift, or any other unanticipated phenomenon. By similar reasoning one might then anticipate a strictly linear rather than a quadratic relationship.

What if we consider these data for calibration purposes only? If we let the data speak for themselves, without any theoretical guidance, we conclude that, by all three of the criteria (minimal sy, FR5 > 1 and FR1 > 1, and all coefficient-to-standard-deviation ratios ai/si > 1; see section 3.9), these data are best represented by an unconstrained quadratic. But if, for some reason, one wants to constrain the curve, the above illustrates how this can be done.

4.10 Fitting lines through a common point

This is a simple variation on the same theme. When we require the line y = a0 + a1x to pass through a particular point (X, Y), we have Y = a0 + a1X or a0 = Y − a1X, so that the expression for the line becomes y = Y + a1(x − X). Fitting several sets of data to lines all intersecting at a point (X, Y) therefore requires that the first set be fitted to y1 = Y + a1(x − X),
the second to y2 = Y + a2(x − X), etc. Again, the constraint that all lines go through the point (X, Y) is readily handled by Solver.

Fig. 4.9.1: A spreadsheet using Solver to find a quadratic through the points (0,0) and (100,100) and closely fitting the data of table 3.12.1. The small closed circles (including those for x = 0 and x = 100), connected by a thin line, show the unconstrained quadratic y = 2.7 + 0.56x + 0.0037x². SolverAid was used to find the standard deviation in a1, and Propagation to calculate the resulting standard deviation in a2.
cell: instruction: copied to:
C18 = $B$12*B18+(1-$B$12)*B18*B18/100, C19:C26
B13 = SUMXMY2(A18:A26,C18:C26)
B14 = (1-B12)/100
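The constrained quadratic need not even require Solver: once y(0) = 0 and y(100) = 100 are imposed, the model y = a1x + (1 − a1)x²/100 has a single parameter, and the rearrangement y − x²/100 = a1(x − x²/100) is linear in a1. A Python sketch of this one-parameter least-squares estimate follows; the data are invented stand-ins for table 3.12.1, generated here with an assumed a1 = 0.56.

```python
import numpy as np

# Noise-free data obeying the constrained quadratic exactly, with a1 = 0.56
# (an invented value chosen only for this sketch).
x = np.arange(10.0, 100.0, 10.0)
y = 0.56 * x + (1 - 0.56) * x**2 / 100

# Rearranged model: v = a1 * u, a one-parameter linear least-squares fit.
u = x - x**2 / 100
v = y - x**2 / 100
a1 = np.dot(u, v) / np.dot(u, u)
a2 = (1 - a1) / 100
print(a1, a2)   # 0.56 and 0.0044 for these noise-free data
```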
Exercise 4.10.1:
(1) Open a spreadsheet, provide space for graphs and labels, then make columns for x, y, n, and y + n, where n represents noise.
(2) Split the columns in three parts by inserting two empty rows. Assume values for the coordinates X and Y of the point of intersection of the three lines, then take some values for x, and compute corresponding values for y as y = Y + ai(x − X), where the slope ai assumes different values for the three segments. The individual straight-line segments start out intersecting at the point X, Y before noise is added to them.
(3) In the column for n deposit Gaussian noise using the Random Noise Generator, then calculate the corresponding values of y + n. After noise has been added, the data no longer pass through a common point. Then use linear least squares to compute separate straight lines through the three noisy data sets, and plot these as well as the noisy data.
(4) Somewhere near the top of the spreadsheet, insert spaces for three slopes a1 through a3, as well as for the coordinates X and Y, and enter guess values for all of these. As guess values for a1 through a3 you can use the slopes obtained under (3) by linear least squares.
(5) Make a column in which you calculate y = Y + ai(x − X), using the guess values and, in each segment, the appropriate value of the slope ai.
(6) Compute the sum of squares of the residuals by comparing the data in the just-made column with those in the column for y + n.
(7) Call Solver to minimize that sum of squares of the residuals, by letting it adjust the three slopes and the two coordinates of the intersection point.
(8) Plot the resulting lines, and compare your spreadsheet with that in Fig. 4.10.1. In order to draw the lines across the entire width of the graph, points for x = 0 and x = 10 were added; these have been printed in italics in Fig. 4.10.1, though of course not included in the analysis. Incidentally, one could make them disappear from a printout by coloring them white.

In Fig. 4.10.1 we have used few points and rather generous noise in order to illustrate the effect of the constraint. Because of the paucity of data and the large noise amplitudes used, the lines drawn are of course quite uncertain. Forcing the three data sets through a common point of course makes the resulting parameter values interdependent, as can best be seen from their standard deviations, so that their covariances may have to be considered in subsequent computations.
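The Solver step of this exercise, adjusting three slopes plus the common intersection point to minimize SSR, can be sketched outside Excel as follows. Here SciPy's least_squares stands in for Solver, and the slopes, the point (2, 10), and the noise-free data are invented for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

# Three noise-free lines through the assumed common point (2, 10),
# with invented slopes 3, 1, and -0.5.
X0, Y0 = 2.0, 10.0
slopes = (3.0, 1.0, -0.5)
x = np.arange(0.0, 10.0)
ysets = [Y0 + g * (x - X0) for g in slopes]

def residuals(p):
    # p = (a1, a2, a3, X, Y): one stacked residual vector for all three sets
    return np.concatenate([y - (p[4] + gi * (x - p[3]))
                           for gi, y in zip(p[:3], ysets)])

res = least_squares(residuals, x0=[1.0, 1.0, 1.0, 1.0, 5.0])
print(res.x)   # approaches (3, 1, -0.5, 2, 10)
```

Note that, just as in the spreadsheet, the three slopes and the coordinates X, Y are adjusted simultaneously, which is what makes the fitted parameters mutually dependent.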
Fig. 4.10.1: A spreadsheet for exercise 4.10.1, showing three sets of straight lines intersecting in the common point (2, 10) in column B, and the same with Gaussian noise of zero mean and unit standard deviation in column D; (a) analyzed separately, and (b) forced to go through one common point.
Table 4.10.1: The results obtained in Fig. 4.10.1, analyzed as three independent straight lines, or as three straight lines through a common point (the assumed and fitted intercepts and slopes, with their standard deviations, and the coordinates X and Y).

The above approach is quite general. Whenever two or more curves must be fitted with mutually dependent parameters, Solver is a convenient tool to use, as will be illustrated in the next example.

4.11 Fitting a set of curves

We now consider a set of data representing the progress of a first-order chemical reaction at various temperatures and times, taken from page 124 of Y. Bard, Nonlinear Parameter Estimation (Academic Press 1974). The dependent variable f is the fraction of the initial reagent remaining after a reaction time t at an absolute temperature T. The assignment is to analyze these numbers in terms of the rate expression f = exp(−kt), where the rate constant k has the temperature dependence k = a exp(−b/T), so that f = exp[−at exp(−b/T)]. Figure 4.11.1 illustrates such an analysis using Solver and SolverAid.

Exercise 4.11.1:
(1) Again open a spreadsheet as usual, and enter the labels, column headings, parameter values, and data. For subsequent plotting of the calculated curves it is convenient to keep empty rows between the data for different temperatures, so that the graph will show the solution as three unconnected line segments. (And leave more rows free if you want to extend the calculated curves from t = 0 to t = 0.5 hrs.) Plot the data.
(2) Compute f as f = exp[−at exp(−b/T)] based on initial guess values for a and b, and show the resulting curve. Also calculate SSR, the sum of the squares of the residuals.
(3) Use Solver and SolverAid to find a solution; the corresponding uncertainties can then be obtained from SolverAid, and you can then ask SolverAid to display the correlation coefficients.

Figure 4.11.1 shows what you will get when you start with, e.g., a = b = 1. The data used in Fig. 4.11.1 are clearly hypothetical, made up by a non-experimentalist: the data at 100 K show virtually no reaction, and there are only two data points at 300 K that are significantly different
from zero. Yet, even with such poor data, Solver finds a plausible solution; moreover, with more realistic data the noise in the data should also be smaller. As can be seen from the correlation coefficient r12 of about 0.98, the values for a and b are quite interdependent.

Note, again, that we were lucky that a = 1 and b = 1 led to a plausible answer. This illustrates, however, that Solver may not always work when the initial guess values are far off. A more interesting case occurs with, e.g., a = 10 and b = 10, which leads to a local minimum that is not the global one. This trouble is easy to spot by the fact that the initial guess values are not changed. In this case, simply engaging Solver once more will get you to the correct answer: apparently, the initial iteration step differs from subsequent ones, something that should not happen, but occasionally does. In this particular example you can also find the correct answer by using Solver with Options > Assume Non-Negative. It does not always work that way: a = 8000 and b = 8000 might not get you anywhere, and again get Solver stuck, even though your initial guess values are now within a factor of 10 of their 'best' values. In this case the calculated values of y are all very close to 1, so that SSR, the sum of the squares of the residuals, doesn't change perceptibly, making it impossible for Solver to minimize it. Options > Assume Non-Negative would again have avoided the problem.

If you have no clue what to use for plausible starting guesses for the data of Fig. 4.11.1, as typically occurs when you know little or nothing about the context of the data, you might start with separate parts of the data set, say with those for 200 K and 300 K, i.e., use a fraction of the data for which the analysis is simpler, assemble estimates for most or all of the parameters that way, and then use these as guess values in the final nonlinear fit of the entire data set. Alternatively, in such a case one might try Solver once or twice with wild guess values, on the chance that these just might work. This trick sometimes works: if you don't trust the answer Solver provides, use it as the starting point for a second try with Solver and see whether it gets Solver unstuck; if so, accept that result.
You fit each of these to f = exp(−kt), either with Solver or with any of the linear least squares
programs in Excel. You will find k-values of about 6.7 and 33 hr⁻¹. Now fit these two values to the temperature dependence k = a exp(−b/T), and you find a = 810 and b = 960. With these as guess values you are virtually guaranteed to home in on the final answer when you subsequently analyze the complete data set.

Fig. 4.11.1: The remaining concentration fraction f of a species undergoing a first-order chemical reaction, as a function of time t, at different temperatures T (100, 200, and 300 K). Lines calculated with Solver-determined parameters, a = 8.14E+02 and b = 9.61E+02.
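The two-step strategy just described can be sketched in Python as well. In the snippet below the "data" are generated from the model itself with the assumed values a = 810 and b = 960 (of the order of the values quoted above), so the two steps recover them exactly; with real data they would only be starting guesses for the final full fit.

```python
import numpy as np

a_true, b_true = 810.0, 960.0           # assumed parameters for this sketch
t = np.array([0.05, 0.10, 0.15, 0.20, 0.25])   # reaction times / hr

# Step 1: at each temperature, get k from ln f = -k t by linear least
# squares through the origin.
ks = {}
for T in (200.0, 300.0):
    k_T = a_true * np.exp(-b_true / T)
    f = np.exp(-k_T * t)
    ks[T] = -np.dot(t, np.log(f)) / np.dot(t, t)

# Step 2: fit the two k values to k = a*exp(-b/T), i.e. ln k = ln a - b/T.
T1, T2 = 200.0, 300.0
b = (np.log(ks[T2]) - np.log(ks[T1])) / (1 / T1 - 1 / T2)
a = ks[T1] * np.exp(b / T1)
print(ks[T1], ks[T2], a, b)   # k of about 6.7 and 33 hr^-1; a = 810, b = 960
```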
How will you know whether you have hit the jackpot? Look at the resulting fit of the calculated model curve to the data points. Plot the residuals and see whether they show a trend, although in the present example there are too few data points to make this a meaningful criterion. You will typically know what order of magnitude to expect for your data. And, most of all, remain skeptical: nonlinear least squares do not come with any guarantees.

4.12 Fitting a discontinuous curve

One sometimes encounters experimental data sets that exhibit a discontinuity, as may occur, e.g., with phase transitions. As always, the best description starts with the appropriate model describing the phenomenon. We can treat its constituent parts as separate segments, but often the coordinates of the transition are also subject to experimental uncertainty, in which case it may be preferable to fit the entire data set. Here we will merely illustrate the method for a simulated, discontinuous data set.

Say that we have a set of data that can be described by y = a0 + a1x for x ≤ c, and by y = b0 + b1x + b2x² for x > c. We now express that behavior in a single equation with six unknowns, a0, a1, b0, b1, b2, and c, and then use Solver to find their optimum values. The single equation will contain an IF statement. If the curve contains a knee rather than a break, one of the unknowns can be eliminated, since continuity at x = c requires that a0 + a1c = b0 + b1c + b2c², in which case Solver usually works well; that is a problem which we will take up again in sections 4.17 and 4.20. And you should preferably have more and better data points than in Fig. 4.12.2.

Exercise 4.12.1:
(1) Set up a spreadsheet similar to that shown in Fig. 4.12.1, with (at the top of the spreadsheet) the model and adjustable parameters, and below it the corresponding columns for x, y, and ycalc. Include a column of random noise, and add sn times this noise to the data for y.
(2) Plot both y and ycalc vs. x, and adjust the parameters in C1:C6 so that the calculated line roughly fits the (in this case simulated) experimental data. Also compute the sum of the squares of the residuals, SSR.
(3) Call Solver and let it minimize SSR by adjusting the parameters in C1:C6. Check whether a subsequent application of Solver modifies the result: sometimes a second application of Solver does change its results, and it is good practice to repeat Solver at least once to see whether it has reached a steady answer.
(4) Call SolverAid and let it determine the corresponding uncertainties. Note that SolverAid cannot find the uncertainty in c, which it therefore indicates (while displaying an appropriate warning message) as 0.
Fig. 4.12.1: The top of the spreadsheet of exercise 4.12.1. Cell B12 contains the instruction =IF(A12<$B$6, $B$1+$B$2*A12, $B$3+$B$4*A12+$B$5*A12*A12)+$B$7*N12, while cell C12 contains =IF(A12<$C$6, $C$1+$C$2*A12, $C$3+$C$4*A12+$C$5*A12*A12).

Fig. 4.12.2: The simulated data (open circles) and an unadjusted, crudely fitted curve through them.

Fig. 4.12.3: The simulated data (open circles) and the adjusted curve through them.
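Outside the spreadsheet, the IF statement of Fig. 4.12.1 translates into a conditional inside the model function. The sketch below uses invented parameter values and noise-free data, with SciPy's least_squares standing in for Solver. It also illustrates the point made about SolverAid: at a discontinuity the gradient with respect to c is zero almost everywhere, so a gradient-based fit only pins c down to the gap between neighboring data points, and the checks below therefore concern the remaining five parameters.

```python
import numpy as np
from scipy.optimize import least_squares

# Invented "true" parameters: a0, a1 for the line, b0, b1, b2 for the
# quadratic, and the break position c.
p_true = [2.5, 1.0, 0.3, 4.0, -0.3, 5.0]
x = np.linspace(0.0, 10.0, 41)

def model(p, x):
    a0, a1, b0, b1, b2, c = p
    # the Python analogue of the spreadsheet's IF statement
    return np.where(x < c, a0 + a1 * x, b0 + b1 * x + b2 * x * x)

y = model(p_true, x)   # noise-free simulated data

res = least_squares(lambda p: y - model(p, x),
                    x0=[2.0, 1.2, 0.5, 3.5, -0.2, 4.8])
print(res.x)   # the five curve parameters approach their true values;
               # c stays near its starting guess
```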
4.13 Piecewise fitting a continuous curve

Sometimes it may be desirable to represent data by a smooth-looking curve, perhaps even in the absence of a theoretical justification for such a piecemeal approach. A smooth appearance usually requires that the segments and their first derivatives are continuous. This can readily be achieved with splines (e.g., with the Smoothed line option in Excel XY graphs) but will here be illustrated with least squares.

A study of E. Eppright et al., published in World Rev. Nutr. Dietet. 14 (1972) 269, reported the weight/height ratios (weights in pounds, heights in inches) of preschool boys in the North-Central region of the USA as a function of their age (in months). The experimental observations can also be found in G. A. F. Seber & C. J. Wild, Nonlinear Regression, Wiley 1989, p. 461. These data were subsequently analyzed by A. R. Gallant et al. (J. Am. Stat. Assoc. 68 (1973) 144; 72 (1977) 523) in terms of two connected sections. As long as we are merely looking for a smooth-looking mathematical description of these data in terms of relatively few parameters, neither the beginning nor the end fits a line or higher-order power series in x through the origin, and we will therefore use straight lines with arbitrary intercepts. For the sake of the exercise we will do the same here, first fitting the data to two straight-line sections with a knee, and subsequently fitting them with the smooth combination of a parabola and a straight line.

Continuity of a fit with y = a0 + a1x for x < c, and y = b0 + b1x for x > c, requires that a0 + a1c = b0 + b1c, so that b0 = a0 + (a1 − b1)c, and y = a0 + (a1 − b1)c + b1x for x > c.

Exercise 4.13.1:
(1) Copy the data into your spreadsheet, with the weight-to-height ratio as the dependent variable y, and age as the independent variable x. Plot these data.
(2) Enter labels and values for a0, a1, b1, and c, then compute ycalc with an instruction containing an IF statement, as a0 + a1x for x < c, and y = a0 + (a1 − b1)c + b1x for x > c.
(3) Compute SSR using the SUMXMY2 function, call Solver to minimize SSR, then call SolverAid to find the associated uncertainties. The resulting curve is shown in Fig. 4.13.1. Keep in mind that the uncertainty estimate of c derives exclusively from its role in defining the slope of the longer straight-line segment.
(4) There is no known reason to assume that something suddenly happens at x = c to render the slope dy/dx discontinuous at that age. Perhaps more telling is the covariance matrix obtained with SolverAid, which shows a0 and a1 to be independent of b0, b1, and b2, as they should be, as well as the (incorrect) zeros for the uncertainty in c.
Fig. 4.13.1: The data of Eppright et al. (weight/height ratio vs. age in months), fitted with two connected straight-line segments.

Since a curve without a knee would seem preferable, we therefore fit these same data to y = a0 + a1x + a2x² for x < c, and y = b0 + b1x for x > c, with the constraints that both y and dy/dx are continuous at x = c, i.e., a1 = b1 − 2a2c and a0 = b0 + (b1 − a1)c − a2c² = b0 + a2c². We therefore calculate ycalc in an IF statement as (b0 + a2c²) + (b1 − 2a2c)x + a2x² for x < c, otherwise b0 + b1x.
(5) Calculate b0 = a0 + (a1 − b1)c.
(6) Since this exercise already provided reasonable guess values for b0 = a0 + (a1 − b1)c and b1, we now express a0 and a1 in terms of b0 and b1.
(7) Using the graph displaying y and ycalc as a guide, set a2 and c to some plausible guess values, such as 0.001 and 20, compute SSR, and call Solver to adjust a2, b0, b1, and c. (In case you encounter trouble, do this in steps, first adjusting only a2 and c.) Finally, find the corresponding uncertainty estimates with SolverAid. Figure 4.13.2 shows the result.

Fig. 4.13.2: The data of Eppright et al. (weight/height ratio vs. age in months), fitted with a parabola plus a straight line, with both the function and its slope continuous at the joint.
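The knee-free model can be sketched in Python as follows; SciPy's least_squares again stands in for Solver, and all numerical values (b0, b1, a2, c, the x grid, and noise-free data) are invented for illustration. Note that here, in contrast to the discontinuous fit of section 4.12, the model depends smoothly on c, so the optimizer can adjust the joint position directly.

```python
import numpy as np
from scipy.optimize import least_squares

# Invented parameters: straight line b0 + b1*x for x > c, joined at x = c by
# a parabola whose coefficients follow from continuity of y and dy/dx.
b0, b1, a2, c = 0.9, 0.004, 0.0004, 20.0

def model(p, x):
    b0, b1, a2, c = p
    parabola = (b0 + a2 * c * c) + (b1 - 2 * a2 * c) * x + a2 * x * x
    return np.where(x < c, parabola, b0 + b1 * x)

x = np.linspace(2.0, 70.0, 35)
y = model([b0, b1, a2, c], x)   # noise-free simulated data

res = least_squares(lambda p: y - model(p, x),
                    x0=[1.0, 0.003, 0.0006, 18.0],
                    xtol=1e-12, ftol=1e-12)
print(res.x)   # approaches (0.9, 0.004, 0.0004, 20)
```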
4.14 Enzyme kinetics, once more

Nonlinear least squares can also be used for convenience, in order to avoid the complications of weighting and/or error propagation. We will illustrate this by revisiting enzyme kinetics, and in a second example we will see how Solver can sometimes be used when weighted least squares yield unsatisfactory results, in section 4.15. In section 4.16 we will then reconsider linear extrapolation, and in section 4.17 calibration and standard addition.

In section 3.17 we encountered a set of data on enzyme kinetics that, after some rearrangement, could be analyzed with linear least squares as Hanes or Lineweaver-Burk plots. Using unweighted least squares we found answers that depended on the rearrangement used, whereas properly weighted linear least squares gave consistent results. Still, the procedure was rather laborious: it required us to determine the appropriate weights, then to apply a weighted least squares, and finally to perform an analysis of the propagation of uncertainty. Below we will show that similar results can be obtained more simply by using Solver. We will illustrate how Solver can bypass the need for weighted linear least squares, without the need for weights or special error propagation.

Exercise 4.14.1:
(1) Extend the spreadsheet of exercise 3.17, or enter the data of table 3.17 into a new spreadsheet. Also enter labels and initial values for the two parameters to be determined, K and Vm, and use these to compute v as a function of s according to v = Vm s/(K + s), as in Fig. 4.14.1.
(2) Compute SSR, then call Solver to adjust the values of K and Vm.
(3) Call SolverAid to calculate the associated uncertainties, and to provide the covariance matrix in case subsequent computations will use both K and Vm. Figure 4.14.1 shows it all.

Fig. 4.14.1: A spreadsheet for analyzing enzyme kinetics with Solver plus SolverAid.
cell: C4 = A4*$G$3/($F$3+A4), copied to C5:C9
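The direct nonlinear fit of the rate law, which is what Solver performs here instead of a weighted Hanes or Lineweaver-Burk analysis, looks like this in Python. The substrate concentrations and the values K = 0.6, Vm = 0.7 are invented; SciPy's least_squares stands in for Solver.

```python
import numpy as np
from scipy.optimize import least_squares

# Simulated, noise-free Michaelis-Menten data v = Vm*s/(K + s)
# with invented K and Vm.
K_true, Vm_true = 0.6, 0.7
s = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0])
v = Vm_true * s / (K_true + s)

# Fit K and Vm directly, with no rearrangement and hence no weights needed.
res = least_squares(lambda p: v - p[1] * s / (p[0] + s), x0=[1.0, 1.0])
K, Vm = res.x
print(K, Vm)   # approaches 0.6, 0.7
```

Because the fit is done on v itself, the implicit assumption is that the dominant experimental uncertainty resides in v, the same assumption that made the weighted linear analyses of section 3.17 consistent.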
Comparison of the above results (K = 0.597 ± 0.068, Vm = 0.690 ± 0.037) with those of the weighted least squares method of section 3.17 shows them to be similar but not identical. And when the values for K and Vmax found earlier with weighted least squares in section 3.17 are substituted into cells F3 and G3 of the spreadsheet of Fig. 4.14.1, a slightly higher value for SSR results, thereby making Solver not only more convenient (because it homes in directly on the quantities of interest) but also slightly superior in terms of its numerical results.

4.15 The Lorentzian revisited

In section 3.19 we used a weighted least squares routine to fit data to a Lorentzian. There was no problem as long as the signal did not contain much noise, but the method failed miserably when the signal-to-noise ratio was too small. What went wrong here? The answer lies in the approximation (3.19.1) used to determine the appropriate weights, which is only valid when the relative deviations Δy/y are much smaller than 1. This is not quite the case in the present example. Here we will revisit this problem with Solver.

Exercise 4.15.1:
(1) Extend the spreadsheet of exercise 3.19.1, by entering labels and initial values for the parameters a, b, and c of (3.19.1) for the data used in Fig. 3.19.3, and use these to compute y as a function of x according to (3.19.1).
(2) Calculate SSR, and let Solver minimize SSR by changing the parameters a, b, and c.

Figure 4.15.2 shows the result.

Fig. 4.15.1: The data of Fig. 3.19.3, repeated here. Solid circles: data points; gray band: noise-free curve used in generating the data; black line: fitted to the data using weighted least squares.
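An unweighted nonlinear fit of a noisy Lorentzian, the approach Solver takes in exercise 4.15.1, can be sketched as follows. The three-parameter form y = a/(1 + ((x − b)/c)²), the peak parameters, and the noise level are assumptions for this sketch, not the book's (3.19.1) verbatim; SciPy's least_squares stands in for Solver.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(7)
a, b, c = 1.0, 12.0, 2.0                 # invented height, center, half-width
x = np.linspace(0.0, 24.0, 121)
y = a / (1 + ((x - b) / c)**2) + rng.normal(0.0, 0.05, x.size)  # noisy signal

# Fit the raw, noisy y directly; no weights are needed because no
# rearrangement (and hence no error-distorting transformation) is involved.
res = least_squares(lambda p: y - p[0] / (1 + ((x - p[1]) / p[2])**2),
                    x0=[0.5, 10.0, 1.0])
print(res.x)   # height, center, and width close to 1, 12, and 2
```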
Fig. 4.15.2: The results of exercise 4.15.1. Solid circles: data points; black line: fitted line; gray band: noise-free curve used in generating the data.

Solver is clearly much more impervious to noise than a weighted least squares analysis. Indeed, Solver is often more robust than weighted least squares. Eventually, though, the Lorentzian (or any other signal) can be overwhelmed by noise no matter what algorithm is used. In that case, different experimental methods, such as signal averaging or synchronous detection, must be used; but these must be designed into the experiment, and cannot be used after the fact.

4.16 Linear extrapolation

We also briefly revisit the problem of linear extrapolation addressed in section 2.11. In the experimental determination of absolute zero temperature, one wants to know the intercept t0 of a straight line V = a0 + a1t with the horizontal axis, where V is assumed to have the dominant experimental uncertainties. In Fig. 2.10.1 we displayed the linear correlation coefficients, showing a slope and intercept that were highly correlated, thus requiring the use of the covariance matrix. By using Solver plus SolverAid, we bypass this complication: Solver will allow us to home in directly on the parameter of interest, t0. We reformulate the problem as Vmodel = V − V0 = a1(t − t0), where V0 = 0 so that a0 = −a1t0.

Exercise 4.16.1:
(1) Extend the spreadsheet of exercise 2.11.1, see Fig. 2.11.1. Enter labels and initial values for the parameters a1 and t0, and use these to compute V as a function of t according to V = a1(t − t0).
(2) Calculate SSR, call Solver, and let it minimize SSR by changing a1 and t0, then call SolverAid to get the associated uncertainties.
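The reparameterization V = a1(t − t0), which makes the sought intercept t0 a fitted parameter in its own right, can be sketched in Python as follows. The ideal-gas-like numbers (t0 = −273.15 °C, a1 = 0.01) and the noise-free data are invented; SciPy's least_squares stands in for Solver.

```python
import numpy as np
from scipy.optimize import least_squares

# Invented, noise-free volume-temperature data with a0 = -a1*t0.
t = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 100.0])   # temperature / deg C
V = 0.01 * (t + 273.15)                               # i.e. t0 = -273.15

# Fitting a1 and t0 directly gives t0's uncertainty without needing the
# covariance of a0 and a1.
res = least_squares(lambda p: V - p[0] * (t - p[1]), x0=[0.02, -250.0])
a1, t0 = res.x
print(a1, t0)   # approaches 0.01 and -273.15
```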
Figure 4.16.1 shows the resulting spreadsheet computation. The results are substantially the same as those of Fig. 2.11.1.

Fig. 4.16.1: The spreadsheet computation of Fig. 2.11.1, as modified for use with Solver and SolverAid.

4.17 Guarding against false minima

The convenience of Solver must be balanced against the possibility that it produces an incorrect answer, or even a local minimum. This is the consequence of the method by which the Levenberg-Marquardt algorithm finds its way to the lowest value of the minimization criterion. That method is analogous to the flow of rainwater that, after having fallen on land, usually finds its way to the ocean, which we will here take as its lowest level. But sometimes the rainwater flows into a mountain lake high above sea level, with no outlet. As with any analogy, the above image is only partially applicable, because water can also run into a lake below sea level, such as the Dead Sea. But even that observation reinforces the message: finding a minimum does not guarantee that it is the lowest possible one. How can we guard against getting stuck in a local minimum rather than in the global minimum? There is no foolproof way to find the global minimum. Just imagine a golf ball rolling on a golf course: its chances of
finding a hole by itself, just under the influence of gravity, are minuscule. Fortunately, most minima are not so restrictive, and we can often reduce our chances of ending in a local minimum by starting Solver from different initial values that cover most of the likely parameter space, i.e., the range of parameter values within which we expect the global minimum to occur.

In Fig. 4.17.2 we have generated a false minimum in order to ask the question: how close need we be to get the correct answer, and how attractive is a false minimum? There is, of course, no simple general answer to such a question, but we can at least use the spreadsheet to give us some idea, by trying out where various initial values lead us. The following exercise illustrates the problem, modeled after a recent paper by P. Nikitas & A. Pappa-Louisi in Chromatographia 52 (2000) 477.

Say that we have a curve with two well-separated peaks, one Gaussian, the other Lorentzian, that we want to fit with Solver. In order to simplify the problem, we select the same height and width parameters for both curves, so that they differ only in their peak positions. We will call the initial guess value for the peak center of the Gaussian G, and that for the Lorentzian L.

Figure 4.17.1 illustrates what we find when Solver starts with peak positions that are reasonably close to their final values: the fit is as good as can be expected in the presence of noise, and the residuals look perfectly random. Indeed, when we set the noise parameter sn equal to zero we find almost perfect agreement, with SSR less than 10⁻⁷. Figure 4.17.2 illustrates what happens when, in assigning the initial guess values of the peak positions for Solver, we sneakily interchange the positions of the Gaussian and Lorentzian. In that case we clearly obtain a false minimum in SSR, with a value for SSR of about 48. And it is not due to the added noise: when we set sn to zero, we still get a similar result. The added noise for sn = 0.1 only adds about 10 to the SSR values.

We must now assign two initial values to Solver, one each for the guessed peak positions of the Gaussian and the Lorentzian. Thus we will probe a two-dimensional parameter space, and will try Solver for various values of G and L. It is impractical to perform such a search manually, but we can automate the process. This is done by the macro SolverScan, which applies Solver for various values of G and L.

There are several ways to conduct such a search. The simplest is to use a regular grid, varying both G and L, where the initial guess values for the two peak positions are each given one of, say, 10 equidistant values. Instead of using regular steps we can also randomize the initially assumed values for the peak positions in such a way that they cover the same parameter space; however, such a scheme often produces a rather uneven coverage of that parameter space for practical (i.e., fairly small) numbers of trials. An intermediate solution is to use a regular grid to divide the parameter space in equal-sized cells, and then to assign the initial parameter values inside each of those cells with random numbers. This leads to a so-called pseudo-random search, which is somewhat less likely to cluster. Figure 4.17.3 illustrates these three options. Note that such searches multiply the work: even if Solver takes only 5 seconds on your computer, 100 × 5 s will consume more than 8 minutes, so that, except for a very simple calculation performed on a very fast computer, this will quickly become a time-consuming proposition. With 10 equidistant values for each of G and L, the regular grid will force the macro to apply
If we insist on an n times higher resolution, for both G and L, we need to apply Solver n² times as often. Except for a very simple calculation performed on a very fast computer, this will quickly become a time-consuming proposition: even if Solver takes only 5 seconds on your computer, 100 × 5 s will consume more than 8 minutes.

Instead of using regular steps we can randomize the initially assumed values for the peak positions in such a way that they cover the same parameter space. This leads to a so-called pseudo-random search. However, such a scheme often produces a rather uneven coverage of that parameter space for practical (i.e., fairly small) numbers of trials. An intermediate solution is to use a regular grid to divide the parameter space in equal-sized cells, and then to assign the initial parameter values inside each of those cells with random numbers, which is somewhat less likely to cluster. Figure 4.17.3 illustrates these three options.

210 R. de Levie, Advanced Excel for scientific data analysis

Fig. 4.17.1: The top of a spreadsheet containing 1000 data points, showing a Gaussian peak B exp[−(x−A)²/2C²] centered at A = 300 plus a Lorentzian peak 3000B/[(x−A)² + 60C] centered at A = 700, plus Gaussian noise of zero mean and amplitude sn = 0.1. The lowercase symbols represent the corresponding parameters as found by Solver. [The spreadsheet itself, showing the function, its residuals, and the fitted parameter values, is not reproduced legibly here.]
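The three gridding options of Fig. 4.17.3 are easy to generate; a small Python sketch (illustrative only, with Python's random module standing in for Excel's RAND() function):

```python
import random

def regular_grid(n):
    """Cell-centered regular grid: points at (i + 0.5, j + 0.5)."""
    return [(i + 0.5, j + 0.5) for i in range(n) for j in range(n)]

def jittered_grid(n, rng):
    """One random point inside each unit cell: the intermediate,
    'pseudo-random but cell-stratified' option, less likely to cluster."""
    return [(i + rng.random(), j + rng.random())
            for i in range(n) for j in range(n)]

def random_points(n, rng):
    """Fully random points in the n-by-n square; for small numbers of
    trials the coverage is often quite uneven."""
    return [(n * rng.random(), n * rng.random()) for _ in range(n * n)]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
reg = regular_grid(10)
jit = jittered_grid(10, rng)
rnd = random_points(10, rng)
```

Each list holds 100 candidate (G, L) starting pairs covering the same 10 by 10 parameter space.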
Unfortunately, results obtained with the second and third option are more difficult to plot, because Excel can only handle 3-D plots of equidistant data. Therefore, SolverScan is set up to use a regular grid, and the same restriction applies to the macro Mapper.

Ch. 4: Nonlinear least squares 211

Fig. 4.17.2: The same as Fig. 4.17.1, but obtained with Solver upon assuming the Lorentzian to be centered at x = 300, and the Gaussian at x = 700. [The spreadsheet detail is not reproduced legibly here.]

Fig. 4.17.3: Three different grids for searching an area. Leftmost panel: a regular grid, here with points at G = i + 0.5, L = j + 0.5, where i = 0 (1) 9 and j = 0 (1) 9. Central panel: a pseudo-random grid, with G = i + RAND(), L = j + RAND(), where RAND() yields a random number between 0 and 1. Rightmost panel: G = 10*RAND(), L = 10*RAND().

Before you use SolverScan, please make sure that the Excel version on your computer has access to the Solver object library, because otherwise
SolverScan cannot call Solver (or, rather, SolverScan can call all it wants, but Solver won't come). To see whether this is the case, use Alt+F11 (Mac: Opt+F11) to switch to the Visual Basic Editor. On the Visual Basic Editor menu bar, click on Tools => References. The computer will now display the References - VBAProject dialog box. Check whether SOLVER.xls is listed and has been checkmarked. If so, you are OK; if not, use the procedure described in section 1.2.2 to add Solver to the object library. Fortunately you need to do this only once, until the next time you update your software.

212 R. de Levie, Advanced Excel for scientific data analysis

Figure 4.17.4 shows the results for such an exercise using 11 initial positions, 0 (100) 1000, for both G and L, for a total of 121 combinations. Of these, we find 27 combinations that lead to the correct minimum, SSR = 10, 32 combinations that reverse the peak positions as in Fig. 4.17.2, with SSR = 58, and 62 initial guesses that yield other, completely false results (that do not even appear to represent local minima) with much higher SSR values. In other words, more than half of the initial guesses lead to completely unacceptable 'solutions', and less than a quarter find the correct answer. We clearly cannot start with just any parameter estimates and expect Solver to do the rest! That is also why, in a tricky situation such as considered here, a graphical display of the fit, before and after Solver, is important. And Gaussian noise is not the main culprit: when we repeat the analysis with sn = 0, the values obtained for SSR are all about ten units smaller, but the overall pattern remains the same.

Fig. 4.17.4: The values for the sum of the squares of the residuals, SSR, for Solver solutions starting from various initial values for the peak centers x1 and x2 respectively. The correct result, SSR = 10, and the incorrect, interchanged result, SSR = 58, are both displayed in bold, and the axes in bold italics. [The 11 by 11 table of SSR values is not reproduced legibly here.]
It may be possible to improve the odds slightly by applying some constraints, such as that the fitted parameters must be positive. Still, the message is clear: nonlinear least squares need reasonably close initial estimates to work reliably, and the resulting fit and its residuals should always be displayed and inspected before the answer is accepted.

4.18 General least squares fit to a straight line

So far we have assumed that we can assign one variable to be the dependent one. Often that is indeed possible, in which case it greatly simplifies the analysis. However, there are situations where it makes no physical sense to assign all the uncertainty to only one variable. Treating both x and y as equivalent parameters will in general require a special macro. Here we will merely illustrate the approach for the relatively simple case of a straight line, for which the spreadsheet solution using Solver and SolverAid is fairly straightforward, so that no special macro is needed. In section 4.19 we will do the same for a semicircle.

In chapter 2 we fitted data to the line y = a0 + a1x by minimizing the sum of squares of the deviations Δy, i.e., ΣΔyi². If we want to minimize instead the sum of squares of the distances di between the points xi, yi and the line y = a0 + a1x, we use a simple trigonometric argument illustrated in Fig. 4.18.1: di = Δyi cos α and tan α = dy/dx = a1, which can be combined with the trigonometric relation cos α = 1/√(1+tan²α) = 1/√(1+a1²) to give di² = Δyi²/(1+a1²). Consequently we minimize Σdi² = ΣΔyi²/(1+a1²) instead of ΣΔyi². (Even though we here fit data to a line, this approach is not suitable for a linear least squares algorithm, since the term (1+a1²) makes the expressions nonlinear in the coefficients ai.) Similarly, we can fit data to a plane y = a + bx + cz by minimizing Σdi² = ΣΔyi²/(1+b²+c²).

Fig. 4.18.1: The shortest distance d between the point (x,y) (solid circle) and the line y = a0 + a1x.
The above expression di² = Δyi²/(1+a1²) can be converted into di² = Δyi²/(1+Δyi²/Δxi²) = Δyi²Δxi²/(Δxi²+Δyi²), or 1/di² = 1/Δxi² + 1/Δyi², an expression symmetrical in x and y that is convenient for the introduction of individual weights. Likewise, Σdi² = ΣΔyi²/(1+b²+c²) can be rewritten as 1/di² = 1/Δxi² + 1/Δyi² + 1/Δzi².

Exercise 4.18.1:
(1) Use the data set shown in Fig. 4.18.2. Make labels and cells for the (initially assumed) parameter values a0 and a1, and a column in which you calculate the squares of the residuals, 1/{1/[(yi − a0 − a1xi)²] + 1/[(xi + a0/a1 − yi/a1)²]}, for each row. Compute the sum of these squares of residuals, SSR, minimize it by using Solver, then compute the properly scaled value of SSR, sSSR = SSR × (1+a1²) (since d² = Δy²/(1+a1²)), and finally use the latter to find the associated uncertainties with SolverAid. You will obtain y = (5.7840 ± 0.19) − (0.5456 ± 0.042) x, the same result as obtained by Pearson.
(2) You will obtain the correct values for a0 and a1 regardless of whether the sum of the squares of the residuals is properly scaled, but in order to get correct uncertainty estimates with SolverAid, that sum should be scaled through multiplication by (1+a1²).
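For the unweighted case there is in fact a closed-form alternative to the iterative Solver route: the orthogonal-regression slope follows directly from the second moments of the data. A minimal Python sketch (illustrative; it minimizes the same ΣΔyi²/(1+a1²) criterion as the exercise):

```python
def orthogonal_line_fit(x, y):
    """Fit y = a0 + a1*x by minimizing the summed squared perpendicular
    distances, sum of dy_i^2/(1 + a1^2); closed form from the moments.
    Assumes the x,y covariance sxy is nonzero."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    # Standard orthogonal-regression slope:
    a1 = (syy - sxx + ((syy - sxx) ** 2 + 4 * sxy ** 2) ** 0.5) / (2 * sxy)
    a0 = ym - a1 * xm  # the fitted line passes through the centroid
    return a0, a1
```

Unlike ordinary least squares, this fit treats x and y symmetrically: swapping the two variables returns the reciprocal slope, the behavior that step (4) of the exercise checks in its two weighting limits.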
In section 3.15 we introduced weights wi as multipliers of Δyi² in the least squares minimization procedure, and we can do likewise in the present context: we can minimize Σwidi² = ΣwiΔyi²/(1+a1²) = Σ1/[1/(wiΔxi²) + 1/(wiΔyi²)]. Similarly, if the xi- and yi-values of individual points should be assigned separate weights, we instead minimize Σ1/[1/(wxiΔxi²) + 1/(wyiΔyi²)]. We will illustrate this more general situation below.

Exercise 4.18.1 (continued):
(3) Insert two new columns, one each for wx and wy. Modify the column label from SR to wSR, and the corresponding instructions to 1/{1/[wyi (yi − a0 − a1xi)²] + 1/[wxi (xi + a0/a1 − yi/a1)²]}. Also make a column for the weights 1/(1/wyi + a1²/wxi). And for SolverAid again use the scaled sum sSwSR, which is equal to SwSR divided by the average value of the terms 1/(1/wyi + a1²/wxi).
(4) First verify that you obtain the same result as under (1) and (2) by setting all wx and wy equal to 1. Then make all wx equal to 1000000, in which case you will recover the result for y as the dependent parameter. Also check that for, say, all wy = 1000 and all wx = 0.001, you get the result for x as the dependent parameter.
(5) Now try some arbitrary individual weights. Often the resulting differences will be small, although you can find examples where weighting really makes a difference, by emphasizing a small subset of the data. This is illustrated in Fig. 4.18.3, where the lowest points of the graph dominate the fit through a rather extreme choice of individual weights.
[Spreadsheet data and results: the ten x,y pairs of Pearson's data set, with a0 = 5.7840 ± 0.19, a1 = −0.5456 ± 0.042, SSR = 0.6186, and sSSR = 0.8027. Fig. 4.18.2 contains SSR (to be used by Solver) and its scaled version sSSR (to be used subsequently with SolverAid).]

cell: instruction: copied to:
C20 = 1/((1/((A20−$B$13−$B$14*B20)^2)) + (1/((B20+$B$13/$B$14−A20/$B$14)^2))) C21:C29
D20 = $B$13+$B$14*B20 D21:D29
B15 = SUM(C20:C29)
B16 = B15*(1+$B$14^2)

Fig. 4.18.2: A spreadsheet for general unweighted data fitting to a straight line.
[Spreadsheet detail omitted: the same data set, now with weight columns wx and wy spanning several orders of magnitude.]

cell: instruction: copied to:
E20 = 1/((1/(C20*(A20−$B$13−$B$14*B20)^2)) + (1/(D20*(B20+$B$13/$B$14−A20/$B$14)^2))) E21:E29
F20 = 1/((1/C20)+(($B$14^2)/D20)) F21:F29
G20 = $B$13+$B$14*B20 G21:G29
B15 = SUM(E20:E29)
B16 = B15/AVERAGE(F20:F29)

Fig. 4.18.3: The spreadsheet for a general weighted least squares fit to a straight line, for an arbitrarily chosen set of weights wxi and wyi that greatly emphasize the lowest points.

Clearly, the spreadsheet presents no problem to a weighted general least squares fit to a straight line. The difficulty of such use does not lie in its mechanics, but rather in the assignment of appropriate weights, unless transformations add global weights. If the standard deviations sx and sy are known, weights wx = 1/sx² and wy = 1/sy² may be appropriate for individual data points. A more detailed discussion of this topic lies beyond our current, necessarily limited purview.
4.19 General least squares fit to a complex quantity

Consider an impedance plot, in which we make a graph of the imaginary component Z″ of the impedance versus its real component Z′, as a function of frequency. Both of these quantities are typically measured at the same time, using the same instrument, and both Z′ and Z″ are subject to similar experimental uncertainties, so that it will not do to assign all experimental uncertainty to either the real or the imaginary component of the measured impedance. However, we can consider the frequency (especially when derived from a quartz oscillator) as the independent variable. Similar graphs occur, e.g., in Cole plots of the dielectric constant.

Below we will use as our model the electrical circuit formed by a resistance Rs in series with the parallel combination of a resistance Rp and a capacitance C, which yields a semicircular impedance plot. The impedance of the above-mentioned circuit is given by

Z = Rs + 1/(jωC + 1/Rp) = Rs + Rp(1 − jωRpC)/(1 + (ωRpC)²)   (4.19.1)

with the in-phase (real) and quadrature (imaginary) components

Z′ = Rs + Rp/(1 + (ωRpC)²),  Z″ = −ωRp²C/(1 + (ωRpC)²)   (4.19.2)

We simply minimize the square of the distance d between the experimental and calculated points, where d² = (Z′exp − Z′calc)² + (Z″exp − Z″calc)². Note that the method is generally applicable, regardless of the complexity of the equivalent circuit used, as long as the coordinates of both the real and imaginary component are given.

Exercise 4.19.1:
(1) In a new spreadsheet, in block A15:B18, deposit the labels and values of the constants Rs, Rp, C, and sn, and in block C15:D18 those for the constants Rs, Rp, C, and SSR. Below these make column labels for ω, Z″, Z′, Z″calc, Z′calc, and SR.
(2) As the first cell in column A deposit, say, 0.01, and in the next a number that is a factor 10^0.1 larger. Do this for a total of 41 points, ending at a value of 100, so that log ω = −2 (0.1) 2. (The same could of course be achieved with two columns, one for log ω and the other for ω, where the second column calculates the antilog of the first.)
However, in a typical impedance plot of Z″ vs. Z′ the frequency is implicit; alternatively one can plot the magnitude √[(Z′)² + (Z″)²] and the phase angle arctan(Z″/Z′) as explicit functions of frequency.
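Equation (4.19.2) is easy to tabulate outside the spreadsheet too; a small Python sketch (illustrative, with the hypothetical circuit values Rs = 1, Rp = 10, C = 0.1):

```python
def impedance(omega, Rs, Rp, C):
    """Eq. (4.19.2): in-phase and quadrature components of the circuit
    Rs in series with the parallel combination of Rp and C."""
    d = 1.0 + (omega * Rp * C) ** 2
    return Rs + Rp / d, -omega * Rp ** 2 * C / d

# The frequency sweep of the exercise: log(omega) = -2 (0.1) 2, 41 points
omegas = [10.0 ** (-2.0 + 0.1 * k) for k in range(41)]
points = [impedance(w, Rs=1.0, Rp=10.0, C=0.1) for w in omegas]
```

At low frequency Z′ → Rs + Rp with Z″ → 0; the apex of the semicircle occurs at ω = 1/(RpC), where Z′ = Rs + Rp/2 and −Z″ = Rp/2.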
(3) In columns B and C compute Z″ and Z′ respectively, based on (4.19.2) with added noise, using the constants in B15:B18 plus the values of ω in column A. (In columns M and N deposit Gaussian noise of zero mean and unit standard deviation, to be multiplied by the noise amplitude sn.) These will serve as our stand-in for experimental data. In columns D and E again calculate Z″ and Z′, again based on (4.19.2) but without noise, and now using the guessed constants listed in D15:D17 instead.
(4) In column F compute the squares of the residuals, SR = (Z′ − Z′calc)² + (Z″ − Z″calc)², and in cell D18 compute their sum, SSR.
(5) Call Solver and let it minimize D18 by adjusting D15:D17, which will yield the required values for Rs, Rp, and C. Then call SolverAid for estimates of the precision of the found parameters. Figures 4.19.1 and 4.19.2 illustrate this approach for a rather noisy data set. For sn = 0 you will of course recover the parameter values in B15:B17 exactly; by varying the value of sn you can see how the parameters found depend on the noise level.

Fig. 4.19.1: The top of the spreadsheet of exercise 4.19.1. Open circles: simulated noisy ("experimental") data; closed circles: calculated using the rough estimates for Rs, Rp, and C shown, before using Solver. [Spreadsheet detail omitted.]
Fig. 4.19.2: The result after Solver has adjusted the parameter values, and SolverAid has determined the associated uncertainties.

For the data shown in Figs. 4.19.1 and 4.19.2 this results in Rs = 0.965 ± 0.077, Rp = 10.07 ± 0.10, and C = 0.0981, with SSR = 3.28 and a standard deviation of the fit of 0.28. SolverAid also yields the corresponding covariance matrix (not shown) which, in this case, shows only a weak mutual dependency. This example shows the flexibility of the above approach. It allows for the introduction of individual weights, by including them in column F. The model can be quite complicated, as long as both its real and imaginary component are measurable. In this way, quite complicated equivalent circuits can be accommodated, with many adjustable parameters.

4.20 Miscellany

In this section are collected a few additional examples of using Solver for data analysis. The primary purpose of this section is to illustrate the wide range of problems that can be addressed efficiently by nonlinear least squares, and to provide some exercises for your self-testing. You, my reader, will have enough experience and self-confidence by now to tackle these problems without step-by-step instructions.

4.20.1 Viscosity vs. temperature and pressure

As an example of fitting a set of high-quality experimental data to an equation with several adjustable parameters, use measurements from the Ph.D. thesis of M. Witt, Technological University Eindhoven 1974, on the pressure dependence of the kinematic viscosity of a lubricant at four different temperatures, as listed by D. M. Bates & D. G. Watts, Nonlinear
Regression Analysis and its Applications, Wiley 1988, p. 275. Here the dependent variable, y, is the natural logarithm of the measured kinematic viscosity νkin (in Stokes), t is temperature, in °C, and p is pressure, in kAtm. The data are to be fitted to the empirical expression

y = a1/(a2 + t) + a3 p + a4 p² + a5 p³ + (a6 + a7 p²) p exp[−t/(a8 + a9 p²)]   (4.20.1)

Fig. 4.20.1: Comparison of the experimental data (open circles) and those calculated with Solver (line plus small solid points) by fitting the data to (4.20.1).

Fig. 4.20.2: The residuals of the fit shown in Fig. 4.20.1.
Use Solver, then SolverAid, with the initial values a1 = a2 = ... = a9 = 1. Plot your results and the corresponding residuals, and compare with Figs. 4.20.1 and 4.20.2. Also compare your results with those listed on p. 89 of Bates & Watts, and find the significant typo in one of their listed constants.

4.20.2 Potentiometric titration of a diprotic base

In Technometrics 18 (1975) 161, W. H. Sachs reported data for a potentiometric titration of N,N-dimethylaminoethylamine with HCl, and its analysis assuming that the volume of titrant added was the independent variable, and pH the dependent one. In this case it is not clear which parameter carries more experimental uncertainty, and no arguments are offered one way or the other. Because the analysis is much more straightforward when it is assumed that the titrant volume is the dependent variable instead, we will take that approach here; for the data see table 4.20.1.

The sample concentration is listed by Sachs as Cb = 0.04305 M, its volume as Vb = 20.000 mL, and the concentration Ca of the titrant, HCl, as Ca = 0.09975 M. Analyze these data in terms of the expression for the titration of a diprotic base with a strong monoprotic acid. For Va,calc use the theoretical expression

Va = Vb { Cb (2[H⁺]² + [H⁺]Ka1) / ([H⁺]² + [H⁺]Ka1 + Ka1Ka2) + [H⁺] − Kw/[H⁺] } / { Ca − [H⁺] + Kw/[H⁺] }   (4.20.2)

Assume that Ka1 = 1×10⁻⁶ and Ka2 = 1×10⁻¹⁰, set Kw = 1×10⁻¹⁴, and from these calculate pKa1 = −log Ka1 and pKa2 = −log Ka2. It is these pKa values (rather than the far smaller Ka values themselves) that Solver should optimize.
Table 4.20.1: The titration data from W. H. Sachs, Technometrics 18 (1975) 161, listed as pairs of titrant volume Va (in mL, from about 1.4 to 17.2) and measured pH. [The numerical entries, four Va, pH column-pairs, are garbled in this copy and are not reproduced.]

Use Solver, by making columns for pH, [H⁺], Va, and Va,calc, and calculating SSR from the differences between the entries for Va and Va,calc, with pKa1, pKa2, and Vb as adjustable parameters; then use SolverAid.
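Equation (4.20.2) can be checked numerically before building the spreadsheet; a Python sketch (illustrative; the defaults are the rough initial guesses pKa1 = 6, pKa2 = 10 and the concentrations quoted above):

```python
def titrant_volume(pH, Cb=0.04305, Ca=0.09975, Vb=20.0,
                   pKa1=6.0, pKa2=10.0, pKw=14.0):
    """Eq. (4.20.2): volume Va of strong acid (conc. Ca) needed to bring
    Vb mL of a diprotic base (conc. Cb) to a given pH."""
    H = 10.0 ** (-pH)
    Ka1, Ka2, Kw = 10.0 ** -pKa1, 10.0 ** -pKa2, 10.0 ** -pKw
    # Average number of protons bound per base molecule:
    alpha = (2 * H * H + H * Ka1) / (H * H + H * Ka1 + Ka1 * Ka2)
    delta = H - Kw / H          # [H+] - [OH-]
    return Vb * (Cb * alpha + delta) / (Ca - delta)
```

Between the two pKa values alpha is close to 1, giving Va ≈ Vb·Cb/Ca ≈ 8.6 mL, the first equivalence volume; well below pKa1 it approaches 2, roughly doubling Va, consistent with the range of the tabulated data.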
You should get a result similar to that shown in Fig. 4.20.3. Plot the residuals R = Va − Va,calc, which will show that systematic rather than random deviations control the standard deviation; see Fig. 4.20.4, and compare Fig. 1 of the Sachs paper.

Fig. 4.20.3: Open circles: experimental data; line: the theoretical progress curve for the titration calculated by Solver with pKa1 = 6.33 and pKa2 = 9.40.

Fig. 4.20.4: The corresponding residuals.

The above fit neglects activity corrections that, though approximate in principle, usually lead to results believed to be more realistic. They require that we calculate an auxiliary concentration parameter, the ionic strength I = ½ Σ z²c, where c is the concentration of any ionic species in solution (here buffered by the presence of 0.09658 M NaCl incorporated in the sample) and z its valency. We then estimate the activity corrections with the approximation f = 10^−0.5[√I/(1+√I) − 0.3 I] due to C. W. Davies (Ion Association, Butterworth 1962), and set Kta1 = Ka1/f², with the corresponding correction for Ka2, and Ktw = f²Kw, where the superscript t denotes 'thermodynamic' or (most likely closer to) 'truly constant'.
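The Davies approximation is a one-liner; a Python sketch (illustrative):

```python
import math

def davies_f(I, z=1):
    """Davies activity coefficient f for an ion of valency z:
    log10 f = -0.5 z^2 [ sqrt(I)/(1 + sqrt(I)) - 0.3 I ]."""
    s = math.sqrt(I)
    return 10.0 ** (-0.5 * z * z * (s / (1.0 + s) - 0.3 * I))
```

At the ionic strength of this titration, roughly I ≈ 0.1, a singly charged ion has f ≈ 0.79, so the correction factor f² applied to the equilibrium constants is about 0.6.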
Such corrections can be incorporated directly in the Solver procedure, using the iterative procedure described in chapter 6 of my Aqueous Acid-Base Equilibria and Titrations, Oxford University Press 1999, but here we will make the correction separately, using a simpler route. Add three new columns. In the first, compute the ionic strength I; in the second calculate f² = 10^−[√I/(1+√I) − 0.3 I]; and in the third compute Va,calc by again using (4.20.2), in which Ka1 is now replaced by Ka1/f², and Kw by f²Kw. Then use Solver again. Because of the presence of added NaCl in the sample, the ionic strength does not change much during the titration. Strictly speaking, we should now repeat this procedure by using the new pKa estimates to recalculate I, then f, and again Va,calc, until the results no longer change, but such iterations are obviously not necessary here, since they will not lead to substantial further changes.

You will find pKa1 = 6.55 (rather than 6.33) and pKa2 = 9.28 (instead of 9.40), changes that hardly justify the additional effort, so that the result is only marginally different from that obtained without activity correction. Plotting the residuals R will show that the systematic deviations illustrated in Fig. 4.20.4 likewise persist after activity correction, so that the absence of an activity correction did not cause the systematic deviations in R. Since insufficient information is available about the titration (such as the chemical purity of the sample used), the origin(s) of the systematic deviations in the fit cannot be ascertained. Finally we note that we do not obtain results identical to those of Sachs; nor would we expect to, because we have interchanged the dependent and independent parameters.
4.20.3 Analyzing light from a variable star

In their book The Calculus of Observations (Blackie & Sons, 4th ed., 1944, pp. 349-352), E. Whittaker & G. Robinson list the brightness of a variable star supposedly recorded during 600 successive midnights, and rounded to integer values. These data can also be downloaded from http://www.york.ac.uk/depts/maths/data/ts/ts.dat, and are shown as open circles in Fig. 4.20.5. They exhibit an interference pattern implying at least two sinusoids, plus a constant offset. Counting maxima one quickly finds that the frequency of one of these sinusoids must be approximately 24½/600 ≈ 0.041 day⁻¹, while the beat pattern indicates that the other must differ from it by about 6/600 = 0.01 day⁻¹. Therefore use Solver to fit these data to an equation of the form y = a0 + a1 sin(2πf1t + b1) + a2 sin(2πf2t + b2), with initial guess values for f1 and f2 of 0.03 and 0.05 respectively.

Fig. 4.20.5: The intensities of the variable star (open circles) observed on 600 successive midnights, and their analysis (line) in terms of a constant plus two sinusoids.

Fig. 4.20.6: The residuals for the fit of Fig. 4.20.5.

The residuals in Fig. 4.20.6 show no discernible trend, just a standard deviation of 0.302. This is roughly what one would expect on the basis of merely rounding the data to integer values, suggesting that this was a made-up data set.
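Once trial frequencies are fixed, the model is linear in the remaining parameters, which gives an easy way to obtain good starting values (and, for noise-free synthetic data, the exact solution). A self-contained Python sketch (illustrative; not the book's Solver route):

```python
import math

def design_row(t, f1, f2):
    """One row of the design matrix for a0 + s1 sin + c1 cos + s2 sin + c2 cos."""
    return [1.0,
            math.sin(2 * math.pi * f1 * t), math.cos(2 * math.pi * f1 * t),
            math.sin(2 * math.pi * f2 * t), math.cos(2 * math.pi * f2 * t)]

def solve(A, b):
    """Tiny Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_two_sinusoids(t, y, f1, f2):
    """Linear least squares at fixed frequencies, via the normal equations.
    Returns [a0, s1, c1, s2, c2]; the amplitude of component k is
    sqrt(sk^2 + ck^2) and its phase is atan2(ck, sk)."""
    X = [design_row(ti, f1, f2) for ti in t]
    k = len(X[0])
    A = [[sum(X[i][r] * X[i][c] for i in range(len(t))) for c in range(k)]
         for r in range(k)]
    rhs = [sum(X[i][r] * y[i] for i in range(len(t))) for r in range(k)]
    return solve(A, rhs)
```

With the star data one would scan f1 and f2 near the counted estimates, keep the pair with the smallest SSR, and hand the result to a nonlinear refinement; with exact synthetic data the constant and both amplitudes are recovered essentially exactly.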
Figures 4.20.5 and 4.20.6 indicate how close a fit you can obtain. Using these data, you should find a0 ≈ 17.1, a1 ≈ 10.0 with f1 = 0.0344824 ± 0.0000023 ≈ 1/29, and a2 ≈ 7.1 with f2 = 0.0416665 ≈ 1/24, with phases b1 ≈ 0.650 ≈ 6π/29 and b2 ≈ 0.262 ≈ 2π/24, together with their uncertainty estimates. The analysis used here is both much faster and more precise than that discussed by Whittaker & Robinson (before nonlinear least squares were readily available); by standing on their shoulders, we can now see much further than the giants of the past. Incidentally, the above-mentioned web site contains 74 interesting time series. The above data lack any indication of a possible experimental source, and their noise is certainly compatible with a made-up set rounded to integer values. You may therefore want to add some random noise, and delete some data points (to represent missing observations on overcast nights), in order to make the analysis more realistic.

4.20.4 The growth of a bacterial colony

In his book Elements of Mathematical Biology (Williams & Wilkins 1924; Dover 1956), A. J. Lotka listed data by Thornton on the area A occupied by a growing colony of Bacillus dendroides in a growth medium containing 0.2% KNO3, reproduced here in table 4.20.2. For experimental details see H. G. Thornton, Ann. Appl. Biol. 9 (1922) 265.

Table 4.20.2: The area A occupied by the bacterial colony at age t, in days:
t / days:  0     1     2     3     4     5
A / cm²:   0.24  2.78  13.53 36.3  47.5  49.4

Fit these data to the equation A = a / (b + e^−ct), where a, b, and c are adjustable parameters, and t is age, in days. Give the best-fitting values for a, b, and c, and plot your results.
Fig. 4.20.7: The area A occupied by the bacterial colony at age t, in days. Open circles: the data of table 4.20.2; line: the fit found by Solver.

The data are fitted quite closely, as described by Solver and SolverAid with a = 0.370 ± 0.018, b = 0.00742 ± 0.0005, and c = 1.962 ± 0.025.
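The fitted growth law is easy to explore numerically; a Python sketch (illustrative, with the rounded parameter values quoted above as defaults):

```python
import math

def colony_area(t, a=0.370, b=0.00742, c=1.962):
    """A = a / (b + exp(-c t)); defaults are the (rounded) parameter
    values reported in the text for Thornton's data."""
    return a / (b + math.exp(-c * t))
```

As t grows the area saturates at a/b ≈ 49.9 cm², consistent with the last entries of table 4.20.2; at t = 0 the model gives A = a/(1+b) ≈ 0.37 cm².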
Incidentally, the fit shown in Fig. 4.20.7 has a ten times lower value for SSR than that given by Lotka; but because all three adjustable parameters are strongly correlated, one should consider both sets of numerical results with skepticism.

4.20.5 Using NIST data sets

The National Institute for Science and Technology (NIST, the former National Bureau of Standards or NBS) makes available standard data sets that can be used to test statistical software, accessible at www.itl.nist.gov/div898/strd/nls/nls_info.shtml, which contains 27 test data sets of varying difficulty. Relevant to the present chapter are the data sets for nonlinear least squares. Here we will sample a few.

The first data set, Misra 1a, is illustrated in Fig. 4.20.8. The function to be fitted is y = a [1 − exp(−bx)] with the initial values a = 500 and b = 1E−4. The difficulty in this problem lies in the fact that b/a ≈ 2×10⁻⁶, so that we either must use autoscaling or, equivalently, adjust the logarithms of the constants, as we did with the acid-base problems earlier in this chapter. The results in block D20:F24 of Fig. 4.20.8 illustrate what you find when using Solver unthinkingly. Because a and b have such dissimilar magnitudes, it is crucial to use its auto scaling (Options => Use Automatic Scaling => OK),
which leads to the results displayed in block D26:F30 and shown in the graph. Within their standard deviations, the latter results, a = 240.2 ± 2.5 and b = (5.470 ± 0.066) × 10⁻⁴, are fully compatible with the certified values given by NIST, a = 238.9 ± 2.7 and b = (5.502 ± 0.073) × 10⁻⁴. Also note the strong correlation between a and b, which in our solution exhibit a linear correlation coefficient rab of about 0.999.

This is an interesting data set because it can be used, with the same initial values, to fit a variety of test functions: y = a [1 − exp(−bx)] in Misra 1a, y = a {1 − [1/(1+bx/2)²]} in Misra 1b, y = a {1 − [1/(1+2bx)^1/2]} in Misra 1c, and y = a b x/(1+bx) in Misra 1d. Table 4.20.3 shows the results of these four tests.

Table 4.20.3: Results obtained with Solver (using Automatic Scaling) and SolverAid for four NIST data sets, together with the NIST-certified values. [The Solver-found entries are only partly legible in this copy; for Misra 1a they are a = 240.2 ± 2.5, b = (5.470 ± 0.066) × 10⁻⁴.]

NIST-certified values:
Misra 1a: a = 238.9 ± 2.7,  b = (5.502 ± 0.073) × 10⁻⁴
Misra 1b: a = 338.0 ± 3.2,  b = (3.904 ± 0.043) × 10⁻⁴
Misra 1c: a = 636.4 ± 4.7,  b = (2.081 ± 0.018) × 10⁻⁴
Misra 1d: a = 437.4 ± 3.6,  b = (3.023 ± 0.029) × 10⁻⁴
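The same rescaling idea, adjusting log10(b) rather than b itself, can be demonstrated with a few lines of Python (illustrative only; synthetic data stand in for the NIST set, and the grid search replaces Solver):

```python
import math

def misra1a(x, a, b):
    """NIST Misra 1a model: y = a * (1 - exp(-b x))."""
    return a * (1.0 - math.exp(-b * x))

def fit_misra1a(xs, ys, log10b_lo=-6.0, log10b_hi=-2.0, n=2001):
    """Search on a grid of log10(b) -- the scaling trick -- and use the
    fact that the model is linear in a: for each trial b the optimal a
    is sum(y*g)/sum(g*g) with g = 1 - exp(-b x)."""
    best = None
    for i in range(n):
        b = 10.0 ** (log10b_lo + (log10b_hi - log10b_lo) * i / (n - 1))
        g = [1.0 - math.exp(-b * x) for x in xs]
        a = sum(yi * gi for yi, gi in zip(ys, g)) / sum(gi * gi for gi in g)
        ssr = sum((yi - a * gi) ** 2 for yi, gi in zip(ys, g))
        if best is None or ssr < best[0]:
            best = (ssr, a, b)
    return best[1], best[2]
```

Searching in log10(b) makes the vastly different magnitudes of a and b irrelevant, which is the same service that Solver's Use Automatic Scaling option performs.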
Fig. 4.20.8: The NIST data set Misra 1a and its analysis, using Solver with Autoscaling, then SolverAid. Open circles: test data; solid points: data as calculated with Solver plus SolverAid. [The spreadsheet detail, including the results found with and without Autoscaling, is not reproduced legibly here.]

4.21 Summary

Solver is designed for use with nonlinear least squares, but it is equally applicable to problems normally solved with linear least squares. After Solver has done its thing, SolverAid can provide both the standard deviations and the covariances, which can be used to calculate the propagation of imprecision, or to draw imprecision contours, as described in chapter 3.
As with linear least squares, there are two complementary questions: how appropriate is the model used, and how good is the fit between the data and that model? Again, there are no absolute answers, although looking for systematic trends in a plot of residuals can usually smoke out some inappropriate or incomplete models, while the standard deviation of the fit can give some indication of the quality of fit between model and data.

Solver is not tied to the least squares criterion, but can also be used with more robust algorithms, where in this context 'robust' usually means less sensitive to the effects of outliers. (We can consider outlier rejection an extreme, binary case of weighting, using all-or-none weights of 1 or 0 only.) Solver can also be incorporated into custom functions and macros, since they can call it, as illustrated by SolverScan. Consequently Solver can be used to solve nonlinear problems as they might occur in, e.g., implicit numerical simulations.

Fig. 4.21.1: Comparison of the various least squares methods available in Excel: LinEst, Regression, Trendline, *LS, *WLS, Solver, and Solver + *SolverAid. Closed circles show availability of a listed option, open circles show lack thereof. Gray circles denote approximately single precision: even though the results can be displayed to fifteen figures, usually no more than the first six or seven of these are significant. The methods labeled with an asterisk use the special custom macros provided with this book.
4.22 For further reading

An extensive treatment of nonlinear least squares is provided by D. M. Bates & D. G. Watts, Nonlinear Regression Analysis and its Applications, Wiley 1988. Additional applications are described in J. F. Rusling & T. F. Kumosinski, Nonlinear Computer Modeling of Chemical and Biochemical Data, Academic Press 1996, which contains many fine examples of nonlinear curve fitting in the physical sciences, including their original data, and therefore provides excellent practice material for the present chapter.

*****

For many users, Solver may well be the most useful single routine in Excel: it can provide a quick and hassle-free fit of a set of experimental data to an appropriate model expression. Indeed, if yours truly could take only one least squares program with him to the proverbial uninhabited island (assuming that, mysteriously, it would have an electric outlet for his desktop computer, or fresh batteries for his laptop), it certainly would be Solver.

This is not to suggest that Solver is the answer to all least squares problems; it clearly is not. Solver is a complex and rather time-consuming program that may cause noticeable delays when used repeatedly, and it can sometimes lead us astray, i.e., to a false minimum. If you are uncertain whether a false minimum may have been obtained, try several different initial guess values, or use SolverScan for a more systematic search of the likely parameter space. Finally, use SolverAid to provide the uncertainties in the calculated parameters.

*****

In this chapter we have not used Goal Seek.
Since the Newton-Raphson method in Goal Seek doesn't appear to be as carefully crafted as the Levenberg-Marquardt method in Solver, it often yields poor precision. You can improve the latter by decreasing the numerical value under Tools => Options, Calculation tab, Maximum change, but that doesn't always help. Moreover, Goal Seek is limited to just one variable, and has no obvious advantages over Solver; if the problem is serious, my advice is to stay away from it.
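For readers who want to check these NIST results outside Excel, here is a minimal Levenberg-Marquardt sketch (in Python with numpy — not part of this book's macros) applied to Misra1a. The data values and the certified parameters, a ≈ 238.94 and b ≈ 5.5016×10⁻⁴, are from the NIST StRD pages cited in section 4.20; the damping schedule is an arbitrary textbook choice, not Solver's actual internals.

```python
import numpy as np

# NIST StRD Misra1a data, model y = a*(1 - exp(-b*x))
x = np.array([77.6, 114.9, 141.1, 190.8, 239.9, 289.0, 332.8,
              378.4, 434.8, 477.3, 536.8, 593.1, 689.1, 760.0])
y = np.array([10.07, 14.73, 17.94, 23.93, 29.61, 35.18, 40.02,
              44.82, 50.76, 55.05, 61.01, 66.40, 75.47, 81.78])

def residuals(p):
    a, b = p
    return y - a * (1.0 - np.exp(-b * x))

def jacobian(p):                       # analytic derivatives d(model)/d(a, b)
    a, b = p
    e = np.exp(-b * x)
    return np.column_stack([1.0 - e, a * x * e])

def levmar(p, lam=1e-3, iters=200):    # bare-bones Levenberg-Marquardt
    for _ in range(iters):
        r, J = residuals(p), jacobian(p)
        A = J.T @ J
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)
        if np.sum(residuals(p + step) ** 2) < np.sum(r ** 2):
            p, lam = p + step, lam / 2     # accept the step, relax damping
        else:
            lam *= 2                       # reject the step, increase damping
    return p

a, b = levmar(np.array([500.0, 1e-4]))     # same initial values as in the text
```

Starting from a = 500 and b = 1E-4, this converges to the NIST-certified values, which is the same check that Table 4.20.3 performs with Solver.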
5 Fourier transformation

5.1 Sines and cosines

This chapter deals with the application of Fourier transformation in numerical data analysis rather than in instrumentation, where it is often built in. Fourier transformation is a method designed to determine the frequency content of a time-dependent signal. We are all familiar with manipulating functions of a number, or of a single variable representing such a number, as in \sqrt{3}, log x, or sin a. In a Fourier transform we operate instead on a whole set of numbers, such as a spectrum or a transient. The methods for dealing with entire data sets are somewhat more involved, but are perfectly suited to spreadsheets.

The Fourier transformation G(f) of a continuous, time-dependent function g(t) can be defined as

G(f) = \int_{-\infty}^{+\infty} g(t) e^{-2\pi jft} dt = \int_{-\infty}^{+\infty} g(t) e^{-j\omega t} dt
     = \int_{-\infty}^{+\infty} g(t) \cos(2\pi ft) dt - j \int_{-\infty}^{+\infty} g(t) \sin(2\pi ft) dt
     = \int_{-\infty}^{+\infty} g(t) \cos(\omega t) dt - j \int_{-\infty}^{+\infty} g(t) \sin(\omega t) dt     (5.1.1)

where we have used Euler's rule, e^{\pm jx} = \cos(x) \pm j \sin(x), with j = \sqrt{-1} and \omega = 2\pi f in radians per second (rad s^{-1}). Since Fourier transformation is a mathematical operation, 'frequency' f (in Hz) and 'time' t (in s) are merely symbols that can just as easily represent another pair of physical variables whose product is dimensionless. If we count time t in seconds, f should be counted in Hertz, i.e., f has the dimension of s^{-1}. Likewise, for t measured in days, f has the dimension day^{-1}, 'per day'; if t represents a length, such as wavelength or distance (in cm), f stands for wavenumber (in cm^{-1}); if t is voltage, f has the dimension of V^{-1}; etc. (Because the symbols F and f are commonly used for function and fre
quency respectively, we here use the symbols G and g.) The corresponding inverse Fourier transformation is then

g(t) = \int_{-\infty}^{+\infty} G(f) e^{+2\pi jft} df     (5.1.2)

Ch. 5: Fourier transformation 231

In experimental science we often deal with a sampled function, i.e., with a finite number of discrete, equidistant data points. Efficient, fast-executing methods exist for such data, especially when the number of data points N is an integer power of 2, i.e., N = 2^n = 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, etc., where n is a positive (nonzero) integer. In that case the definition corresponding to (5.1.1) is

G(f) = \sum_{k=1}^{N} g(t) \cos(2\pi k/N) - j \sum_{k=1}^{N} g(t) \sin(2\pi k/N)     (5.1.3)

where N denotes the number of data points, and k = 1, 2, ..., N. The application of such fast Fourier transform algorithms to discrete, equidistant data is the central subject of this chapter.

Excel provides a tool for Fourier transformation in its Data Analysis Toolpak. Unfortunately it has a rather awkward input and output format. In this chapter we will therefore use two custom macros, ForwardFT and InverseFT respectively, which are much easier to apply. They require that the input data be organized in three contiguous columns of 2^n data. For forward Fourier transformation the first (leftmost) column should contain time t, while the next two columns should hold the real and imaginary components of the function g(t) respectively. When g(t) is a real function, the third column can either be left blank or be filled with zeros; similarly, if g(t) is imaginary, the second column can be left blank or be filled with zeros. The data for time t should either start at t = 0 or, preferably, be 'centered' around t = 0, i.e., with 2^{n-1} - 1 data at t < 0, one point at t = 0, and the remaining 2^{n-1} data at t > 0.

In order to initiate the transformation, highlight the data in the three columns (including the third column, even if it only contains blanks) and call the custom macro ForwardFT. Its output, the result of the forward Fourier transformation, will be written in the three columns immediately to the right of the block of input data, displaying from left to right the frequency f, the real (in-phase) component, and the imaginary (quadrature, or 90° out-of-phase) component of the transformed data G(f).
The macro InverseFT converts the transformed data back from the frequency domain to the time domain. It uses the same three-column format for input and output, with the positions of the time and frequency columns interchanged. Its frequency scale must be centered around f = 0, i.e., with 2^{n-1} - 1 data at f < 0, and the remaining 2^{n-1} data at f >= 0. The format of ForwardFT and InverseFT is such that you can easily transform data sets back and forth.

The Fourier transform accepts complex input data, with real and/or imaginary components, and likewise produces complex output. However, experimental data are often real, and we will therefore focus briefly on real functions. The cosine function has the property that cos(-x) = cos(x), and is therefore called an even function; the sine is an odd function, because sin(-x) = -sin(x). Figure 5.1.1 illustrates that the Fourier transform of a real, even function is real, whereas Figure 5.1.2 suggests that the Fourier transform of a real, odd function is imaginary.
Exercise 5.1.1 will familiarize you with the operation of these custom macros.

Exercise 5.1.1:
(1) Open a spreadsheet, and enter column headings such as time, Re, Im, freq, Re, and Im, or (if you prefer to denote real and imaginary components with ' and " respectively) t, g', g", f, G', and G".
(2) In its first column enter the numbers -8 (1) 7.
(3) In its second column compute a cosine wave such as a cos(2\pi b t / N), with amplitude a and frequency b, for N = 16.
(4) Fill the third column with zeros.
(5) Highlight the data in these three columns, and call ForwardFT. (Use FT => ForwardFT on the MacroBundle toolbar or the CustomMacros menu, otherwise Tools => Macro => Macros to get the Macro dialog box.)
(6) Plot the resulting transform, and verify that it shows two nonzero points of amplitude a/2 at f = ±b/N, see Fig. 5.1.1.
(7) Replace the cosine wave in column A by a sine wave of the same frequency, and repeat the process; then double the frequency, and repeat once more. Note that the Fourier transform of a sine wave has only imaginary components, one positive, one negative, because sin(-x) = -sin(x), see Fig. 5.1.2.
(8) Replace the signal in column A by the sum of a sine wave and a cosine wave, of different amplitudes so that you can more readily identify their transforms. Then add another sine or cosine wave, and identify the transforms of each.
In order to see the effect of a larger number of input signals, you may have to extend the range from 16 to, say, 32, 64, or 128 data points. (In case you wonder what might be the physical meaning of a negative frequency, just consider it a mathematical consequence of the Nyquist theorem, to be described in section 5.3.)

The discrete Fourier transform operates on a limited data set, but tacitly assumes that this is one repeat unit of an infinitely long, self-repeating signal. Section 5.4 will discuss what happens when this is not the case.
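The conventions of ForwardFT can be mimicked with any standard FFT routine. The following sketch (in Python with numpy rather than in Excel; the 1/N output scaling is an assumption, chosen to reproduce the a/2 amplitudes found in exercise 5.1.1) transforms the 16-point cosine wave of Fig. 5.1.1:

```python
import numpy as np

N, a, b = 16, 2.0, 2.0
t = np.arange(-N // 2, N // 2)              # t = -8 (1) 7, 'centered' around t = 0
g = a * np.cos(2 * np.pi * b * t / N)       # cosine wave, b cycles per record

# numpy's fft wants the t = 0 sample first: ifftshift reorders the input;
# fftshift then re-centers the output so that f runs from -1/2 upward
G = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g))) / N
f = np.fft.fftshift(np.fft.fftfreq(N))

# the transform has just two real, nonzero points, of amplitude a/2, at f = +/- b/N
peaks = f[np.abs(G) > 1e-9]
print(peaks, G[np.abs(G) > 1e-9].real)      # [-0.125  0.125] [1. 1.]
```

Transforming G back (with the inverse of the same recipe) recovers the original samples exactly, within round-off, just as ForwardFT followed by InverseFT does on the spreadsheet.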
Fig. 5.1.1: The spreadsheet showing a cosine wave and its Fourier transform. Solid circles: real components; open circles: imaginary components. In column E the two nonzero points in the transform are shown boldface for emphasis. [The spreadsheet block itself, with input columns time, Re, Im and output columns freq, Re, Im, is not reproduced here.]

cell: instruction: copied to:
B13 = $B$9*COS(PI()*$D$9*A13/8) B14:B28
FFT: highlight A13:C28, then call custom macro ForwardFT
Fig. 5.1.2: The spreadsheet showing a sine wave and its Fourier transform. Solid circles: real components; open circles: imaginary components.

We can always write a real function as the sum of an even and an odd real function:

g_real(t) = g_even(t) + g_odd(t)     (5.1.4)

Likewise we can write (5.1.1) in compact notation as

G(f) = G'(f) + j G"(f)     (5.1.5)

where the superscripts ' and " denote the real and imaginary components of G(f). Since multiplying an even and an odd function produces an odd function, which yields zero when integrated from -\infty to +\infty, we have

G(f) = \int_{-\infty}^{+\infty} g_real(t) e^{-2\pi jft} dt = \int_{-\infty}^{+\infty} [g_even(t) + g_odd(t)] e^{-2\pi jft} dt
     = \int_{-\infty}^{+\infty} [g_even(t) + g_odd(t)] [\cos(2\pi ft) - j \sin(2\pi ft)] dt
     = \int_{-\infty}^{+\infty} g_even(t) \cos(2\pi ft) dt - j \int_{-\infty}^{+\infty} g_odd(t) \sin(2\pi ft) dt     (5.1.6)

so that

G'(f) = \int_{-\infty}^{+\infty} g_even(t) \cos(2\pi ft) dt     (5.1.7)

G"(f) = -\int_{-\infty}^{+\infty} g_odd(t) \sin(2\pi ft) dt     (5.1.8)

Equivalent rules apply to the discrete (rather than continuous) Fourier transform.
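These rules are easy to check numerically. A quick sketch (in Python with numpy; for a sampled record 'even' is meant index-wise, i.e., g(-t) = g(t) with t taken modulo the record length):

```python
import numpy as np

rng = np.random.default_rng(7)
g = rng.standard_normal(16)                 # an arbitrary real input g(t)

g_rev = np.roll(g[::-1], 1)                 # g(-t), with indices taken modulo N
g_even = (g + g_rev) / 2                    # g_even(-t) =  g_even(t)
g_odd = (g - g_rev) / 2                     # g_odd(-t)  = -g_odd(t)

# even real input -> purely real transform; odd real input -> purely imaginary
print(np.max(np.abs(np.fft.fft(g_even).imag)) < 1e-12,
      np.max(np.abs(np.fft.fft(g_odd).real)) < 1e-12)   # True True
```

The same decomposition applied to any real signal shows why the real part of its transform is always even and the imaginary part always odd.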
5.2 Square waves and pulses

We will now use a square wave, which can be considered as an infinite set of cosines:

sqw(\omega t) = (4/\pi) \sum_{n=0}^{\infty} (-1)^n \cos[(2n+1)\omega t] / (2n+1)
     = (4/\pi) [\cos(\omega t) - \cos(3\omega t)/3 + \cos(5\omega t)/5 - \cos(7\omega t)/7 + ...]     (5.2.1)

In a discrete Fourier transformation, with only a limited number of input data and an equally limited number of frequencies, the square wave is represented by a truncated series, which therefore has somewhat different coefficients. In this 16-point analysis you will find only four components of the series (5.2.1), with coefficients slightly different from those of the continuous series: 1.257, -0.374, 0.167, and -0.050 instead of 4/\pi \approx 1.273, -4/(3\pi) \approx -0.424, 4/(5\pi) \approx 0.255, and -4/(7\pi) \approx -0.182 respectively. When we follow the Fourier transform by its inverse, we recover the original input data exactly. But then, we did not enter a square wave, only the discrete data points; whatever curve we want to see in those discrete data is our prerogative, like seeing images in groupings of unrelated stars and calling them constellations.

Exercise 5.2.1:
(1) Modify the spreadsheet by replacing the input signal by that of a square wave, as in Fig. 5.2.1. Note that the zero-crossings of the square wave must be entered explicitly, as the average of the function values just before and after the change.
(2) Transform these, with the same initial layout as before, and immediately follow this by an inverse transform. This will verify that the inverse transform does not reconstruct a square wave, but reconstructs the discrete input data exactly.
(3) Use the coefficients in the output of the forward FFT to construct the function at intermediate values of t. For this, extend the second time scale, starting e.g. in cell D30 with t = -8, and proceeding downwards with increments of 0.1 till cell D180. In cell J30 then deposit the instruction =E$21+2*(E$22*COS(2*PI()*F$22*D30)+E$23*COS(2*PI()*F$23*D30)+ ... +E$28*COS(2*PI()*F$28*D30)), and copy this all the way to cell J180. You can speed up the computation by calculating the term 2*PI() in a separate cell and referring to that numerical value rather than repeating the calculation of 2\pi in every cell.
(4) Plot the resulting curve in the same thumbnail sketch. Note that the inverse transform only approximates the presumed underlying square wave at intermediate values of t: the curve indeed passes through all input data, but is a caricature of a square wave.
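The discrete coefficients quoted above are readily verified with any FFT routine; a sketch in Python/numpy for the 16-point square wave of amplitude 2, with its zero-crossings entered explicitly:

```python
import numpy as np

t = np.arange(-8, 8)
g = np.where(np.abs(t) < 4, 2.0, -2.0)      # square wave of amplitude 2 ...
g[np.abs(t) == 4] = 0.0                     # ... with explicit zero-crossings

G = np.fft.fft(np.fft.ifftshift(g)).real / 16    # even input, so G is real
print(G[[1, 3, 5, 7]].round(3))
# approximately 1.257, -0.374, 0.167, -0.050, vs. the continuous-series values
# 4/pi = 1.273, -4/(3 pi) = -0.424, 4/(5 pi) = 0.255, -4/(7 pi) = -0.182
```

Only odd harmonics appear; the even-numbered bins (and the mean, G[0]) are zero for this symmetric wave.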
[The 16-point square wave data block of Fig. 5.2.1, with input columns time, Re, Im and output columns freq, Re, Im, is not reproduced here.]

Fig. 5.2.1: The spreadsheet of a square wave and its Fourier transform. Solid circles: real components; open circles: imaginary components. The drawn line in the leftmost panel, computed also for intermediate, non-integer values of t, illustrates that the (boldfaced) coefficients define a curve through all input data but do not trace a square wave.

(5) Highlight the real data set in the third thumbnail sketch.
(6) Highlight the column J13:J180, copy it to the clipboard (with Ctrl+c), highlight the plot area of the third thumbnail sketch, and paste the data in with Ctrl+v. For better visibility, offset this new curve by adding 0.5 to its y-values. If necessary, use the formula box to extend its time scale from G13:G28 to G13:G180, and its function reach from H13:H28 to H13:H180.
(7) Do the same for the input function; sharpen its corners by changing, e.g., t = 4.1 and 3.9 to 4.001 and 3.999 respectively.
Fig. 5.2.2: A rectangular pulse, its FFT, and the inverse FFT of the latter. The thin line represents the reconstructed curve, shifted up by 1 for better visibility, and clearly shows the oscillatory nature of that reconstruction.

Fig. 5.2.3: The same for a rectangular pulse shifted in time. The recovered function is also shown as a thin line offset by 1.

Exercise 5.2.2:
(1) Modify the spreadsheet by replacing the input signal by that of a narrow pulse, as in Fig. 5.2.2. Again, the zero-crossings of the pulse should be entered explicitly as the average of the values before and after the change.
(2) Transform these, apply the inverse transformation, and display the results, again using the coefficients of the forward FFT to construct the function at intermediate values of t.

Exercise 5.2.3:
(1) Move the pulse in time so that it is no longer symmetrical with respect to t = 0, as in Fig. 5.2.3. Again, the zero-crossings of the pulse should be entered explicitly as the average of the values before and after the change.

Exercise 5.2.4:
(1) Use an exponential function, such as y = 0 for t < 0, y = a/2 for t = 0, and y = a e^{-bt} for t > 0, as in Fig. 5.2.4. Again, the value at the discontinuity should be entered explicitly as the average of the values before and after the change.

Exercise 5.2.5:
(1) Shift the exponential function in time to, e.g., y = 0 for t < -6, y = 3/2 for t = -6, and y = 3 e^{-0.8(t+6)} for t > -6, see Fig. 5.2.5.
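The contrast between exercises 5.2.2 and 5.2.3 — a pulse centered at t = 0 has a purely real transform, while the same pulse shifted in time acquires imaginary components — can be sketched as follows (Python/numpy; the pulse width and shift are arbitrary illustrative choices):

```python
import numpy as np

t = np.arange(-8, 8)
pulse = np.where(np.abs(t) <= 2, 1.0, 0.0)   # pulse symmetrical around t = 0
shifted = np.roll(pulse, 3)                  # the same pulse, moved in time

G_sym = np.fft.fft(np.fft.ifftshift(pulse))
G_shift = np.fft.fft(np.fft.ifftshift(shifted))

print(np.allclose(G_sym.imag, 0),            # True:  even input, real transform
      np.allclose(G_shift.imag, 0))          # False: shifted pulse, complex transform
```

Shifting in time multiplies the transform by a complex phase factor, which redistributes amplitude between the real and imaginary columns without changing the magnitudes.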
Fig. 5.2.4: The exponential y = 0 for t < 0, y = 3 e^{-0.8t} for t > 0, its FFT, and the inverse FFT of the latter. The recovered function is also shown offset by +1.

Fig. 5.2.5: The same for the function y = 0 for t < -6, y = 3 e^{-0.8(t+6)} for t > -6.

The above examples demonstrate that the input signal is by no means restricted to sines and cosines; we merely started with these because they yield the simplest and most satisfactory transforms. Moreover, these examples illustrate the following aspects of Fourier transformation:
(1) For an input signal that is symmetrical with respect to t = 0, i.e., a so-called even input function with g(-t) = g(t), the Fourier transform is real: G"(f) = 0, as illustrated in Figs. 5.1.1 and 5.2.1. The transforms of such functions all have an even real part, G'(-f) = G'(f).
(2) When the input signal has mirror symmetry with respect to t = 0, so that g(-t) = -g(t), it is called odd. Its Fourier transform is then imaginary, i.e., G'(f) = 0, with an odd imaginary part, G"(-f) = -G"(f), as for the sine wave in Fig. 5.1.2.
(3) If the input signal lacks either of the above symmetries, its Fourier transform will have both real and imaginary components, see Figs. 5.2.3 through 5.2.5.
(4) All the above examples have a real input. When we write g(t) and G(f) explicitly in terms of their real and imaginary components, as g(t) = g'(t) + j g"(t) and G(f) = G'(f) + j G"(f) respectively, a real input has g"(t) = 0; likewise, when g(t) is imaginary, G'(f) is odd and G"(f) is even.
(5) Forward Fourier transformation followed by inverse Fourier transformation recovers the original data points exactly, within the round-off errors of the computation (in Excel usually well within 1 in 10^15).

While we may have meant the input data to represent a particular, underlying function, such as a square pulse in Figs. 5.2.1 through 5.2.3, or a single exponential in Figs. 5.2.4 and 5.2.5, in a discrete Fourier transform algorithm there is no way to specify such a function other than through the discrete input points used. We cannot expect the algorithm to guess what we mean: it can only respond to the specific input data provided. Therefore, the discrete Fourier transformation will only approximate the continuous Fourier transform of the underlying function at intermediate times or frequencies, especially when we furnish relatively few points of that function. This can be seen clearly in the continuous curves calculated in Figs. 5.2.1 through 5.2.5. Because the input function is only specified at those N points, it cannot be reconstructed reliably between them, just as one cannot count on getting a reliable interpolation from a Lagrange polynomial for an arbitrary input function unless the underlying function is a polynomial of the same order.
Because the Fourier transform of a sum g(t) = g'(t) + j g"(t) is the sum of the Fourier transforms of its components, the entire information content of any Fourier transform can in principle be repackaged in terms of non-negative frequencies f, although this is not customary.

One of the most amazing aspects of the continuous Fourier transform, and the one that delayed its initial publication for many years at the hand of some of Fourier's great French contemporaries, is that it can even express a discontinuous function in terms of an infinite series of continuous sines and cosines. That is clearly not possible with a finite set of sines and cosines, and consequently this property does not carry over to the discrete Fourier transform. It does imply, however, that the Fourier transform of N input data contains information on only (1/2 N + 1) frequencies, a conclusion that is related to the sampling theorem of section 5.3. The one 'extra' frequency beyond 1/2 N is the zero frequency, f = 0, which represents the average value of the function.

5.3 Aliasing and sampling

In the present section we will consider aliasing and the related sampling theorem, while section 5.4 will discuss leakage. Both are artifacts of the discrete Fourier transformation, and are without counterparts in
the continuous transform. Finally, in section 5.5 we will encounter an uncertainty relationship similar to that of Heisenberg, although in a strictly classical context. Aliasing results when the signal frequencies fall outside the frequency range covered, while leakage occurs with signal frequencies that lie inside that range but fall in between the limited set of frequencies provided by the discrete Fourier transformation. Aliasing is easily understood with a series of examples, such as those of exercise 5.3.1.

Exercise 5.3.1:
(1) Return to (or recreate) the spreadsheet shown in Fig. 5.1.1.
(2) Leaving row 29 empty, extend column A with cells A30:A190 containing the times -8 (0.1) 8, and in cells G30:G190 enter the corresponding values of the cosine wave. Show these data in the graph as a thin line.
(3) Change the value of b from 1 to 2, and Fourier transform A13:C28, which will now exhibit contributions at f = ±0.125 instead of f = ±0.0625.
(4) Increment the value of b by 1 to b = 3, and repeat the process. Observe that the nonzero points in the transform hop over to the next-higher (absolute) frequency.
(5) Continue this for the next few integer values of b. It works fine up to and including b = 8, but at b = 9 you have obviously run out of the frequency scale. What happens then is shown in Fig. 5.3.1: the Fourier transformation yields the same result as for b = 7, shown in Fig. 5.3.2.

Fig. 5.3.1: The function y = 2 cos(9\pi t / 8) and its Fourier transform.

Fig. 5.3.2: The function y = 2 cos(7\pi t / 8) and its Fourier transform.
(6) Comparison with the earlier result, for b = 7, shows why: the input data for both cases are identical, as illustrated in Figs. 5.3.1 and 5.3.2. The only difference between the two left panels is in the line drawn through the points: the Fourier transform merely sees the input data, whereas the line indicates what we intend the data to represent. No wonder we get the same transform when the input data are the same.
(7) What we see here is called aliasing: the higher frequencies masquerade under an alias, as if they were lower frequencies. The Fourier transform only finds the lowest possible frequency that fits these data. That this is so is perhaps even more clearly seen when we select b = 15 or b = 17, as illustrated in Figs. 5.3.4 and 5.3.5. The same Fourier transform would also be obtained with 16 equally spaced data calculated with y = 2 cos(b\pi t / 8) for b = 23, 25, 39, 41, etc., as illustrated in Fig. 5.3.3. But it really is a problem of sampling, because we give the Fourier transformation insufficient information.

Fig. 5.3.3: The function y = 2 cos(23\pi t / 8) and its Fourier transform.

Fig. 5.3.4: The function y = 2 cos(15\pi t / 8) and its Fourier transform.

Fig. 5.3.5: The function y = 2 cos(17\pi t / 8) and its Fourier transform.
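That all these values of b produce literally the same sixteen samples — and must therefore produce the same transform — can be confirmed directly; a quick check in Python/numpy:

```python
import numpy as np

t = np.arange(-8, 8)                  # the 16 sampling times used above
g = lambda b: 2 * np.cos(b * np.pi * t / 8)

# at integer t, cos((16k +/- 7) pi t/8) = cos(2 pi k t -/+ 7 pi t/8) = cos(7 pi t/8)
print(np.allclose(g(7), g(9)), np.allclose(g(7), g(23)),
      np.allclose(g(7), g(25)), np.allclose(g(7), g(39)))   # all True
```

In other words, on this grid the frequencies b = 7, 9, 23, 25, 39, 41, … are mutually indistinguishable; only denser sampling can tell them apart.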
Aliasing results when the signal is undersampled. This is where the Nyquist or sampling theorem comes in. It states that, in order to define a periodic signal unambiguously, we must sample it more than twice per period. The assumed infinite periodicity of the sampled signal fragment then fills in the rest. Another way of looking at it is that we can put an infinity of cosine waves through 16 points; the Fourier transform yields the lowest possible of these, i.e., it uses the parsimony principle. Aliasing often occurs when data are acquired at rates that are not exact (sub)multiples of the power line frequency (60 Hz in the US, 50 Hz elsewhere), because many signals are contaminated by that ubiquitous signal. In such cases the problem is readily avoided by synchronizing the data acquisition rate with the power line frequency.

5.4 Leakage

Another problem specific to the discrete Fourier transform is leakage. It occurs when the signal is amply sampled according to the Nyquist criterion, but contains frequencies in between those used in the Fourier transformation, as if the frequency 'leaks out' into the adjacent frequencies. Again, it is most readily demonstrated with a sine or cosine wave. This is illustrated in exercise 5.4.1, where we consider a cosine wave with frequency 2.1.

Exercise 5.4.1:
(1) On the same spreadsheet, now use b = 2.1, and call ForwardFT. Figure 5.4.1 shows the result.

The transform shows the dominant frequency as 2, because it cannot represent the frequency 2.1, just as integers cannot represent numbers such as e or \pi; but it needs contributions from the other frequencies to make the fit. Another way of looking at it is that, when the signal is repeated (as is implied in Fourier transformation), it exhibits discontinuities that can only be represented with higher frequencies. Clearly,

Fig. 5.4.1: The function y = 2 cos(2.1\pi t / 8) and its Fourier transform.
the problem lies with the input data rather than with the Fourier transformation.1. This is where the Nyquist or sampling theorem comes in.
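Leakage, too, is easily reproduced with any FFT routine; a sketch in Python/numpy for the b = 2.1 cosine of exercise 5.4.1, counting how many frequency bins carry appreciable amplitude:

```python
import numpy as np

t = np.arange(-8, 8)
F = lambda b: np.fft.fft(np.fft.ifftshift(2 * np.cos(b * np.pi * t / 8))) / 16

n_exact = np.sum(np.abs(F(2.0)) > 0.05)   # b = 2:   all amplitude in just 2 bins
n_leaky = np.sum(np.abs(F(2.1)) > 0.05)   # b = 2.1: it 'leaks' into the neighbors
print(n_exact, n_leaky)                   # 2, and considerably more than 2
```

The largest contribution still sits at the nearest available frequency, f = ±2/16, but the remaining amplitude is smeared over the adjacent bins, exactly as in Fig. 5.4.1.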
5.5 Uncertainty

When we hear a sustained note we can recognize its pitch and, if we have perfect pitch, identify its name and therefore (at least in principle) its frequency. If that same note is sounded only very briefly, we cannot do this; apparently we need several complete cycles of the sinusoid involved in order to define its pitch. A continuous sine wave supposedly lasts indefinitely, and is specified by a precise frequency f, i.e., within an infinitely narrow spread. But as soon as we restrict the sine wave to a finite interval \tau, its frequency is no longer so precisely defined. This can be seen by Fourier transformation when we select one or a small number of cycles of that sine wave.

Exercise 5.5.1:
(1) Generate a sine wave and its Fourier transform.
(2) Delete most of the sine wave, leaving first three, then only one of its central cycles. Avoid generating a break (which would generate spurious high frequencies) by setting the signal to zero up to a point where the sine wave itself is zero.
(3) Apply the Fourier transform.

Figures 5.5.1b and 5.5.1c illustrate what you may get: a broadened set of frequencies centered around the frequency value of the left panel in Fig. 5.5.1a. For the continuous sine wave of Fig. 5.5.1a, the product of the time interval (\infty) and the corresponding frequency spread (0) is \infty \times 0, which is ill defined. However, when we just take the width of the frequency peak at its base as a crude measure of the frequency spread, we find that it is 5 \times 0.0078125 wide for the three-cycle burst, and 15 \times 0.0078125 wide for the single cycle. (The number 0.0078125 is the unit of f, equal to 1/128 for a 128-point signal.) This suggests a constant product of the time interval and the associated frequency spread: the frequency spread is inversely proportional to the length of the signal burst.
This is the trick used in nuclear magnetic resonance spectroscopy to generate a narrow range of frequencies to encompass any chemical shifts. identify its name and therefore (at least in principle) its frequency. The number 0. if we have perfect pitch.0078125 wide in Fig. However. and the noise contribution is isolated at those specific frequencies.Ie illustrate what you may get: a broadened set of frequencies centered around the frequency value of the left panel in Fig. leaving first three. (2) Delete most of the sine wave. which is ill defined. and is specified by a precise frequency f.
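A numerical preview of what exercise 5.5.2 below will show — that for a Gaussian the product of the standard deviations in time and in frequency equals 1/(2\pi), i.e., 2\pi sS = 1 — can be sketched as follows (Python/numpy; here S is estimated from the second moment of the transform rather than with Solver):

```python
import numpy as np

products = []
for s in (10.0, 5.0, 3.0):
    t = np.arange(-64, 64)
    g = np.exp(-t**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

    # centered forward transform; for this even input G is real (and Gaussian)
    G = np.fft.fftshift(np.fft.fft(np.fft.ifftshift(g))).real
    f = np.fft.fftshift(np.fft.fftfreq(len(t)))

    S = np.sqrt(np.sum(f**2 * G) / np.sum(G))   # standard deviation of G(f)
    products.append(2 * np.pi * s * S)

print(np.round(products, 4))                    # [1. 1. 1.]
```

Narrowing the Gaussian in time (smaller s) broadens its transform (larger S) in exact inverse proportion, which is the uncertainty relation in its sharpest, equality form.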
Fig. 5.5.1: The transform of a continuous sine wave (a, top panels) and of two bursts of the same, lasting only three cycles (b) and one cycle (c) respectively. For the sake of clarity only the imaginary part of the transform is shown; in all these cases the signal is odd, so that the real part of the transform is zero. Note the changing vertical scales of the right-hand panels.

(Note that this property is exploited in pulsed nuclear magnetic resonance spectrometry to excite a narrow range of frequencies of interest by gating a short section of a pure sinusoidal signal.) In other words: the shorter the signal lasts, the fuzzier its frequency gets. Even though we used a pure sine wave, its brevity converted its single frequency into a frequency distribution. It is not a deficiency of our ears that we cannot identify the pitch of a very brief note played on a double bass: it simply does not have a well-defined pitch. For features for which the characteristic widths in time and frequency can be defined in terms of standard deviations, the above can be formu
Start with. Or. and will yield a numerical value for its standard deviation S. manipulate them there. we have not yet considered the effects of random noise on the input signal. Exercise 5.5. 5: Fourier transformation 245 lated mathematically by stating that the product S rSf of the standard deviation S r of the time interval T over which the signal is observed. such as for t = 64 (1) 63. The above is illustrated in Fig. Qualitatively. and we will transform noisy data into the frequency domain. This will verify that. Below we will illustrate this uncertainty principle with a Gaussian curve. this product should be close to 1. A tuned filter either enhances or rejects signals in an extremely narrow frequency band. i. If . we Fouriertransform the signal. with the result of the Fourier transformation. even simpler: the product of the standard deviation in time t and that in angular frequency OJ = 2 Jl"f is at least 1. and return the modified data to the time domain. equivalent to their analog equivalents: tuned and general. if{/). (5) Compute the product of2. say. (6) Change the value ofs from 10 to. and vice versa. the more tightly it is restricted in the frequency domain. Fourier transformation allows for the ideal tuned filter. cannot be determined to better than 1/(2Jl"). and (b) the Fourier transform of a Gaussian is again a Gaussian. Quantitatively. (4). using externally stored values for A and S. s = 10.9 will deal with differentiation. if{/) = A exp[f2f(2S 2)). and data compression respectively. say. the more the signal extends in the time domain. 5.rr l exp[r/(2i)) where you refer to the value of s as stored in a separate location. and (5). In this and the next few sections we will include noise. If indeed s times Sis 1/(2. and inverse transform the resulting data set. 5 or 3.6 Filtering So far we have dealt with mathematical functions rather than with experimental data.r). 5.2. and repeat steps (2). indeed. 5. Plot the Gaussian input curve. 
you should find close adherence of the product of the standard deviations sand S to the value of 1/(211:). (4) Now use Solver to match this Gaussian. such as cell B9 in Fig. (2) Apply the Fourier transform.7 through 5.5. Below we will consider filtering.2: (l) Make a spreadsheet in which you compute 2N points of a Gaussian.Ch. set the real and imaginary contributions at that frequency to zero. because it can be as narrow as one single frequency. the function g(t) = (sV2. and again plot the resulting Gaussian peak. If we want to filter out a particular frequency from a signal.. We can distinguish two types of filtering.2. G'{/). interpolation.5. times the standard deviation sf of its frequency f. while sections 5. the Fourier transform G'{/) fits a Gaussian. because (a) its width is expressed in terms of its standard deviation.e.rs and the justfound value of S. (3) In a seventh column calculate a Gaussian in terms of frequency J.
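Exercise 5.5.2 performs this check with the spreadsheet and Solver; the following pure-Python sketch (my own construction, with a plain O(N²) DFT in place of the FFT macro) instead reads the standard deviation S of the transform's magnitude directly from its second moment. The product sS indeed comes out at 1/(2π):

```python
import cmath
import math

N = 128
s = 10.0
# Centered Gaussian sampled at t = -N/2 ... N/2 - 1 (unit spacing).
g = [math.exp(-t * t / (2 * s * s)) for t in range(-N // 2, N // 2)]

# Plain DFT of the centered sequence; bin k corresponds to f = k/N.
G = [abs(sum(g[n] * cmath.exp(-2j * math.pi * k * (n - N // 2) / N)
             for n in range(N))) for k in range(N)]

# Standard deviation of the magnitude spectrum, treating |G(f)| as an
# (unnormalized) distribution over signed frequencies f = k/N.
fs = [(k if k <= N // 2 else k - N) / N for k in range(N)]
total = sum(G)
S = math.sqrt(sum(Gk * f * f for Gk, f in zip(G, fs)) / total)

# The uncertainty product s*S should be close to 1/(2*pi):
assert abs(s * S - 1 / (2 * math.pi)) < 1e-3
```

Because the Fourier transform of a Gaussian is again a Gaussian, the second-moment estimate and a Solver fit of A exp[−f²/(2S²)] lead to the same S.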
Fig. 5.5.2: The top of the spreadsheet for exercise 5.5.2, illustrating the uncertainty principle. With s = 10, Solver finds S = 0.01592, so that 2πsS = 1.0000.

Conversely, if we want to remove all frequencies other than a particular one, we Fourier transform, set the contributions at all other frequencies to zero, and transform back. Of course, the narrower the tuned filter, the smaller the margin of error in matching that filter frequency with the signal frequency we want to single out for enhancement or rejection; otherwise we will encounter leakage.
A more complicated problem is that of general filtering, which has a less clearly defined purpose, and often a correspondingly more tentative solution. General filtering is essentially a statistical process, and we will therefore assume that a sufficiently large number of data points has been collected. Much noise will then occur at frequencies higher than those of the signal. A crude approach to removing high-frequency noise is to Fourier transform the data, set a number of highest-frequency contributions to zero, and transform the result back into the time domain. However, the sharp transition between the frequencies that are included and those that are excluded can lead to oscillations in the filtered output.

A way to avoid such artificial oscillations is to use a more gradual filter, typically again applied in the frequency domain (i.e., after Fourier transformation) to reduce the highest-frequency contributions to zero while leaving intact most of the frequencies carrying the signal. Many different window functions have been proposed for that purpose, often based on either trigonometric functions or exponentials. For an extensive listing see F. J. Harris, Proc. IEEE 66 (1978) 51, or chapter 6 in D. F. Elliott & K. R. Rao, Fast Transforms: Algorithms, Analyses, Applications, Academic Press 1982.

In our context perhaps the most generally useful window is the cosine window apparently first proposed by Julius von Hann (and sometimes called the Hanning window, a confusing name, since a different window function was proposed by R. W. Hamming and is therefore called the Hamming window). In its simplest form the von Hann window function is

W(n) = cos²(πn/N) = 0.5 + 0.5 cos(2πn/N)    (5.6.1)

where N is the total number of data points used in a centered data set (i.e., n = −N/2, −N/2+1, −N/2+2, ..., N/2−2, N/2−1), and n = t when the windowing is applied in the time domain, or n = f when used in the frequency domain. At its extremes (i.e., at n = −N/2 and n = +N/2) cos²(πn/N) = cos²(±π/2) = 0, as are all its derivatives dᵖcos²(πn/N)/dnᵖ. For more variable filtering we can extend the von Hann filter with a single adjustable parameter s (for stenosis, Greek for narrowness), where 0 ≤ s ≤ ∞:

W(n) = cos²ˢ(πn/N) = [0.5 + 0.5 cos(2πn/N)]ˢ    (5.6.2)

Figure 5.6.1 illustrates this adjustable von Hann window for various values of s.

Fig. 5.6.1: The adjustable von Hann window for various values of s as indicated with the curves. The gray curve for s = 1 is the original von Hann filter.

An alternative window function, introduced by Tukey, uses the two halves of the von Hann filter to remove some of the abruptness of an otherwise sharp cutoff filter. This filter has two adjustable parameters, a and b, where 0 ≤ a ≤ b ≤ 1, and is described (again for a centered set with n = −N/2, −N/2+1, −N/2+2, ..., N/2−1) by

W(n) = 1                                   for 0 ≤ 2|n|/N ≤ a
W(n) = cos²[(π/(b−a)) (|n|/N − a/2)]       for a ≤ 2|n|/N ≤ b    (5.6.3)
W(n) = 0                                   for b ≤ 2|n|/N ≤ 1

The Tukey window compresses the gradual cosine function within the region between a and b: for a = 0 and b = 1 it is identical to the von Hann window (5.6.1), whereas it becomes a rectangular window for a = b. When a von Hann or Tukey window is used to filter (by multiplication) a time sequence, it gradually reduces the value of the function and of its derivatives near the extremes of its range. In the frequency domain, such a window predominantly attenuates the highest frequencies, which usually contain mostly noise. A combination of the von Hann filter for s ≥ 1 with a Tukey filter with a = 1 − s for s < 1 is illustrated in Fig. 5.6.2 for various values of s. In Fourier transform instruments so-called apodizing window functions are often used that somewhat resemble a raised von Hann filter of the form W(n) ≈ a + (1−a) cos(2πn/N), where 0.5 ≤ a ≤ 1; see R. Norton & R. Beer, J. Opt. Soc. Am. 60 (1976) 259; 67 (1977) 418.
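For readers who prefer to see (5.6.2) and (5.6.3) as executable definitions, here is a pure-Python sketch; the function names and the a = 0, b = 1 consistency check are mine, not the book's:

```python
import math

def von_hann(n, N, s=1.0):
    # Adjustable von Hann window, eq. (5.6.2); s = 1 is the plain von Hann.
    return math.cos(math.pi * n / N) ** (2 * s)

def tukey(n, N, a, b):
    # Tukey window, eq. (5.6.3): flat out to 2|n|/N = a, cosine taper
    # between a and b, zero beyond b (with 0 <= a <= b <= 1).
    p = 2 * abs(n) / N
    if p <= a:
        return 1.0
    if p >= b:
        return 0.0
    return math.cos(math.pi / (b - a) * (abs(n) / N - a / 2)) ** 2

N = 256
assert abs(von_hann(64, N) - 0.5) < 1e-12          # cos^2(pi/4) = 1/2
assert abs(von_hann(64, N, s=2) - 0.25) < 1e-12    # larger s narrows the window
# For a = 0, b = 1 the Tukey window reduces to the von Hann window:
assert all(abs(tukey(n, N, 0.0, 1.0) - von_hann(n, N)) < 1e-12
           for n in range(-N // 2, N // 2))
# ... while a = b would turn it into a sharp rectangular cutoff.
```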
Exercise 5.6.1: (1) For N = 256 and n = −128 (1) 127, compute and plot (5.6.2) for various values of s, and compare your results with Fig. 5.6.1. (2) For N = 256 and n = −128 (1) 127, compute and plot (5.6.3) for various values of a and b, and compare your results with Fig. 5.6.2. Note that this requires two nested IF statements, as in (symbolically) =IF(ABS(n)<a*128, 1, IF(ABS(n)>b*128, 0, 0.5+0.5*COS(2*PI()*(ABS(n)/256-a/2)/(b-a)))).

Fig. 5.6.2: A combined von Hann–Tukey window for various values of s as indicated with the curves.

Both the von Hann and Tukey windows allow for a more gradual filtering action than the simple high-frequency cutoff method, and are usually preferable to a sharp cutoff. Still, as long as they do not take into account any specific information about the data set used, they are at best shots in the dark. Fortunately, we can let the data themselves guide us in the choice of filter. The simplest way to do this is to Fourier-transform the data, and to display the corresponding power spectrum. This is a logarithmic plot of the magnitude M (i.e., the square root of the sum of the squares of the real and imaginary components of the transformed data) as a function of frequency f. For a real input signal the power spectrum has mirror symmetry along f = 0, i.e., M(f) = M(−f). Therefore the power spectrum is often plotted only for f ≥ 0, although it may be helpful to display the entire (positive and negative) frequency range as a reminder of that symmetry. For our present purpose it is actually easier to plot log M² = 2 log M; this also reminds us of the need to use the absolute values |f| of all odd powers of f.
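The symmetry M(f) = M(−f) and the position of a spectral peak are easy to confirm numerically. The sketch below is a pure-Python illustration (plain O(N²) DFT; the small floor that avoids log 0 in empty bins is my addition) of the quantity 2 log M discussed above:

```python
import cmath
import math

def dft(x):
    # Plain O(N^2) discrete Fourier transform.
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def power_spectrum(x, floor=1e-30):
    # 2 log M = log M^2, where M is the magnitude of the transform;
    # the floor keeps log() finite for (numerically) empty bins.
    return [2 * math.log10(max(abs(c), floor)) for c in dft(x)]

N = 64
y = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]
mags = [abs(c) for c in dft(y)]

# For a real input the spectrum is mirror-symmetric, M(f) = M(-f) ...
assert all(abs(mags[k] - mags[N - k]) < 1e-9 for k in range(1, N))
# ... and a pure cosine shows up at its own frequency bin (here 5):
assert max(range(N), key=lambda k: mags[k]) in (5, N - 5)
ps = power_spectrum(y)
assert abs(ps[5] - 2 * math.log10(N / 2)) < 1e-9   # peak magnitude M = N/2
```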
Since the power spectrum omits all phase information, it has a simpler appearance than the frequency spectrum: as a real (rather than a complex) function it is easier to visualize and to plot. The position of a single feature in the time domain is represented in the frequency domain by a phase shift, and is therefore lost in the power spectrum. The wider a feature is in the time domain, the narrower it will be in the frequency domain, and hence in the power spectrum.

We can readily distinguish the contributions of signal and noise by their different frequency dependences. As an example, Fig. 5.6.3 shows a collection of Gaussian curves with added random noise, and Fig. 5.6.4 illustrates the corresponding power spectrum. The data in Fig. 5.6.3 were created with five Gaussian peaks a exp[−(x−c)²/(2b²)] plus Gaussian noise; the peaks were centered at c = 95, 120, 190, 235, and 320 respectively, with amplitudes a of the order of 0.4 and widths b between 8 and 15. The original data set of 401 points was extended to 512 points, the nearest integer power of 2, in order to facilitate its Fourier transformation. In this simple example the noise is 'white', i.e., essentially independent of frequency, and therefore shows in the power spectrum as a horizontal band. Noise can also have a frequency-dependent power, as in, e.g., so-called 1/f noise.

Fig. 5.6.3a: A test function composed of five Gaussian peaks a exp[−(x−c)²/(2b²)] (thick gray curve) and the same plus Gaussian noise of zero mean and standard deviation 0.05 (open circles).

Exercise 5.6.2: (1) First generate a sample data set, e.g., with several Gaussian peaks plus added noise; you are of course welcome to use other functions. (2) If the curves are all properly contained within the original data set, extend that data set with zeros at both ends; otherwise, extrapolate the data to taper off smoothly to zero in the added 'wings'.
Fig. 5.6.3b: Another representation of the same noisy data (the points in Fig. 5.6.3a), here drawn as a 'continuous' curve.

Exercise 5.6.2 (continued): (3) Fourier-transform the extended set. (4) Calculate and plot log M or 2 log M, where M² is the sum of the squares of the real and imaginary components of the Fourier-transformed data.

Such a plot often exhibits two distinct regions: one (typically at lower values of |f|) in which the signal dominates, the other (at the high-frequency end) in which noise is the determining factor. At its simplest we can use the power spectrum (see Fig. 5.6.4) to conclude that, in this example, the signal predominates at |f|-values smaller than about 0.055, while noise is the most important factor at higher frequencies. We can then filter the data by zeroing all frequency components above 0.055 before inverse transformation. Figure 5.6.5 illustrates such smoothing; a Tukey filter with a narrow transition range (e.g., a = 3/32, b = 1/8) yields an essentially similar result.

(5) Estimate by eye at what frequency f₀ the two regions intersect. Then copy the Fourier transform obtained under (3), and in that copy replace all values of the transform by zero whenever |f| > f₀. (6) Inverse Fourier transform these data back into the time domain, and plot the result. Also compute and plot the residuals, i.e., the differences between the original, noisy data set and the filtered one, and calculate the sum of the squares of these residuals.

However, Wiener showed that one can use the power spectrum to obtain optimal least-squares filtering, by considering the signal and noise components of M²(f) as the algebraic sum of two smooth functions, S²(f) and N²(f), where S²(f) approximates the contribution of the signal to M²(f), and N²(f) that of the noise. We can usually estimate S²(f) and N²(f) only in those regions of the frequency spectrum where they dominate; in the absence of better information, we will typically extrapolate them to the other regions of the spectrum. The resulting Wiener filter then is S²(f) / [S²(f) + N²(f)].

In practice we can fit the data in the logarithmic plot (of log M²(f) vs. f) to obtain, e.g., low-order polynomial expressions for log S²(f) and log N²(f), then exponentiate these to obtain S²(f) and N²(f). The parameters for the two lines shown in Fig. 5.6.4 were guessed in this way: for a visual estimate use a curve near the top of the data, but disregard a few high points, which are most likely due to noise. Do not use unweighted least squares, since the top points should carry more weight than the lower points.

Fig. 5.6.4: A plot of 2 log M(f) as a function of frequency f for the data of Fig. 5.6.3. The lines drawn are 2 log M(f) = 2 − 50|f| − 300f² and 2 log M(f) = −5 respectively.

Wiener filtering tends to affect the peak signal amplitudes only weakly because, at the frequencies describing those peaks, S²(f) » N²(f), so that S²(f)/[S²(f)+N²(f)] ≈ 1. On the other hand, contributions from frequency regions where noise predominates are much more strongly attenuated, because there S²(f)/[S²(f)+N²(f)] « 1: the filter attenuates the data more strongly the smaller the value of S²(f)/N²(f), the square of the signal-to-noise ratio. As long as the noise is not overwhelming the signal, Wiener showed that use of his filter yields the smallest value of SSR, the sum of squares of the residuals, which makes it 'optimal' in a least squares sense.

Exercise 5.6.2 (continued): (7) Fit simple polynomials to the two parts of the plot of 2 log M(f) vs. f, to obtain log S²(f) and log N²(f). (8) Multiply the Fourier transform obtained under (3) by the Wiener function S²(f)/[S²(f)+N²(f)], then call the inverse transform to convert the data back to the time domain. Plot the resulting, filtered data. Also compute and plot the residuals, calculate the sum of the squares of these residuals, and compare this with the SSR obtained under (6).
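The Wiener recipe of steps (7) and (8) can be condensed into a few lines. The following is an illustrative pure-Python sketch, not the spreadsheet procedure: the "noise" is a deterministic high-frequency ripple so that the example is repeatable, and the crude S²(f) and N²(f) models are assumptions of mine:

```python
import cmath
import math

def dft(x, sign=-1):
    # Plain O(N^2) DFT; sign=+1 gives the (unscaled) inverse kernel.
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    return [v / len(X) for v in dft(X, sign=+1)]

def wiener(X, S2, N2):
    # Attenuate each frequency bin by S^2/(S^2 + N^2), the Wiener factor.
    return [Xk * S2[k] / (S2[k] + N2[k]) for k, Xk in enumerate(X)]

N = 64
clean = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]
noisy = [c + 0.3 * math.cos(2 * math.pi * 25 * n / N)
         for n, c in enumerate(clean)]

# Crude guesses: signal power concentrated below bin 8, flat noise floor.
S2 = [100.0 if min(k, N - k) < 8 else 0.0 for k in range(N)]
N2 = [1.0] * N
filtered = [v.real for v in idft(wiener(dft(noisy), S2, N2))]

rms = lambda xs: math.sqrt(sum(v * v for v in xs) / len(xs))
err_before = rms([a - b for a, b in zip(noisy, clean)])
err_after = rms([a - b for a, b in zip(filtered, clean)])
assert err_after < err_before   # the filter pulls the data toward the signal
```

Where S² » N² the factor is close to 1 and the signal passes essentially unchanged; where S² « N² the component is nearly removed, exactly as described above.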
Fig. 5.6.5: The result of filtering the data of Fig. 5.6.3 with a simple cutoff filter at |f/fmax| = 0.11 in the frequency domain. Top: the filtered curve; bottom: the residuals. The sum of the squares of the 401 residuals is 0.918.

The noise reduction achieved by the Wiener filter is perhaps best illustrated by Fig. 5.6.7, where we plot the random noise originally added to the Gaussian peaks of Fig. 5.6.3, and the noise remaining after filtering. Note the absence of perceptible bias. While the Wiener method removes only part of the noise, it is rather general: it requires no a priori information on the nature of the signal, and derives its information directly from the data set to be filtered. It does assume that the contributions of signal and noise can be identified separately, and can be extrapolated validly. Provided that the noise can be described as following a single Gaussian distribution and is additive to the signal, Wiener filtering is optimal in a least squares sense. If we knew the functional form of S²(f), we could instead fit the data to that form directly, e.g., with Solver, and obtain a completely smooth result. Table 5.6.1 lists a few simple expressions:

function        approximation for log S²(f)
Lorentzian      a₀ + a₁|f|
Gaussian        a₀ + a₁|f| + a₂f²
exponential     a₀ + a₁ log[(|f|+a₂)/(fmax+a₂−|f|)],  0 < a₂ « 1

Table 5.6.1: Useful approximations for log S²(f) for some prototypical signals s(t).

Fig. 5.6.6: The result of filtering the data of Fig. 5.6.3 with the Wiener filter 10^(2−50|f|−300f²) / [10^(2−50|f|−300f²) + 10^(−5)]. Top and middle panels: the filtered data (open circles and thin black line respectively) and the initially assumed, noise-free data (gray curve); bottom panel: the residuals between the unfiltered, noisy data and the filtered curve. The sum of the squares of the 401 residuals is 0.688.

Fig. 5.6.7: The noise originally added to the Gaussian peaks in the model signal used (top panel), and the remaining noise after Wiener filtering (bottom panel). The standard deviations are 0.051 (top panel) and 0.022 (bottom panel) respectively.

5.7 Differentiation

In principle, differentiation is readily performed using Fourier transformation, because differentiation with respect to time t in the time domain is equivalent to multiplication by jω = 2πjf in the frequency domain, where j = √−1. One can therefore differentiate a function by transforming it into the frequency domain, multiplying it by jω, and transforming the resulting product back to the time domain. Since the Fourier transform is in general a complex quantity, say a + jb, multiplication by jω yields jω(a + jb) = −ωb + jωa. (Double differentiation can be obtained in a single operation through multiplication in the frequency domain by (jω)² = −ω², triple differentiation through multiplication by (jω)³ = −jω³, etc., though noise enhancement often makes the one-step approach unadvisable.) Below we illustrate the procedure with three examples.

Exercise 5.7.1: (1) In column A of a new spreadsheet enter x = −16 (1) 15 and, in column B, calculate the corresponding values for y = 0.7 exp[−0.3(x+0.4)²]. In column C compute the same y for x = −16 (0.0625) 15.9375. Plot both series, with markers and with a line respectively. (2) Highlight the data for x = −16 (1) 15 in column A, the corresponding y-values in column B, and the associated empty spaces in column C, and call FFT to generate the transform in columns D through F. (3) In column G copy the frequencies from column D. In column H calculate −2π times the corresponding frequency (in column G) times the corresponding imaginary component (in column F), and in column I calculate 2π times the frequency times the corresponding real component (in column E). It is most efficient to precalculate the value of 2π and then refer to its address, rather than have the spreadsheet compute PI() each time.
Fig. 5.7.1: The Gaussian y = 0.7 exp[−0.3(x+0.4)²] (open circles and gray line through them) and its derivative as calculated by Fourier transformation (closed circles) and by algebraic differentiation (black line).

(4) Highlight the data in columns G:I, call IFT, and plot the real component of the result in Fig. 5.7.1 as well. (5) For x = −16 (0.0625) 15.9375 calculate the derivative dy/dx = −0.6(x+0.4)y algebraically, and plot these results in Fig. 5.7.1.

The result of this differentiation is very satisfactory: the fit in Fig. 5.7.1 between the derivative computed by Fourier transformation (solid circles) and that calculated algebraically for the Gaussian peak (line) is very good, with errors smaller than ±0.1%. Now that we have established the principle of differentiation by Fourier transformation, we examine how it holds up under strain, such as caused either by noise or by discontinuities. For the first of these we will here use the second data set, for x = −16 (0.0625) 15.9375.

Exercise 5.7.1 (continued): (6) Since it is more instructive (and more fun) to illustrate what works than what does not, add a label for a noise amplitude na, and a place for its numerical value. (7) Add a column of Gaussian noise of zero mean and unit standard deviation; make sure you have enough of them. (8) Add na times this noise to your Gaussian curve in order to make it noisy. (9) Fourier transform the noisy Gaussian, then (to the right of that transform) insert three columns. In the first of these, calculate the logarithm of the sum of the squares of the real and imaginary components of the transform, and plot these data for positive frequencies. (10) Use the two remaining, empty columns to generate simple functions (such as a parabola and a horizontal line) to approximate the contributions of signal and noise, log S²(f) and log N²(f) respectively, in order to make a Wiener filter.
(11) Now perform the cross-multiplication as under (3) but, in addition, incorporate in all terms the Wiener filter S²(f)/[S²(f)+N²(f)]. Then inverse Fourier transform to get the derivative, and plot your result. The resulting plot of log M²(f) vs. f might look like Fig. 5.7.2, which indicates how 'narrow' this Wiener filter is. The fit is not perfect, but differentiation of noisy data does not get any better than this. (12) You can actually mimic the Wiener filter with a Tukey filter in the frequency domain, i.e., with (5.6.3) with n = f and N = 2fmax. Use Solver to adjust a and b of the Tukey filter to match the Wiener filter; in this case a ≈ 0.023 and b ≈ 0.043 give a good approximation. (13) Follow the same procedure with the smaller data set for x = −16 (1) 15. Even if you use the Wiener filter you found earlier (which is cheating, but it is difficult to define S²(f) and N²(f) with only a few noisy data points), the result is unsatisfactory: you simply have too few data points to pull it off.

Fig. 5.7.2: Plot of log M²(f) vs. f for a simulated, noisy Gaussian curve. The gray lines show the assumed contributions log S²(f) and log N²(f) respectively.

You can get away with differentiating relatively few data points when you know the precise mathematical formula to which the data can be fitted, in which case you find the fitting parameters with Solver, then use these parameters to calculate the derivative algebraically. However, if you need to differentiate data without an a priori model, you will have to rely on statistical methods, which do not work well for small data sets: noise requires filtering before differentiation, and efficient filtering requires a sufficient number of data points so that the noise can be averaged out. This applies to differentiation by Fourier transformation just as much as it does to differentiation with least squares, e.g., with equidistant least squares (ELS, see section 3.15). The moral: if you want to differentiate a set of data for which you have no good model, get as many data points as possible to define the curve.
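The core of exercise 5.7.1 — transform, multiply by jω, transform back — can be sketched as follows. This is a pure-Python illustration with a plain O(N²) DFT; the 128-point grid from −16 to 15.75 is chosen for speed and is not the book's 512-point set:

```python
import cmath
import math

def dft(x, sign=-1):
    # Plain O(N^2) DFT; sign=+1 gives the (unscaled) inverse kernel.
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

N, dt = 128, 0.25                       # t = -16 (0.25) 15.75
ts = [-16 + n * dt for n in range(N)]
y = [0.7 * math.exp(-0.3 * (t + 0.4) ** 2) for t in ts]

# Differentiate: transform, multiply by j*omega = 2*pi*j*f, transform back.
T = N * dt                              # total record length
Y = dft(y)
dY = [Yk * 2j * math.pi * ((k if k <= N // 2 else k - N) / T)
      for k, Yk in enumerate(Y)]
deriv = [(sum(dY[k] * cmath.exp(2j * math.pi * k * n / N)
              for k in range(N)) / N).real for n in range(N)]

# Compare with the algebraic derivative dy/dt = -0.6 (t + 0.4) y:
exact = [-0.6 * (t + 0.4) * v for t, v in zip(ts, y)]
assert max(abs(a - b) for a, b in zip(deriv, exact)) < 1e-3
```

Because the Gaussian is smooth, well sampled, and essentially zero at the edges of the record, the Fourier derivative here agrees with the algebraic one to well within the stated tolerance; noisy or discontinuous data would not fare nearly as well, as discussed above.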
Fig. 5.7.3: Top panel: the Gaussian peak y = 0.7 exp[−0.3(t+0.4)²] (thick gray line) and the same with Gaussian noise of zero mean and standard deviation 0.05 (filled black circles). Bottom panel: the first derivative of the noise-free curve (thick gray line) and that of the noisy data (dots) after Wiener filtering.

Noise is not the only source of trouble: differentiation can also run into difficulties when the data include one or more discontinuities, because there the derivative would be infinite, beyond the reach of digital representation. Below we will study this with a square wave.

Exercise 5.7.2: (1) Generate one cycle of a square wave as follows. Generate the numbers x = −16 (1) 15 in cells A16:A47. In cells B16 and B32 place a zero, in cells B17:B31 a minus one (−1), and in cells B33 through B47 a one (1). Leave C16:C47 blank. (2) Fourier transform A16:C47, and use the result in D16:F47 to generate, in G16:I47, the same quantity multiplied by jω, as already done in exercise 5.7.1. (3) Inverse Fourier transform G16:I47, and plot the result; see Fig. 5.7.4a. (4) Repeat the same, but now for x = −16 (0.0625) 15.9375. For y now use one zero, followed by 255 terms −1, another zero, and the remaining 255 terms 1; see Fig. 5.7.4b.

In Fig. 5.7.4a a relatively large number of points is affected by the discontinuities in the middle and at the edges of the range, whereas this is much less apparent in Fig. 5.7.4b. However, as the inserts show, the derivatives at the discontinuities are the same, and affect the same number of adjacent points; in the larger data set there are just many more unaffected numbers! Because the input consists of discrete, evenly spaced points, we cannot really represent a truly sudden parameter change: at best we can make a change over one interval Δx, and with the above square waves we have so far made that change over two intervals Δx. (Even the continuous Fourier transform has a problem, known as the Gibbs phenomenon, with a step function.) The next exercise illustrates the effect of making the transition less abrupt.

Fig. 5.7.4: The derivative of a square wave of (a) 32 points and (b) 512 points. The inserts show the central regions of both plots, with (in both cases) a horizontal range from −4 to +4.

Fig. 5.7.5: A set of 32-point square waves of unit amplitude with increasingly gradual transitions, of slopes Δy/Δx = 1, 1/2, 1/3, etc. (open circles), and their Fourier-transform derivatives (solid circles connected by straight-line segments), illustrating that the severity of the oscillations in the computed derivative decreases as the transition is made more gradual.

Exercise 5.7.3: (1) Use the block A19:L47 of the previous exercise, or make a similar new one. (2) Modify the input sequence from 0, −1 (fifteen times), 0, 1 (fifteen times) to sixteen terms −1 followed by sixteen terms 1, and repeat the analysis. (3) Now make the changes in the other direction, by performing the differentiation on input sequences in which the transitions are spread over more intervals: with steps of 1/2, as in 0, 1/2, 1 (×13), 1/2, 0, −1/2, −1 (×13), −1/2; with steps of 1/3, as in 0, 1/3, 2/3, 1 (×11), 2/3, 1/3, 0, −1/3, −2/3, −1 (×11), −2/3, −1/3; and with steps of 0.25, using 1 (×9) and −1 (×9). (4) Compare your results with Fig. 5.7.5.
Figure 5.7.5 illustrates that the differentiation of sharp transitions by Fourier transformation can cause oscillations: the more sudden the transition, the more oscillations we get. And the shorter the data array, the less space there is for these oscillations to die down. The message is clear: in order to use Fourier transformation to differentiate a function with one or more discontinuities, you need many data points, so as to restrict the inevitable oscillations to a relatively narrow range. If you can describe the function in terms of an appropriate model, fit that model piecemeal, especially when that model describes sections of the data set that do not contain discontinuities, and use Fourier transformation to differentiate its parts. If you absolutely must use Fourier transformation to differentiate a small set of data without an appropriate model but with discontinuities and including much noise, consult your almanac or palm reader.

5.8 Interpolation

Often, data sets need to be interpolated. In interpolation we use the existing data to construct intermediate values for which no direct evidence is available. This can only be done by assuming that no significant features of the signal are missing. If the shape of the function is known, say as a Gaussian or Lorentzian peak, nonlinear least squares can of course be used to fit the data to that function. If the data cannot be described mathematically but can be represented reasonably well in terms of polynomials, Lagrange interpolation may be indicated. Likewise, if the data can be fitted in terms of sines and cosines, Fourier transformation might be considered, especially when the data are already available in Fourier-transformed format, as inside some measuring instruments.

Below we will examine how Fourier transformation can be used for data interpolation. In the present context the assumption that no significant features are missing means that we assume the absence of signals at frequencies higher than those sampled. The procedure therefore is as follows: take the existing data points, Fourier transform them, extend the frequency range with contributions of zero amplitude at the added higher frequencies, then transform the data back. Such zero filling results in a larger, smoothly interpolated data set, but without added information.

As an example, in mass spectrometry it is common to acquire the fragmentation pattern as equidistant points on an m/z scale, where m denotes mass and z valency. The chemical identity of a fragment can usually be established unambiguously when the peak maximum can be specified to within 10⁻⁴ mass units, but it may be impractical, in terms of either acquisition rate or storage requirements, to acquire data at such a high resolution. If still higher resolution is necessary, the few data near the maximum can be fitted to a low-order polynomial, such as a quadratic, in order to find the precise peak maximum.
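The zero-filling procedure itself is compact. Here is a pure-Python sketch (my own helper names, plain O(N²) DFT; for data with appreciable power at the Nyquist bin that bin should strictly be split between the two spectrum halves, which is harmless here because it is zero):

```python
import cmath
import math

def dft(x, sign=-1):
    # Plain O(N^2) DFT; sign=+1 gives the (unscaled) inverse kernel.
    N = len(x)
    return [sum(x[n] * cmath.exp(sign * 2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def zero_fill(y, factor):
    # Interpolate by Fourier transforming, inserting zeros at the (absent)
    # high frequencies in the middle of the spectrum, and transforming back.
    N = len(y)
    M = N * factor
    Y = dft(y)
    Z = Y[:N // 2] + [0j] * (M - N) + Y[N // 2:]
    return [(sum(Z[k] * cmath.exp(2j * math.pi * k * n / M)
                 for k in range(M)) / N).real for n in range(M)]

N, factor = 16, 4
y = [math.cos(2 * math.pi * 2 * n / N) for n in range(N)]
fine = zero_fill(y, factor)

# The original samples are reproduced exactly ...
assert all(abs(fine[factor * n] - y[n]) < 1e-9 for n in range(N))
# ... and interpolated points fall on the underlying cosine:
assert abs(fine[2] - math.cos(2 * math.pi * 2 * 0.5 / N)) < 1e-9
```

For this band-limited cosine the interpolation is exact; for a signal whose spectrum does not really vanish above the sampled frequencies, the sinusoidal basis set distorts the result, as the next example shows.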
as in the next example. with a reassuringly small standard deviation even though the answer is off by almost 0.05 from its correct value. which can lead to systematic distortion.262 R.B.8. more readily visible. where 32 data points were taken for x = 16 (1) 15 withy = 0.03125) 7. Plot these together with the original data. (2) Taking the Fourier transform of the 32 data yields a transform for f = 0. Xmax = 0.2 where we have focused on the peak region. where m denotes mass and z valency.g.46875 in the appropriate place in that data set.5 (0.0625) 15.03125) 0. we find Xmax = 0.5 (0. illustrating the dangers of misinterpreting the standard deviation as a measure of accuracy.2.. (3) Make a table withf= 8 (0.e.. such as a quadratic.46875.4i].3502 ± 0. By going to a less symmetrical curve. in mass spectrometry it is common to acquire the fragmentation pattern as equidistant points on an mlz scale. 5. Repeat the yvalue for x = 0.Q3125) 0. 15 data points will have been interpolated between every two original points. then copy the data for f = 0. make up a signal by. see the inset to Fig. and then computing the maximum as Xmax = . By using least squares to fit the top ten interpolated data to a parabola y = ao + a\x + a2x2. Use the fitted parameters to calculate the sought xvalue. If still higher resolution is necessary. therefore. What caused the above failure to obtain a closer estimate of the position of the peak maximum? The interpolation reconstructs the function in terms of the sines and cosines of the original Fourier transform. in terms of either acquisition rate or storage requirements. as in Fig.aA2a2).l: (1) In a new spreadsheet. such as that shown in Fig.3 (x+O.e.0001. e. i. (5) Estimate the position of the maximum by fitting the points around the peak maximum to a loworder polynomial.7 exp[O. (4) Upon inverse transformation you will obtain 16 times more data. The result looks good. the problem is exacerbated and. 
The question therefore arises: can one reconstruct the position of the peak maximum from a small number of measurements in the peak region? Exercise 5. Advanced Excel for scientific data analysis smooth interpolation. . but it may be impractical to acquire data at such a high resolution.5 at x = +0. For example.96875.8. but it isn't.5. 5. 5. the few data near the maximum can then be fitted to a loworder polynomial in order to find the precise peak maximum. computing a Gaussian peak of which only few data points have been sampled..1. The chemical identity of a fragment can usually be identified unambiguously when the peak maximum can be specified to within 104 mass units. i. 16 times as long.8. For the real and imaginary components enter zeros. and where this function is plotted for x = 16 (0. de Levie.4.
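The zero-filling recipe of steps (2) through (4), and the parabolic fit of step (5), can also be sketched outside the spreadsheet. The following Python/NumPy sketch (the exercise itself uses Excel and its Fourier transform macro; Python serves here only as an illustration) pads the 32-point transform with zeros to 512 points and then fits the top ten interpolated points to a parabola:

```python
import numpy as np

# Sample the Gaussian of Exercise 5.8.1 at 32 points, x = -16 (1) 15.
x = np.arange(-16, 16)
y = 0.7 * np.exp(-0.3 * (x + 0.4) ** 2)

# Interpolate 16-fold by zero-padding the spectrum: transform, surround the
# 32 frequency components with zeros, and transform back.
N, M = len(y), 16 * len(y)                # 32 -> 512 points
half = N // 2
Y = np.fft.fft(y)
Ypad = np.zeros(M, dtype=complex)
Ypad[:half + 1] = Y[:half + 1]
Ypad[-half:] = Y[-half:]
Ypad[half] *= 0.5                         # split the Nyquist term ...
Ypad[-half] *= 0.5                        # ... evenly over +f and -f
yi = np.fft.ifft(Ypad).real * (M / N)     # rescale for the extra points
xi = np.arange(M) / 16.0 - 16.0           # x = -16 (0.0625) 15.9375

# Step (5): fit the top ten interpolated points to a parabola and locate
# the maximum as xmax = -a1 / (2 a2).
top = np.argsort(yi)[-10:]
a2, a1, a0 = np.polyfit(xi[top], yi[top], 2)
xmax = -a1 / (2 * a2)
print(round(xmax, 4))
```

The interpolant passes exactly through the 32 original samples, yet the fitted maximum lands near −0.35 instead of the true −0.4: the small standard deviation of the parabola fit says nothing about this systematic bias.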
Fig. 5.8.1: The test function y = 0.7 exp[−0.3 (x+0.4)²] plotted for just 32 points (open circles) and with 16 times smaller increments, i.e., 512 points (drawn curve).

Fig. 5.8.2: The result of Fourier transform interpolation of the test function of Fig. 5.8.1 (small solid circles) in the region of the peak. Inset: the top ten points of the interpolated data, with a linear least squares parabola fitted through them.

Exercise 5.8.1 (continued):
(6) Replace the Gaussian test function by the asymmetrical function y = 1/{exp[0.5(x+0.1)] + exp[−4(x+0.1)]}, while otherwise treating the data in the same way as before.

Now the distortion is obvious, and no least squares fitting to a parabola is needed to bring it out. Figure 5.8.3 shows the function, and Fig. 5.8.4 its interpolation. It is clear from Fig. 5.8.4 that Fourier transform interpolation, like any other interpolation, introduces distortion when the tacitly implied basis set (here sinusoids; with Lagrange interpolation it would be a polynomial) poorly fits the interpolated shape. The paucity of data only makes the distortion worse.

Fig. 5.8.3: The test function y = 1/{exp[0.5(x−15.9)] + exp[−4(x−15.9)]} plotted for just 32 points (open circles) and with 16 times smaller increments (gray drawn curve).

Fig. 5.8.4: The result of Fourier transform interpolation (small solid circles) of 32 samples (large open circles) of the test function y = 1/{exp[0.5(x−15.9)] + exp[−4(x−15.9)]} (gray drawn line) of Fig. 5.8.3, in the region of the peak.

Another problem with interpolation is its extreme sensitivity to noise when only few data points are available, because the interpolated curve will tend to go as closely as possible through those points. If the functionality involved is known, it is preferable to fit sparse data to that function using nonlinear least squares, even if they contain noise, because this can avoid the above-illustrated systematic distortion. You can readily verify that, given the functional forms (though not the particular parameter values) of the equations used, Solver can recover the peak position and height exactly from the above sparse but noise-free data sets. In general, interpolation is a poor substitute for making more closely spaced measurements, and the spreadsheet is a convenient tool to visualize its consequences.
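The claim that fitting a known functional form beats interpolation can be checked without Solver. For the noise-free Gaussian used above, taking logarithms turns the model into an exact parabola, so a plain linear least squares fit recovers the peak position and height essentially to machine precision; a minimal NumPy sketch (Python used for illustration only):

```python
import numpy as np

# The same 32 sparse, noise-free samples of y = 0.7 exp[-0.3 (x + 0.4)^2].
x = np.arange(-16, 16)
y = 0.7 * np.exp(-0.3 * (x + 0.4) ** 2)

# ln y is an exact quadratic in x, so fitting it by linear least squares
# recovers the parameters without any iterative search.
mask = y > 1e-12                     # keep points safely above underflow
c2, c1, c0 = np.polyfit(x[mask], np.log(y[mask]), 2)

x0 = -c1 / (2 * c2)                  # peak position, the true -0.4
h = np.exp(c0 - c1 ** 2 / (4 * c2))  # peak height, the true 0.7
print(x0, h)
```

With noisy data the logarithm would distort the error weighting, and one would indeed use nonlinear least squares (Solver) on the original form instead; the point here is only that fitting the correct model avoids the interpolation bias entirely.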
Solver reproduces these values to within its numerical precision, about ±10⁻¹⁴. It is also much less sensitive to noise than interpolation, because it does not try to fit all data points exactly. However, when we don't know the correct functionality, the Fourier transform may well provide as good a guess as other convenient methods. And in some applications, as in scanning tunneling microscopy, where we look specifically for periodic phenomena and where finer detail is largely illusory anyway, Fourier transform smoothing is clearly the preferred method.

5.9 Data compression

Least squares can be used to extract the essential data from, say, a noisy but otherwise linear calibration curve. Likewise we can use Fourier transformation to extract some essential features from an arbitrary signal. For example, a common chromatographic detector uses ultraviolet light to illuminate the effluent, and monitors the resulting fluorescence. This can give both qualitative and quantitative information on the eluted sample components. We will here focus on the qualitative aspect, i.e., on how to use the spectral information to identify the chemical identity of the eluting sample, assuming that we have a computer 'library' of reference spectra that can be consulted. Obviously, the more details one can measure and compare, the more reliable the identification can be. A fluorescence spectrum can cover a fairly wide spectral range, but often contains only a relatively small number of identifiable features. A library search can therefore be simplified considerably by compressing the data, even if such compression leads to some distortion. Fourier transform filtering can often serve this purpose, e.g., as described by Yim et al. in Anal. Chem. 49 (1977) 2069. We will illustrate the method here with spectral data available on the web from the Oregon Medical Laser Center of the Oregon Graduate Institute at omlc.ogi.edu/spectra/PhotochemCAD/html/index.html.

Exercise 5.9.1:
(1) Go to the above web site, and select a compound. Below we will use tryptophan as an example, but feel free to select instead any other fluorescence spectrum, from this or any other source; some of these fluorescence spectra exhibit less fine structure, some have more. At the bottom of the compound's page, below its fluorescence spectrum (assuming it has one), click on Original Data.
(2) Highlight and copy those data, then leave the web page. Open Notepad, paste the spectral data there, and save the file. Notepad is a convenient intermediary between external data and Excel.
(3) Open a spreadsheet, select File ⇒ Open, and in the resulting Open dialog box specify where to look for the data file, the file name, and the file type (in this case: Text Files). In the Text Import Wizard specify Delimited, and the spectral data will appear in your spreadsheet.
(4) Graph both the fluorescent intensity FI and its logarithm. The logarithmic representation will be used here because it shows more characteristic features.
(5) Since these data were not intended for use with Fourier transformation, the number of data points, 441, is not an integer power of 2. We now have two options: reducing the data set to the nearest smaller suitable number, 256, or 'padding' the data to 512, the next-higher integer power of 2. Here we illustrate how to accomplish the latter.
(6) Perusal of the graph of log(FI) vs. wavelength shows that, at long wavelengths λ, it exhibits an essentially linear dependence on λ. We therefore extrapolate this linear relationship to λ = 535.5 nm, by fitting the data to a line from, say, 380 to 400 nm, and by then using the computed intercept and slope to calculate values for 400 < λ < 540 nm. Plot these to make sure that the extrapolated data are indeed continuous with the measured ones.
(7) Fourier transform this extended data set. Most of the signal will be concentrated in the few lowest frequencies; see Fig. 5.9.1. The zero-frequency point is far off-scale in this plot.
(8) In a copy, set the higher-frequency contributions to zero, inverse transform the data, and plot the result. By repeating this while retaining, e.g., the 10, 20, 30, and 40 lowest frequencies, and again plotting the results, you will get a sense of how few low-frequency data are needed to represent the overall shape of the curve, and of how many more must be kept to show the minor shoulder near 300 nm. Indeed, the main fluorescence peak can be represented with fewer than 10 (positive and negative) frequencies.
(9) Figure 5.9.2 shows that retaining only 30 of the 256 frequencies is sufficient to exhibit the general shape of the fluorescence peak, without noticeable loss of information.
(10) The small 'hook' at the lowest wavelengths is an artifact resulting from the requirement that the Fourier-transformed signal be a repeatable unit. In this example we were lucky: had the signal levels at the two extremes of the wavelength scale been very different, the consequent change would have led to undesirable oscillations, which can only be avoided with additional effort.

Fig. 5.9.1: The 61 low-frequency components of the Fourier transform of the tryptophan fluorescence spectrum (for excitation at 270 nm) after its extrapolation to 512 data points.
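The truncation of steps (8) and (9) is easy to mimic numerically. The sketch below uses a made-up two-peak curve in place of the downloaded tryptophan spectrum (Python/NumPy for illustration; the peak positions 180 and 320 and the widths are arbitrary assumptions), zeroing everything above a chosen number of frequencies exactly as the exercise does:

```python
import numpy as np

# A stand-in for the 512-point padded spectrum: a main peak plus a shoulder.
i = np.arange(512)
signal = (np.exp(-((i - 180) / 15.0) ** 2)
          + 0.3 * np.exp(-((i - 320) / 8.0) ** 2))

def compress(y, keep):
    """Keep the zero frequency and the `keep` lowest positive and negative
    frequencies; zero the rest; return the inverse transform."""
    Y = np.fft.fft(y)
    Y[keep + 1 : len(y) - keep] = 0.0
    return np.fft.ifft(Y).real

for keep in (10, 20, 30):
    err = np.max(np.abs(compress(signal, keep) - signal))
    print(keep, err)   # the error shrinks as more frequencies are retained
```

With keep = 30 this stores the zero-frequency term plus 30 complex components, the same count as the 61 data mentioned in the text.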
Fig. 5.9.2: The original data set extrapolated to 535.5 nm (thin black line) and its representation in terms of only 30 frequencies (broad gray band), plotted as log FI vs. wavelength λ from 200 to 600 nm.

In the present example, 61 data (the real and imaginary components at the lowest 30 frequencies, plus the zero-frequency term) can be used to replace the original 441 data points. A further reduction to 31 points can be achieved through symmetry, because the data at negative frequencies can be reconstituted from those at the corresponding positive frequencies: since the input function g(t) is real, G(−f) must be the complex conjugate of G(f). By Fourier transformation, then, a fluorescence spectrum can be represented by a relatively small number of frequency terms, thereby greatly facilitating library search routines for computer-based identification.

In section 4.6 we already encountered examples of spectral fitting, and we therefore ask here how the Fourier transform and least squares methods compare. In principle, nonlinear least squares methods are more flexible, since they are not limited to a basis set of sines and cosines, and relatively simple spectra can often be described to the same accuracy with far fewer parameters than required for Fourier transformation. But this strongly depends on the number of features to be represented: with more peaks, the balance shifts in favor of the Fourier transform method. Then there are practical constraints: fitting by nonlinear least squares may require personal judgment, and may therefore be more difficult to automate than Fourier transformation, whereas Fourier transformation may need some help if the spectrum does not tend to zero at its extremes. The choice may also depend on whether the spectral information already exists in Fourier-transformed format inside a measuring instrument, as it does with typical nuclear magnetic resonance, infrared, and mass spectra. For cataloguing and data searching, the Fourier transform method may be the more convenient.
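The reduction from 61 to 31 stored values through conjugate symmetry is exactly what NumPy's real-input transforms exploit; a short sketch (Python for illustration only):

```python
import numpy as np

# For a real input g(t), G(-f) = conj(G(f)), so only the zero and positive
# frequencies need to be stored; numpy's rfft/irfft exploit this symmetry.
rng = np.random.default_rng(1)
g = rng.normal(size=512)            # any real test signal

G = np.fft.rfft(g)                  # 257 complex values instead of 512
g_back = np.fft.irfft(G, n=512)     # full signal rebuilt from half a spectrum

print(G.size, np.allclose(g, g_back))
```

No information is lost: the reconstruction matches the original to machine precision.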
It expresses all data sets in terms of the same, limited set of fixed frequencies, which greatly facilitates their intercomparison.

5.10 Analysis of the tides

Below we will analyze a particular data set to illustrate how one can often combine Fourier transformation and least squares analysis for efficient data fitting of periodic phenomena. The tides have been understood quantitatively through the work of such scientific giants as Newton, Daniel Bernoulli, Euler, Laplace, and Kelvin, as due to the combined effects of lunar and solar attraction on the earth and its surface water. What we experience as tides is the differential effect of the attractive forces on the solid earth and on the more mobile surface water, modulated by the shape (area and depth profile) of the particular body of water and by the cohesive forces that produce drag to water movement, and further modified by wind, barometric pressure, and local currents (as where rivers meet oceans). We need not look here into its detailed mathematical description, but merely consider the tidal record as a signal that should have as its principal frequency components the lunar and solar half-days.

Each method has its own strengths and weaknesses: Fourier transformation can show us many simultaneous frequency components, but it has limited frequency resolution, which may lead to leakage; least squares fitting is more flexible in what it can fit, but needs extensive guidance. Because the two methods complement each other in many ways, their combined use can make a very powerful data analysis tool.

Fortunately, tidal records are readily available on the Web from NOS, the National Ocean Service of NOAA, the National Oceanic and Atmospheric Administration, and we will use one such record. Since arbitrarily chosen data sets seldom contain precisely 2^n data points, we will deliberately take a record that does not fit that restriction, and then select a subset of it whenever we need to use Fourier transformation.

Exercise 5.10.1:
(1) Go to the web site coops.nos.noaa.gov/, and under Observations select Verified/Historical Water Level Data: U.S. and Global Coastal Stations.
(2) Select a station; in the example given below we will use 8410140 Eastport, Passamaquoddy Bay, ME, but you are of course welcome to select a record from another location, and/or to pick a different time period.
(3) Specify a time interval (we have used W2, i.e., hourly heights), a Begin Date (here: 20010601, for June 1, 2001) and an End Date (here: 20010831, for August 31, 2001), yielding a 2208-hour period.
Exercise 5.10.1 (continued):
(4) Take a preview of the data in View Plot.
(5) Select the data with View Data, highlight them all with Edit ⇒ Select All, and copy them to the clipboard with Ctrl+c. Minimize or close the web site.
(6) Start Word, and paste the data into it with Ctrl+v. Save the file as a Notepad (text) file, using any name that suits your fancy.
(7) Open Excel, select File ⇒ Open, and in the Look in: box locate the folder just used. Specify Files of type: as All Files (*.*) so that you will see the just-saved Notepad .txt file, select it, then click Open. Notepad triggers Excel to open its Text Import Wizard, which is useful to format the data properly.
(8) You will now see Step 1 of the Text Import Wizard, in which you specify that the data are of Fixed Width (as you will see, they could alternatively be treated as tab-delimited). Preview the file to see where the file header (containing all the explanatory text) ends, and then specify the row at which to start importing the data. (In our example, that would be at row 23.) Move to the next Step.
(9) In the Data preview of Step 2 of the Text Import Wizard, enter lines to define the columns you want (in our example, at lines 8, 13, 16, 21, 27, 32, 35, and 40). Click Finish, and the data will appear in your spreadsheet, starting in cell A1.
(10) You will now have all the data in your spreadsheet, in columns; these labels and data will occupy columns A through G. Delete all peripheral columns, such as the one containing the year (2001), a slant (/), minutes (:00), etc. You can use fewer columns, but then you will have more cleanup to do.
(11) You can also delete the rightmost columns, except the column between lines 35 and 40 that had been labeled Sigma, which you may want to save for the end of the exercise. Regardless of whether or not you save this column, first place the instruction =STDEV(F3:F2210) (or whatever the appropriate range is) at its top, to compute the standard deviation of the fit between the observations and the predicted data. In our example it is only 0.006 m, or 6 mm, out of an average tidal swing of several meters!
(12) Insert two rows at the top, thereby leaving some space near the top of the spreadsheet, and use the higher one of these to enter the labels time, date, hour, and Height (after having made sure that these labels are indeed appropriate). Also label the next two columns Hcalc and residuals. In the first column replace the station number (8410140) by a row counter: 0 in the top row, 1 in the next row, etc.
(13) Plot the water heights versus time t, in hours. Figure 5.10.1 illustrates the 2208 data points so imported. It clearly shows a periodic oscillation, with a variable amplitude that is slightly more pronounced and alternating at its tops than at its bottom values.
(14) For our Fourier analysis we will take the last 2048 data points, thereby avoiding two missing points. In row 163 (or wherever you find t = 160) copy the water level in, say, column J, and in column I enter the shifted time t − 160. Copy both down to the end of the data file. You should now have 2048 data in columns I and J. Highlight these, extend the highlighted area to include column K, and call the forward Fourier transform macro. After their Fourier transformation we calculate and plot the magnitude of the response as a function of frequency.
(15) In column O calculate the square root of the sum of the squares of the real and imaginary components so obtained, and plot these versus the frequency (in column L).

The result is illustrated in Fig. 5.10.2, at three different vertical scales. The top panel shows a large contribution at zero frequency, of value 2.991. This component merely reflects the average value of the signal, which is measured versus a "mean lowest low level" (the "average lowest low water level") at Eastport, ME, in order to make most data values positive quantities; indeed, by using the function =AVERAGE(range) to calculate the average we likewise obtain 2.991. The largest peak at a nonzero frequency is found at f = 0.0805114 h⁻¹ = 1/(12.4206 h), a value that roughly corresponds with half a moon day of 24 h 50 min 28.32 s. This peak has a rather wide base, suggesting that it may be broadened by multiple components and/or leakage. In addition, there are low-frequency components clustered near zero frequency, and two series of minor peaks: one at integer multiples of 0.08 h⁻¹, at about 0.16, 0.32, 0.40, and 0.48 h⁻¹, the other at half-integer multiples of the same value, at about 0.04, 0.12, 0.20, 0.28, 0.36, and 0.44 h⁻¹. Neither series has quite died out at f = 0.5 h⁻¹, and one can therefore assume that there will be still higher-order terms as well.

We can either fit these data on a purely empirical basis, or try to identify signals with known astronomical time constants, as we did in the above paragraph. The latter approach, which introduces independently obtainable information into the data analysis, is usually the more powerful, and will be pursued here. We therefore fit the data to an adjustable constant a0 plus a sine wave of adjustable amplitude a1 and phase shift p1 but with a fixed frequency f1 of 0.0805114 h⁻¹, i.e., to

h = a0 + a1 sin(2π f1 t + p1)

where t is time in hours, starting with 0 at the first data point. We then calculate the residuals, and Fourier transform them in order to find the next-largest term(s).

Fig. 5.10.1: The height of the water (as measured, in meters, versus the "mean lowest low level") at Eastport, ME, as a function of time (in hours) during the period from June 1 through August 31, 2001.

Fig. 5.10.2: Results of the Fourier analysis of 2048 data from Fig. 5.10.1, shown here as the magnitudes of the resulting frequency components, at three different vertical scales. The horizontal scale shows the frequency, in h⁻¹. Note that the gray-filled circles fall beyond the scales of the lower panels.

Exercise 5.10.1 (continued):
(16) Arrange labels and values for the adjustable parameters a0, a1, and p1. Specify a0, a1, and p1 as zero, and f1 as 0.0805114.
(17) In column F compute the water height Hcalc using the assumed parameters a0, a1, and p1, and in column G calculate the difference between the measured and calculated water heights.
(18) Also deposit a label and cell for the computation of SSR as =SUMXMY2(E3:E2210,F3:F2210), or whatever the appropriate ranges are.
. Indeed. and in cell S163 copy the residual from G163.2 0. the corresponding firstorder correction term has a frequency of 0.08 hI.3 0. and plot them.I . which has a period of 27. and is due to the ellipticity of the lunar orbit. the amplitudes and phase shifts) in a single. 0. (25) After you include this frequency and repeat the protocol sketched in points (18) through (20) you will find that there is yet another frequency near 0./i = 0.078999. (20) In cell R163 repeat the count of t .1 (continued): (21) Extend the parameter lists to accommodate a2.. the square root of the sum of squares of the real and imaginary components) of the Fourier transform. 5.1 / (24 x 27.083 h.078999 h. . and in our linear analysis this shows as a difference frequency. place all adjustable coefficients (i. and add a corresponding.l . the gravitational attraction changes.5 Fig. clearly visible in Fig.3.55) = 0. The next mostimportant term. Copy these down to row 2210.55 days. (24) The nexthighest peak in the residual plot is at 0. contiguous column.082024 h. second sine wave to the instructions in column F. As the moon travels from its perigee (at the shortest moonearth distance) to its apogee (furthest away) and back. Fourier transform the residuals. and Pl.0 e 8 .1 0.0805114 + 0.0 7 0.272 R.10.0805114 . (22) Rerun Solver. p].55) = 0.l 0. and call Solver to adjust the nine resulting coefficients ao through a4 and PI through P4.I . one below the other.4 0.1 0 2 e 2 0.e. apply the forward Fourier transformation. and plot these. now simultaneously adjusting the five coefficients ao. in row X calculate the corresponding magnitude (i.e.l . 0.083333 h.2 e 0. Exercise 5. and P2 as well as h = 0. (26) Also incorporate this frequency. and look at the updated plot ofthese residuals.160 that you already used in cell Il63. ai. a2.0805114 h. aI.3: The magnitudes of the residual frequency components.0015124 = 0.079 h. close to the frequency of 2/24 = 0.0805114 + 11 (24 x 27.10. 
and P2· (23) Rerun the Fourier transform of the residuals. Highlight RI63:T221O.082 hI.0805114 0.I associated with half the solar day.0015124 = 0. after subtracting the average and the leading sinusoidal component at. de Levie. In order to facilitate later use of SolverAid. at about 0.l . is at 0. 5.3 S 2 0. viz. Advanced Excel for scient(fic data analysis (19) Call Solver to minimize SSR by changing the values of ao. which can be identified with the sum frequency 0.
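Steps (16) through (26) use Solver because it is convenient in Excel, but once the frequencies are fixed at their astronomical values, the model h = a0 + Σ ak sin(2π fk t + pk) is linear in Ak = ak cos pk and Bk = ak sin pk, so the amplitudes and phases can also be found by ordinary linear least squares. A Python/NumPy sketch on synthetic 'tides' (the amplitudes 2.99, 2.62, 0.57 and the phases 1.0 and 2.0 are invented test values, not the Eastport results):

```python
import numpy as np

# Fixed frequencies (per hour): half lunar day, its difference and sum
# sidebands from the 27.55-day perigee-apogee cycle, and half solar day.
freqs = [0.0805114, 0.0805114 - 0.0015124, 0.0805114 + 0.0015124, 2 / 24]
t = np.arange(2048.0)                           # hourly samples
true = 2.99 + 2.62 * np.sin(2 * np.pi * freqs[0] * t + 1.0) \
            + 0.57 * np.sin(2 * np.pi * freqs[1] * t + 2.0)

# Design matrix: a constant column plus a sin/cos pair for every frequency,
# since a sin(2 pi f t + p) = (a cos p) sin(2 pi f t) + (a sin p) cos(2 pi f t).
cols = [np.ones_like(t)]
for f in freqs:
    cols += [np.sin(2 * np.pi * f * t), np.cos(2 * np.pi * f * t)]
X = np.column_stack(cols)
coef, *_ = np.linalg.lstsq(X, true, rcond=None)

a0 = coef[0]
amps = np.hypot(coef[1::2], coef[2::2])         # a_k = sqrt(A_k^2 + B_k^2)
phases = np.arctan2(coef[2::2], coef[1::2])     # p_k = atan2(B_k, A_k)
print(np.round(a0, 3), np.round(amps, 3))
```

Converting back via ak = sqrt(Ak² + Bk²) and pk = atan2(Bk, Ak) reproduces the invented amplitudes and phases, and the frequencies not present in the signal come out with amplitudes near zero.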
The resulting Fig. 5.10.4 shows that we finally have accounted for the four major frequency components near 0.08 h⁻¹. The next-largest contributions are around 0.04 h⁻¹. Even though the Fourier analysis showed only one broad peak around 0.08 h⁻¹, we used astronomical information to resolve it into four different signals; this combination of different methods is more powerful than each method by itself. As can be seen in Fig. 5.10.5, with the four frequencies we have found so far we can indeed represent the general envelope of the tidal curve, but not its alternating amplitudes or other details. We therefore extend the analysis with four more frequencies, again exploiting least squares analysis to find their amplitudes and phase angles.

Exercise 5.10.1 (continued):
(27) Extend the parameter lists to accommodate four new frequencies, and include them in the instruction for the calculated heights in column F.
(28) Set the frequencies at f1/2, f2/2, f3/2, and f4/2, each one-half of the corresponding values near 0.08 h⁻¹, and subsequently let Solver adjust the coefficients a0 through a8 and p1 through p8, which now number 17.
(29) After you have done this, run SolverAid (which requires that a0 through a8 and p1 through p8 form one contiguous column) to calculate the standard deviations of the coefficients.

Fig. 5.10.4: The magnitudes of the residual frequency components, after accounting for the average and four sinusoidal components near 0.08 h⁻¹.

Table 5.10.1: The amplitudes found, with their standard deviations as provided by SolverAid, for the nine frequencies considered so far.

frequency / h⁻¹    amplitude / m
0                  2.993 ± 0.006
0.039500           0.023 ± 0.006
0.040256           0.034 ± 0.006
0.041012           0.006 ± 0.006
0.041667           0.158 ± 0.006
0.078999           0.568 ± 0.006
0.080511           2.620 ± 0.006
0.082024           0.215 ± 0.006
0.083333           0.286 ± 0.006
standard deviation of the fit: 0.19
see Fig.10.10. We see that we can represent most of the signal in terms of predictable periodic functions. a period of about 18. 5.0386 h.0. and that (using 3 times the standard deviation as our criterion) one of them is not even statistically significant. which we can tentatively associate with the difference frequency 0. de Levie.1• Comparison with Fig.6.08 h. We see that only one of the four halffrequency components is important.03 m. the Fourier transform shows that not all frequency components around 0. Advanced Excel for scientific data analysis o 110 220 330 440 550 660 770 80 990 1100 Fig. 5. based on much longer data sets (so as to include the length of the moon's node.038743.0015128 = 0.6 years) and on using more frequencies.041012 by 0.0805114/2 .10.274 R. since there is a remaining signal at about 0.1 lists the results so obtained for the (absolute values of the) various amplitudes. and then Fourier transform the residuals. By comparing the data in Tables 5. of course. and we therefore look into the mutual dependence of these results. we find that that all remaining components have amplitudes smaller than 0.5: The tides recalculated using the four principal frequency components near 0. so that tide tables can indeed anticipate the tides. 5. in this case an array of 17 by 17 = 289 numbers.2 we see that changing one frequency can alter the amplitudes of the neighboring frequencies. the phase angles are needed for the analysis but have no physical meaning because they are tied to the particular starting time chosen. Indeed. Below we show how we can quickly screen them for significant correlations. Table 5.04 h.1 and 5. However.038743 hI . if we replace the nonsignificant frequency 0.10. run Solver again.10.I have been accounted for. SolverAid can provide the corresponding array of linear correlation coefficients.I . . Such tables are.10.1 shows that this indeed represents the dominant longerterm features of the experimental data.
005 0.ABS(AAI).9. And if you want to see what is possible by harmonic analysis (using a longer data base and many more harmonic terms).082024 0. This plot is shown in Fig.994 ± 0.005 0. effects of earthquakes or storms.g.005 0. and copy this instruction to the entire block AS I :BI17. and indicates that there is very little noise on this signal.4 0. plot the original and calculated curves in one graph.04 h~l.568 ± 0. 5.078999 0.004 0. using different symbols and/or colors.1 (continued): (30) Run SolverAid (again) and let it provide the matrix of linear correlation coefficients.2! (31) To get an idea of how well you can represent the observed tidal data with just eight frequencies. You can of course set the bar lower.10.005 Table 5.Ch. 1 0. when caused by.10.005 0.17 0.9.083333 0.114 ± 0. 0.10. at 0.. frequency 0 0.2 o. but could only be recognized as such in retrospect.08 and 0.7.020 ± 0.041667 amplitude 2.286 ± 0. whereas all other cells will remain empty because they will contain the 'empty' string between the two quotation marks in the IF statement.8 or wherever. Exercise 5.2: The same results after one frequency near 0.007 ± 0. as in Fig. and certainly would not be predictable.03950 0. .5 Fig.8.216 ± 0.04 h~l has been redefined.6: The magnitudes of the residual frequency components. since in this particular case none of the 17 adjusted parameters has a very pronounced dependence on any other.156 ± 0. Any linear correlation coefficient with an absolute value larger than 0. 5: Fourier transformation 275 0.005 2.080511 0." "). Deposit in cell ASI the instruction =IF(ABS(AAl»0. Such noise may still be deterministic. 5. e.10. If you want to see how far you still would have to go. after accounting for the average and eight sinusoidal components near 0.0 0. as in Fig.005 frequency amplitude standard deviation of the fit: 0.005 0. 5.9 will show. In fact.10. plot the residuals. plot the data in the 'Sigma' column you may have set aside under point (11). 
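The screening formula of step (30) translates directly into array code. A sketch with an invented 4 × 4 correlation matrix standing in for SolverAid's 17 × 17 output (Python/NumPy for illustration; the numbers are made up):

```python
import numpy as np

# A made-up symmetric matrix of linear correlation coefficients.
r = np.array([[ 1.00,  0.15, -0.92,  0.05],
              [ 0.15,  1.00,  0.30, -0.10],
              [-0.92,  0.30,  1.00,  0.20],
              [ 0.05, -0.10,  0.20,  1.00]])

# Mimic =IF(ABS(AA1)>0.9, ABS(AA1), " "): flag only the large entries,
# ignoring the trivial 1's on the main diagonal.
mask = np.abs(r) > 0.9
np.fill_diagonal(mask, False)
rows, cols = np.nonzero(mask)
for i, j in zip(rows, cols):
    if i < j:                      # report each correlated pair once
        print(f"parameters {i} and {j}: r = {r[i, j]:+.2f}")
```

Only the pair with |r| above the threshold is reported; lowering the threshold to 0.8 or wherever is a one-character change, just as in the spreadsheet formula.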
Fig. 5.10.7: The original data (solid points) and the fitted curve (drawn line) based on the average and eight sinusoidal components near 0.04 and 0.08 h⁻¹.

Fig. 5.10.8: The residuals after accounting for the average and eight sinusoidal components near 0.04 and 0.08 h⁻¹.

It is clear that we can continue this process and, by including more and more frequencies, make the fit better and better; you get the idea. This is indeed how tidal tables are made. Remember that the standard deviation between the observed and the predicted heights listed in the NOS/NOAA table was a mere 6 mm (see under point (11) in the exercise); the corresponding value for our fit so far is 174 mm, about 30 times larger. Still, for a signal this large, with apparently relatively little 'noise' from earthquakes, storms, etc., the prediction can be extremely reliable, and the more so the longer the experimental record on which it is based.
.sa1 ~~N*~·. an optimal least squares filter.: .. It can also be used to manipulate data. differentiation. 0. Here we have used the mathematicaVphysical sign convention.9: The residuals in the NOS/NOAA harmonic analysis of the same data set. A minor nuisance in using Fourier transforms is the confusion between different conventions used in the literature: which transform to call forward. equivalent operations of the two approaches on the same data will yield somewhat different results. .10..05 110 sso 660 770 80 990 1100 . or a circuit. .h5... Because Fourier transformation uses trigonometric functions as its basis set rather than polynomials. 10 277 .05 BPI! IS 220 is: 2. the combined use of Fourier transformation and least squares methods can sometimes exploit the best features of both approaches. the transform method can be combined with least squares curve fitting to get the best of the two. 10 0. 5._~.. the advantages of Fourier transformation are most pronounced with large. it tends to compete with least squares analysis.I 0. As we have also seen in the example of tidal analysis. as in filtering.00 .~. . In all those areas.1 1100 1210 1320 1430 I 4 1650 :l!i.."'" A 12 1.Ch.15 0. 0. and we have shifted the burden of the normalization factor 1/(2Jr) entirely on the inverse transform in order to have a simple relation between the amplitude of a sine or cosine . .. which can then be used for cataloguing and searching. Fourier transformation is often the method of choice. 15 0. Aj7J<1~~.. With Wiener filtering.iPMIIlDli l . and interpolation. Note the ten times enlarged vertical scale.q±~:~R~. .~M_M.05 Fig.~.~• •"'". Fourier transformation of a data set and rejection of its highfrequency components can yield a compact set of descriptors of the main (lowfrequency) components of that signal. 5: Fourier transformation 0.11 Summary For determining the frequency content of a signal.~~k.05 j 0. 5.. . . 330 2 440 o 0.. informationrich data sets. 
__~p 1760 1870 1980 2090 2200 0. As in instrumentation. and what to use as normalizing factor. •• ~.00 ~~~.
wave and that of its Fourier transform. This makes the frequency f (in Hz or cps) rather than the angular frequency ω = 2πf (in rad s^-1) the primary frequency parameter. This convention was advocated, among others, by R. B. Blackman & J. W. Tukey, The Measurement of Power Spectra, Dover 1958, and by R. N. Bracewell, The Fourier Transform and its Applications, McGraw-Hill 1965, because it makes Fourier and Laplace transforms compatible, i.e., in both cases we use the negative exponent for the forward transform. There is also the factor 1/N, which is here bundled with the inverse transform but can also be shared more equitably by both forward and inverse transforms; alternative arguments can be advanced for a more equitable distribution of normalization factors of sqrt(1/2π). This is the price we pay for using the same concepts in different disciplines, with different purposes and different traditions, as when log is used in VBA where ln is meant, see section 1.16. Ultimately some consensus will be reached, just as it will be for keeping either right or left in traffic, a similar problem that has no inherently good or bad solution but would benefit from a globally accepted choice. But forging such a consensus may take a long time, and occasionally there may still be relapses. (As an analytical chemist I have often marveled at how my own professional tribe has been able to stick with liters, a volume measure that fits neither the cm/g/s nor the m/kg/s system, even though its real etymological root, the Greek litra, denoted a rather ordinary weight. It has even managed to get the symbol changed to a capital L, presumably in honor of a mythical Dr. Liter, first name Milli?)

This short chapter is a mere teaser as far as Fourier transformation is concerned; for more complete coverage the reader should consult entire books devoted to this single topic. We have not even mentioned here the possibility of performing a fully equivalent frequency analysis without imaginary terms, as demonstrated by Hartley in 1942, or the existence of the Hadamard transform, the digital equivalent to Fourier transformation, based on functions which do for a square wave what the definition used here does for a sine and cosine. In short, this chapter should be considered an appetizer rather than a main dish.

Fourier transformation can be used to predict the distortion (convolution) of experimental information by measuring instruments and/or complicating physical phenomena. Conversely, it can contribute to the correction of such distortion (deconvolution). It can also be applied to analyze the frequency components of time-dependent phenomena. Several such applications will be discussed in the next chapter.
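The convention just described (negative exponent in the forward transform, with the entire normalization factor carried by the inverse transform) is easy to render outside the spreadsheet. The following is not part of the book's macros, just an illustrative Python sketch of that choice; note how the full factor 1/N sits on the inverse transform, and how a unit-amplitude cosine then shows up with magnitude N/2 in its two frequency bins.

```python
import math
import cmath

def dft(x):
    # forward transform: negative exponent, no normalization factor
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    # inverse transform: positive exponent, carrying the full 1/N factor
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N = 8
x = [math.cos(2 * math.pi * n / N) for n in range(N)]
X = dft(x)
x_back = idft(X)

# the unit cosine appears in bins 1 and N-1, each with magnitude N/2
assert abs(X[1] - N / 2) < 1e-9
assert abs(X[N - 1] - N / 2) < 1e-9
# the round trip recovers the original signal
assert all(abs(x_back[n] - x[n]) < 1e-9 for n in range(N))
```

Moving the 1/N (or splitting it as 1/sqrt(N) over both transforms) changes only these bookkeeping factors, not the information content, which is exactly the point made above about competing conventions.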
5.12 For further reading

An excellent introduction to the discrete Fourier transform is E. O. Brigham's book on The Fast Fourier Transform, Prentice Hall 1974. A classic reference for the (closely related) continuous Fourier transformation is R. N. Bracewell, The Fourier Transform and its Applications, McGraw-Hill 1978. For the Hartley transform the reader is referred to Hartley's paper in Proc. IRE 30 (1942) 144, or to Bracewell's book The Hartley Transform, Oxford University Press 1986.
Chapter 6

Convolution, deconvolution, and time-frequency analysis

In this chapter we will consider time-dependent signals. In principle these are different from stationary data sets, such as spectra, because evolving time has an inherent directionality, at least until the entire signal has been recorded and has thereby become just another set of numbers. We will see the consequences of this in convolution and its undo operation, deconvolution. These techniques will be discussed first as independent methods. Subsequently we will illustrate how they can sometimes be performed more efficiently with the help of Fourier transformation. Finally, we will examine time-frequency analysis or Gabor transformation, a direct application of Fourier transformation.

6.1 Time-dependent filtering

We first consider the well-known example of a so-called RC filter, e.g., the combination of a series resistor and capacitor that has a characteristic rate constant k = 1/RC, where R is the resistance of the resistor, typically in Ω (the symbol for Ohms), and C is the capacitance of the capacitor, in F (for Farads). When we pass a stepwise signal change through such a filter, it will respond by exponentially approaching the new steady state. A characteristic property of such a filter is its memory, through the charge stored in its capacitor, which only slowly leaks out through its resistor. Below we will illustrate how we can use a spreadsheet to simulate the behavior of such a filter and, eventually, of much more complicated filters and other signal distortions.

Exercise 6.1.1:
(1) Start a new spreadsheet, leaving the top 12 rows for graphs, and with column headings for time, input, filter, and output in A15:D15.
(2) Start time at negative values, say at t = -20 in cell A17, then extend the column down to as far in the positive domain as desired, say to t = 100 with increments Δt of 1.
(3) Place a signal in the input column, e.g., a unit step starting at t = 30 and returning to zero at t = 65. Fill the rest of the column with zeros. Don't worry about such a bland signal: you can soon make it as fancy as your heart desires.
(4) Place a time constant k somewhere at the top of the spreadsheet, say in cell B13, with its label in A13. A value for k between about 0.2 and 0.5 is convenient for the scale and unit step size used here: if k is too large, there are only a few points that significantly differ from 0; if k is too small, the exponential hardly approaches 0 at t = -20.
(5) In the filter column, calculate the exponential e^kt, for non-positive values of time t only (i.e., in C17:C37).
(6) Place the label norm= in cell C13, and the instruction =SUM(C17:C37) in cell D13.
(7) In cell D37 place the instruction =(B37*$C$37+B36*$C$36+...+B18*$C$18+B17*$C$17)/$D$13, where the dots indicate 17 terms of similar form. Copy this instruction down to row 137.
(8) Make another column in which you compute the functions 1-exp[-k(t-τ1)] and exp[-k(t-τ2)], where τ1 and τ2 are the times at which the signal jumps from 0 to 1 and from 1 to 0 respectively: e.g., in cell E66 deposit the instruction =1-EXP(-$B$13*(A66-$A$66)) and copy this down to cell E101, where you replace it with =EXP(-$B$13*(A101-$A$101)).
(9) Plot both the signal (in column B), its filtered form (in column D), and its calculated form (in column E) as a function of time (in column A), and compare with Fig. 6.1.1.

Fig. 6.1.1: The test function (open circles and, idealized, the thin line) and the same after filtering with a single time constant k = 0.3 (solid circles). The heavy line shows 1-exp[-k(t-τ1)] and exp[-k(t-τ2)], where τ1 and τ2 are the times at which the signal jumps from 0 to 1 and from 1 to 0 respectively. The thin vertical lines were drawn separately, in order to avoid the trapezoidal look you get by just connecting successive points (with Smoothed line turned off), or the rounded corners and overshoot from using interpolating cubic splines.
(10) Either copy A13:D137 to, say, G13 for a new signal, or just modify the signal in column B. Now you can give your imagination free rein; an example of such a signal and its filtered response is illustrated in Fig. 6.1.2. Play with it.
(11) Figure 6.1.3 illustrates the response of such a filter to a sinusoidal signal. You will recognize the reduced amplitude as well as the phase shift of the filtered output, the short initial transient before a steady-state harmonic response is reached, and the transient when the signal is terminated abruptly.
(12) Also try a test function with added Gaussian noise, as in Fig. 6.1.4. The filter greatly reduces the noise but also distorts the signal, the usual trade-off.
(13) Save the spreadsheet for subsequent use.

Fig. 6.1.2: A fantasy test function (open circles connected by a thin line) and the same after filtering with the same time constant k = 0.3 (solid circles connected by a heavier line).

Fig. 6.1.3: A test function with a sinusoid (open circles connected by a thin line) and the same after filtering with the same time constant k = 0.3 (solid circles connected by a heavier line).

This simple spreadsheet program indeed mimics the effect of an RC filter. Regardless of the input signal to which it is applied, the RC filter is characterized by its rate constant k = 1/RC or characteristic time RC. Its exponential response to a sudden input change of unit amplitude is called its transfer function.

An interesting aspect of this simulation is that the filter function as it were looks backwards, i.e., retrospectively. The instruction under point (7) of exercise 6.1.1 multiplies the most recently observed signal value by $C$37 (which in this example has the value 1), the previously measured signal value by $C$36 (here 0.74), the point measured before that by $C$35 (only 0.55), and so on. In this manner the filter incorporates the past, but with factors that decrease as the information gets older: the past is included, but gradually forgotten. This is no accident: an RC filter has a (short) memory; it stores the applied voltage as a charge, which then slowly leaks out. The larger is k, the shorter is the memory, i.e., the faster the filter will respond to changes in the input signals, but (as trade-off) the less effective it will be in rejecting noise.

The filter therefore acts asymmetrically, in contrast to, e.g., the least squares smoothing method we encountered in section 3.15. The asymmetry comes from the directionality (the 'arrow') of time: the past is knowable, whereas the future is not. Just ask your stockbroker: it is easy enough to spot, retrospectively, when the Dow Jones closing index last went through a maximum, but it is another matter entirely to predict correctly when next time it will crest.

You can convolve the convolved data, and thereby achieve multiple filtering, just as you would with two successive, independent RC filters, although it is usually more efficient to achieve the same in one single operation, by using a higher-order filter; see Fig. 6.1.5.

Fig. 6.1.4: A test function (thin line) with added noise (open circles), and the same after filtering with the same time constant k = 0.3 (solid circles connected by a heavier line).

Fig. 6.1.5: The same test function (thin line) with added noise (open circles), filtered by convolution (see Fig. 6.1.4), and filtering that output again by convolution, in both cases with the same time constant k = 0.3 (solid circles connected by a heavier line).

If you want to use this method for symmetrical filtering, take the output of the filter (after it has been recorded in its entirety), copy it together with the associated time sequence using Edit => Paste Special => Values, invert it with Sort Descending, and run it again through the convolution protocol. This will indeed yield a symmetrically filtered result, with twice the filtering action, see Fig. 6.1.6. Obviously, signal inversion is possible only after the entire output of the first filter has been observed: you can do this only after the output has been completed, at which point it has become a fixed sequence of numbers rather than a signal evolving in time.

Fig. 6.1.6: The same test function (thin line) with added noise (open circles), after filtering, signal inversion, and filtering again with the same time constant k = 0.3 (solid circles connected by a heavier line).
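The filter-invert-filter-invert recipe for symmetrical filtering is the classic forward-backward (zero-phase) trick. The sketch below is not from the book; it is an illustrative Python rendering in which causal_filter plays the role of the exponentially weighted spreadsheet filter, and the list reversal plays the role of Sort Descending.

```python
import math

def causal_filter(x, k=0.3, memory=21):
    # normalized exponential weights; the most recent sample is weighted most heavily,
    # mirroring the C17:C37 filter column and its norm in the spreadsheet
    w = [math.exp(-k * j) for j in range(memory)]
    norm = sum(w)
    return [sum(w[j] * x[n - j] for j in range(memory) if n - j >= 0) / norm
            for n in range(len(x))]

def symmetric_filter(x, k=0.3):
    # filter, reverse (the spreadsheet's Sort Descending), filter again, reverse back
    once = causal_filter(x, k)
    twice = causal_filter(once[::-1], k)
    return twice[::-1]

# a symmetric test pulse, placed well away from the edges
x = [0.0] * 201
for i in range(90, 111):
    x[i] = 1.0

y1 = causal_filter(x)       # single pass: the output lags the input
y2 = symmetric_filter(x)    # double pass: no net phase shift

# the forward-backward result is symmetric about the pulse center ...
assert all(abs(y2[100 + d] - y2[100 - d]) < 1e-9 for d in range(80))
# ... whereas the single-pass result is skewed toward later times
assert max(abs(y1[100 + d] - y1[100 - d]) for d in range(80)) > 0.05
```

As in the text, the double pass gives twice the filtering action, and it can only be done once the whole record is available, since the backward pass needs future samples.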
6.2 Convolution of large data sets

The above-described time-dependent filtering, in which we multiply a function term by term by the time-reverse of another function, is called convolution. Convolution is not only used for filters, but is also a very useful concept in describing how, e.g., an instrument can distort a phenomenon under observation. When we use low-resolution equipment to observe a spectral feature, the output will reflect the original spectrum as well as the distorting effect of the instrument used. With a low-power microscope we cannot expect to see fine details in the sample; those details exist, but they are lost to us in the limited resolution or chromatic aberration of our tool; we get the image as if looking into a laughing mirror, as the Hubble telescope actually did before it got its eye glasses. It is a mixed pleasure to listen on a tinny radio to a superb musical performance, because only distorted music reaches our ears, from which it may be difficult to reconstruct the original sound.

In all the above examples we obtain a filtered, i.e., distorted signal. When we use a filter we distort intentionally, typically in order to reduce noise, whereas in a measurement instrument we usually do not mean to distort, but the effect is nonetheless the same. Convolution describes mathematically how the effect of the measurement instrument distorts the input signal to produce the observed output. Convolution is a sufficiently common operation that it is denoted here by a special symbol, ⊗. (In much of the literature the asterisk * is used for that purpose, but we will not do so here because * is easily confused with the multiplication symbol in computer code.)

When the data sets become large, the instructions required for direct spreadsheet convolution can become impracticably large: just imagine typing in 100, 1000, or 10000 product terms. By now you will not be surprised that we can deal with this complication with either a custom function or a custom macro. The macro Convolve simply automates what we have done manually in section 6.1, i.e., it approximates the continuous integral

x(t) ⊗ y(t) = ∫ x(τ) y(t-τ) dτ   (6.2.1)

(where the integral runs from -∞ to +∞) by its discrete equivalent

x(t) ⊗ y(t) = (1/N) Σ x(τ) y(t-τ)   (6.2.2)

(where the sum runs from τ = 1 to N), where x and y are both functions of t, while τ is a 'dummy' variable that does not figure in the final result. The order of convolution makes no difference; in fact, convolution is commutative:

x(t) ⊗ y(t) = y(t) ⊗ x(t)   (6.2.3)

The macro requires three adjacent input columns, one each for time t, for x(t), and for y(t), and then produces the convolution x(t) ⊗ y(t) in the fourth column. In fact, the macro does not use the time column, which is included here only for the sake of consistency with another macro, ConvolveFT, which we will encounter in section 6.4. If there are no time values, highlight the first column anyway, even if that space is used for some other purpose: data or formulas in the first column will neither be used nor erased. The only requirements are that the time increments in the two data sets x(t) and y(t) are constant, and that the two signals x(t) and y(t) are defined at the same values of t, so that both x(t) and y(t) should be listed as starting at t = 0 or 1, or at whatever starting number we want to assign. The macro includes the inversion of y(t) necessary to compute y(t-τ).

Our first example will use the data already encountered in Figures 6.1.1 through 6.1.4.

Exercise 6.2.1:
(1) In a new spreadsheet, reserve the top 12 rows for graphs, and place column headings for time, input, filter, and output in A15:D15.
(2) In cells A17:A216 deposit t = 1 (1) 200.
(3) In B17:B216 enter a simple test function such as used in exercise 6.1.1; for the moment, set the noise amplitude to zero.
(4) In cell A13 write the label k=, and in cell B13 place a numerical value.
(5) In cell C17 place the instruction =EXP(-$B$13*A17), and copy this instruction down to row 216.
(6) Highlight the area A17:C216, and call Convolve.
(7) Plot the input signal B17:B216 and its convolution D17:D216 vs. time, and plot your results. Save the spreadsheet.

Figure 6.2.1 clearly shows the trade-off involved in filtering. We reduce the effects of high-frequency noise, but at the expense of a sluggish response to signal changes, because the filter also reduces the high-frequency components in the signal that describe its sudden jump. The distinction between 'signal' and 'noise' is usually a subjective one.

Exercise 6.2.1 (continued):
(8) Replace the input signal by a more fanciful one, perhaps resembling that in Fig. 6.2.2, and again call the macro. You may have to reduce the filter time constant k in order to preserve some semblance of fidelity.
(9) Reset the noise amplitude, and again call the macro. Figure 6.2.2 illustrates what you may obtain.

Fig. 6.2.1: The result of convolving a step function x(t) with an exponential decay y(t) = exp[-kt] using the macro Convolve. The input function x(t) is shown as open circles, and x(t) ⊗ exp[-kt] as smaller solid points connected by line segments. In panel (b) we use the same input as in panel (a) plus some Gaussian noise of standard deviation 0.04. In these examples k = 0.04.

Again the trade-off is obvious: if k is too small, the filter is inefficient in reducing noise; if k is chosen too large, the signal is distorted beyond recognition. In that case collect data as closely spaced as possible for maximum noise rejection at small k. The best (proactive rather than after-the-fact) solution is to reduce the noise at its source, and to shield all noise-sensitive parts of the signal path. Filtering is only the next-best option.

Exercise 6.2.2 illustrates using this macro for a transient such as might be encountered in the study of short-lived fluorescence. We will assume that a laser pulse with a reproducible and known intensity-time profile is used to excite molecules to excited states, from which they decay soon thereafter by fluorescence. Whereas with an RC filter the distortion is usually intentional (in order to remove noise), here it is the undesirable consequence of the unavoidably finite rise and fall times of the laser pulse, and its nonzero width. For the time course of laser light emission we take a skewed Gaussian (a rather arbitrary function picked here merely because it starts rather quickly and decays slowly), and we describe the fluorescent decay by a first-order rate process with rate constant k.

Exercise 6.2.2:
(1) In a new spreadsheet, reserve the top 12 rows for graphs, and place column headings for time, input, filter, and output in A15:D15.
(2) Start time at t = 0 in cell A17, then extend the column down to as far as desired, say to t = 300 with increments Δt of 1.
(3) Fill a top section of column B, say B17:B46, with zeros. In B47 then place a formula for the undistorted decay, such as =EXP(-$B$13*(A47-$A$47)), where B13 contains a value for the rate constant k. Copy this instruction all the way down the column.
(4) In cell C17 deposit =IF(1+(A17-$D$13)/$D$14>0,EXP(-1*(LN(1+(A17-$D$13)/$D$14))^2),0), where D13 and D14 contain values for the filter parameters tf and af of y = exp(-{ln[1+(t-tf)/af]}^2) for 1+(t-tf)/af > 0, and otherwise y = 0. In the example of Fig. 6.2.3 we have used tf = 10 and af = 7.

Fig. 6.2.2: Some results of convolving a function x(t) with an exponential decay y(t) = exp[-kt] using the macro Convolve, with k = 0.04. The input function x(t) is shown as open circles, and x(t) ⊗ exp[-kt] as smaller solid points connected by line segments. In (a) the input function is a fanciful set of steps followed by a sine wave, and in (b) the same with added Gaussian noise with a standard deviation of 0.03.
(5) Call Convolve, and compare it with Fig. 6.2.3. Save your result.

The macro does not use the generating equations, only the resulting numbers, and will work equally well if one uses different fluorescent decay kinetics and an arbitrary shape for the laser pulse profile, such as one actually measured for a particular light source.

Fig. 6.2.3: The exponential decay of simulated fluorescence, defined by a first-order rate constant k = 0.03, as convolved by an excitation light pulse with finite rise and decay times. Wide gray curve (labeled b since it displays the contents of column B): the theoretical, exponential decay for a delta-function as excitation source. Small solid circles connected by thin curve (labeled c): an assumed profile of the light pulse, with finite rise and fall times, shown here with a delay for visual clarity. Open circles (labeled d): the emission signal that would result for this combination of fluorescence and excitation.

As our third example we will use the convolution macro to illustrate the effect of, e.g., limited optical resolution on spectral peaks, or of instrumental peak broadening on a chromatogram. For the sake of simplicity we will again assume simple forms, in this case Gaussians for both the undistorted peaks and for the broadening effect. Note that the macro simply uses the resulting numbers, and arbitrary peak shapes and broadening functions will therefore work equally well.

Exercise 6.2.3:
(1) In yet another spreadsheet, again reserve the area A1:E16 for graphs. Also place column headings for time, s, t, and r in cells A20:D20, place the labels as1=, bs1=, cs1=, as2=, bs2=, cs2=, as3=, bs3=, cs3=, as4=, bs4=, cs4=, at=, bt=, ct= in cells F2:F16, and the labels ar1=, br1=, cr1=, ..., cr4= in cells K2:K13.
(2) Start time at t = 1 in cell A22, then with increments Δt of 1 extend the column as far down as desired, say to A321 where t = 300.
(3) In column B generate a fantasy spectrum or chromatogram consisting of four Gaussian peaks, of different widths, and possibly overlapping or nearly so, using instructions such as, say, in cell B22, =$G$2*EXP(-0.5*((A22-$G$4)/$G$3)^2)+...+$G$11*EXP(-0.5*((A22-$G$13)/$G$12)^2).
(4) Likewise, in column C, deposit the instruction for a single Gaussian representing the signal-distorting transfer function t, such as =$G$14*EXP(-0.5*((A22-$G$16)/$G$15)^2) in cell C22.
(5) Convolve the four-Gaussian signal s with the single-Gaussian function t, and use the area A1:F16 to plot the results, which might resemble Fig. 6.2.4. Save the spreadsheet.

Fig. 6.2.4: A simulated spectrum s containing four Gaussians (connected open circles in panel a) calculated with the coefficients as1 = 0.5, bs1 = 2, cs1 = 110; as2 = 0.4, bs2 = 15, cs2 = 130; as3 = 0.65, bs3 = 3, cs3 = 210; and as4 = 0.7, bs4 = 3, cs4 = 225. The transfer function t (connected small solid circles in the same panel) with at = 0.6, bt = 5, ct = 35 is also displayed in panel a. The convolution r of s and t is shown in panel b, in which the peaks are shifted by the amount of ct.

We see that the convolved spectral peaks are broader (as is most noticeable with the narrower peaks) and less tall (because the convolution does not change their integrated areas), so that adjacent peaks tend to coalesce. The convolved spectrum is shifted with respect to the original spectrum by the amount of ct. Therefore, make sure that there is enough space at the end of the signal to accommodate such a shift; otherwise just add zeros to the signal to provide that space. You can add any number of dummy data points: Convolve needs equidistant signals x(t) and y(t) (because these two functions will be sliding past each other), but requires neither symmetry, nor periodicity, nor a specific number of data points. Unless you remove this feature from its code, the custom macro Convolve will ignore at and will, instead, normalize t to unit average value.
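Two properties used in this example (a transfer function normalized to unit sum leaves peak areas unchanged, and Gaussian widths add in quadrature) are easy to check numerically. The following Python sketch is not from the book; an explicit discrete convolution stands in for the Convolve macro, and all names are illustrative.

```python
import math

def gauss(n_points, a, b, c):
    # amplitude a, standard deviation b, center c, sampled at t = 0 ... n_points-1
    return [a * math.exp(-0.5 * ((t - c) / b) ** 2) for t in range(n_points)]

def convolve(x, y):
    # full discrete convolution: z[n] = sum over k of x[k] * y[n-k]
    z = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            z[i + j] += xi * yj
    return z

s = gauss(300, 0.65, 3.0, 150.0)     # one spectral peak of width 3
t = gauss(71, 1.0, 5.0, 35.0)        # broadening function of width 5
t = [v / sum(t) for v in t]          # normalize the transfer function to unit sum
r = convolve(s, t)

def sigma(z):
    # peak width from the second moment about the centroid
    m = sum(i * zi for i, zi in enumerate(z)) / sum(z)
    return math.sqrt(sum((i - m) ** 2 * zi for i, zi in enumerate(z)) / sum(z))

assert abs(sum(r) - sum(s)) < 1e-9 * sum(s)                  # area preserved
assert abs(sigma(r) - math.sqrt(3.0**2 + 5.0**2)) < 0.05     # widths add in quadrature
```

The broadened peak is also lower, since the same area is now spread over a larger width, which is exactly why the narrow peaks in Fig. 6.2.4 lose the most height.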
6.3 Unfiltering

Say that we have filtered a signal with an RC filter, and want to undo that operation. Just as we can exponentiate to counteract taking a (natural) logarithm, or integrate to undo differentiation, we can use the spreadsheet to unfilter the data, as illustrated in exercise 6.3.1. The technical term for the undo operation of convolution is deconvolution. Beware: this term is sometimes misused to mean decomposition or resolution, i.e., the resolution of (often simply additive) constituent components, a much more trivial problem discussed in, e.g., section 4.6.

Exercise 6.3.1:
(1) Start a new spreadsheet, leaving the top 12 rows for graphs, or add on to the spreadsheet of exercise 6.1.1.
(2) Make the following new columns (which, for ease of specifying the instructions, we will here assume to be columns H through L), with column headings for time, input, output1, and output2.
(3) Also place the labels k= and norm= in cells H13 and J13 respectively, and repeat the earlier-used k-value in cell I13.
(4) For the time column, copy the earlier values: t = -20 (1) 100. For input, copy the data from column D to column I with, e.g., =D17 in cell I17.
(5) In cell J17 place the instruction =EXP($I$13*H17), and copy this filter function down to row 37.
(6) In cell K13 enter =SUM(J17:J37).
(7) The expression for the deconvolution is, of course, slightly different from that for convolution. In cell K38 (to take a place equivalent to that of cell D37 in exercise 6.1.1) deposit the instruction =I38*$K$13-(K37*$J$36+K36*$J$35+K35*$J$34+K34*$J$33+K33*$J$32+K32*$J$31+K31*$J$30+K30*$J$29+K29*$J$28+K28*$J$27+K27*$J$26+K26*$J$25+K25*$J$24+K24*$J$23+K23*$J$22+K22*$J$21+K21*$J$20+K20*$J$19+K19*$J$18+K18*$J$17). Copy these instructions all the way down to row 137.

Fig. 6.3.1: Top: the original input signal, before filtering (open circles), and its filtered output (solid circles). Bottom: using the filtered signal as input (open circles), the unfilter operation now recovers the original signal (solid circles).
(8) Plot K17:K137 vs. H17:H137. You have now unfiltered the original signal.
(9) Check the residuals, i.e., the differences between corresponding values in columns K and B.
(10) Try the same spreadsheet for other input functions, by changing the data in column B to signals such as used in Figs. 6.1.2 and 6.1.4.
(11) If we use a different k-value in the unfiltering operation from the one used for filtering, we will not recover the original input signal. This is illustrated in Fig. 6.3.2 for the simple square pulse of Fig. 6.1.1.

Fig. 6.3.2: Middle panel: when the filter rate constant in unfiltering is the same as that in filtering (k = 0.30 in this example), the original signal is recovered. Top and bottom: when a different k-value is used for filtering and unfiltering, distortion occurs, as shown here in the form of overshoot or undershoot.

The few examples given so far suggest that we can always undo the effects of filtering or distortion, but that is, unfortunately, too good to be true. Again we can use a custom macro, Deconvolve, to ease our work; the following exercises illustrate some of the limits involved.

Exercise 6.3.2:
(1) Take the spreadsheet used in exercise 6.2.1, with a single step function as its signal in column B, and advance the function representing the RC filter in column C to start at t = 10 rather than at the very beginning of the data set, with the filter function set at 0 for t < 10.
(2) Set up columns F, G, and H to copy the data in columns A, D, and C respectively, so that convolution of the data in columns A through C can readily be followed by deconvolution of the data in columns F through H.
(3) Apply Convolve to the data in block A17:C217, then Deconvolve those in block F17:H217. As you can see, there is no problem here.
(4) In order to see more clearly what happens, set the filter function for t < 10 to 1.00, and repeat the convolution and deconvolution; see Fig. 6.3.3.
(5) Now give the filter function a slightly positive slope over the first few points, e.g., by giving it the values 0.50 (0.05) 1.00 for t = 1 (1) 10, and repeat the convolution and deconvolution. Figure 6.3.4 illustrates the result, which shows the onset of instability in the form of a damped oscillation. If we increase the initial slope, the problem worsens: the deconvolution is fully out of control (i.e., the oscillation is no longer damped) with initial values for the filter function of, e.g., 0 (0.1) 1, and it is even worse if one steps suddenly from 0 to 1 at t = 10. As you already saw, the result of the deconvolution then yields a wildly oscillating signal that bears little resemblance to the original.

Fig. 6.3.3: Top panel: convolution of a step function (large circles) with an exponential filter function preceded by a constant level (connected small solid circles). Bottom panel: deconvolution of the same recovers the original step function (small circles) without problems. The labels refer to the spreadsheet columns used.

Fig. 6.3.4: Top panel: convolution of a step function (large circles) with an exponential filter function preceded by an initial rise (connected small solid circles). Bottom panel: deconvolution of the same recovers the original step function with a (damped) oscillation (small circles). The labels refer to the spreadsheet columns.

Apparently the deconvolution works reliably only when the transfer function has nowhere a positive time derivative, a requirement that often cannot be met. We therefore look for alternative methods to perform the deconvolution in those (many) cases in which direct deconvolution does not work.
6.4 Convolution by Fourier transformation

A different route to convolution and deconvolution can be based on Fourier transformation. One reason to consider such an alternative approach is algorithmic efficiency. Parenthetically, efficiency used to be an important problem when computers were slow. Now that the beast on your bench may work at or above 1 GHz, it may even get away with a rather inefficient method during the time it takes you to blink an eye. Computational efficiency still matters with large data sets, especially those in multidimensional arrays, which we will not consider here, as in general their handling should not be attempted on a spreadsheet. Because of their transparency, spreadsheets are great for learning and exploring the principles of the various methods, and for their applications to relatively small data sets, but they are often suboptimal for collections of, say, more than a few thousand data points.

For two 1000-point functions the direct method requires 1000^2 or 10^6 multiplications, whereas a Fourier transformation uses a number of operations of the order of N log2 N rather than N^2. Since 2^10 = 1024, log2 1024 = 10, or log2 1000 ≈ 10, so that 1000 log2 1000 is approximately 10^4. Even though convolution or deconvolution requires three Fourier transformations plus a complex multiplication or division, this still works out as much faster for sufficiently large data sets.

A second reason to consider Fourier transformation is that we already saw that direct deconvolution can lead to unstable results, and cannot be used for curves such as those of Figs. 6.3.3 and 6.3.4. Finally, applied to instrumental distortion, use of Fourier transformation can make sophisticated noise-rejection methods readily available, as we will see in section 6.6.

A basic theorem states that the Fourier transform of the convolution of two continuous functions x(t) and y(t) is equal to the product of the Fourier transforms of those functions. In other words, when

z(t) = x(t) ⊗ y(t) = ∫ x(τ) y(t-τ) dτ   (6.4.1)

(with the integral running from -∞ to +∞), then

Z(f) = X(f) x Y(f)   (6.4.2)

where X(f), Y(f), and Z(f) are the Fourier transforms of x(t), y(t), and z(t) respectively, t is time, wavelength, or whatever the relevant parameter is, f is the reciprocal of t, and τ is a 'dummy' variable.
the reverse might make sense: to consider the object 0 the true signal. and in the next columns calculate (acbd) and (bc+ad) respectively. In exercises 6. (4) Again copy the column for time (from column A). (2) In new columns copy the data from columns A (for time) and B (for input).4.296 R.4) so that we can compute r by Fourier transformation of s to S.g.3) where r. then recalculate the data in column D. s. and taking the antilog of Ina to find a. t with time t) as r=s®t (6. c. b. In optics. using values from the columns labeled a. For example.2. Label the resulting columns (containing the Fouriertransformed input) freq. (3) In a separate cell compute the average of the filter data..1: (1) Use the spreadsheet of exercise 6. the procedure is analogous to computing a = be by taking logarithms to obtain Ina = c x lnb. Highlight these two new columns. and d respectively. c. and t are functions. performing the multiplication. similarly extend the computations in columns Band C. Extend the times in column A to 255. a transformation (Fourier transformation or taking logarithms) allows us to reduce a more complicated mathematical operation (convolution or exponentiation) to a multiplication. s with standard deviation s. the image i its distorted response. a. and next to it calculate the filter value (from column C) divided by its justcomputed average. This yields the complex multiplication.1 and 6. de Levie. whereupon inverse Fourier transformation of R yields r.2.4. in electronics. Apart from the fact that we are dealing here with functions rather than with single numbers.4.2. together with an adjacent blank column. . and b.3. to be left blank). the 'true' signal might be called the input signal i.4. for which you can use the convenient function =AVERAGE ( ) . Upon Fourier transformation this yields R=SxT (6. and again call ForwardFT. e. and d. 
Advanced Excel for scientific data analysis original input signal s and the transfer function t (here printed in bold to avoid confusion of.2 we will use Fourier transformation to calculate the convolution already encountered in exercise 6. and the filtered result the output o. multiply S and Tto form R. and call the macro ForwardFT. and t to T. In yet another column copy the frequency (from one of the earlier columns labeled freq).4. In both cases. Label the resulting columns (with the Fouriertransformed filter) freq. Exercise 6. (5) For the multiplication of the Fourier transforms use (a+jb) x (c+jd) = (acbd) + j(bc+ad). A note on nomenclature: it is difficult to find a set of symbols that is convenient across many different disciplines. all in order to facilitate Fourier transformation. highlight them plus a third blank column (for the complex input.
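Outside the spreadsheet, the convolution theorem of (6.4.1) and (6.4.2) is easy to check numerically. The following sketch (Python with NumPy; purely illustrative, and not part of the book's Excel macros) compares a direct O(N²) circular convolution with the three-transform route:

```python
import numpy as np

# Circular convolution computed directly, O(N^2): z[n] = sum_k x[k] y[(n-k) mod N]
def conv_direct(x, y):
    N = len(x)
    return np.array([sum(x[k] * y[(n - k) % N] for k in range(N)) for n in range(N)])

# The same convolution via the convolution theorem: Z = X * Y,
# i.e. three transforms plus one elementwise product, O(N log2 N)
def conv_fft(x, y):
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

x = np.exp(-0.5 * ((np.arange(64) - 20) / 3.0) ** 2)   # a Gaussian "signal"
y = np.exp(-0.5 * ((np.arange(64) - 32) / 5.0) ** 2)   # a Gaussian "filter"
assert np.allclose(conv_direct(x, y), conv_fft(x, y))
```

For 64 points the two agree to within round-off; for large N the transform route is far faster, as argued above.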
Exercise 6.4.1 (continued):
(6) Highlight the just-made three columns, call InverseFT, and bingo: you again find the convolved result. It should be (within round-off errors) the same as that obtained with Convolve.

The above illustrates the principle of the method, but uses quite some time, effort, and spreadsheet space. It is far simpler to use the custom macro ConvolveFT, which incorporates the forward and inverse Fourier transformations, scales the filter (by dividing it by its average value), and performs the complex multiplications. Moreover, by limiting the input to real functions, it has an even smaller spreadsheet 'footprint' than a single Fourier transformation macro. Now that we have reduced the process to invoking a single macro, it is easy to verify that convolution is indeed commutative:

Exercise 6.4.1 (continued):
(7) Select another three columns, in which you copy the data from columns A (time), B (input), and C (filter).
(8) Highlight these three columns, call ConvolveFT, and plot your result.
(9) Move the output column, to keep it for subsequent comparison.
(10) Reverse the order of the columns in which you copied data from columns B (input) and C (filter). You can either cut and paste one into the adjacent column, or just insert a blank column between it and the copied filter function.
(11) Highlight the three last columns, call ConvolveFT, and compare your latest result with the earlier one you just moved. There should be no significant differences, because convolution is indeed commutative.

Below we will illustrate the use of ConvolveFT by applying it to the same problem.

Exercise 6.4.2:
(1) Starting with cell A3 of column A in a new spreadsheet, enter the numbers 1 (1) 2048.
(2) In row M2:T2 enter the amplitudes of nine Gaussian peaks, in row M3:T3 their standard deviations 100, 50, 30, 20, 10, 5, 3, 2, and 1, and in row M4:T4 the center values 500, 1050, 1370, 1590, 1740, 1840, 1910, 1970, and 2010.
(3) In cell B3 enter the instruction =$M$2*EXP(-0.5*((A3-$M$4)/$M$3)^2)+, copy the part beyond the equal sign, paste it back eight times in the instruction, and remove the final plus sign. Then change the M in $M$2, $M$3, and $M$4 in the second exponential into an N to make $N$2, $N$3, and $N$4, in the third into an O to get $O$2, $O$3, and $O$4, etc.
(4) Click on the cell handle to copy this instruction all the way down to row 2050. Then plot B3:B2050 vs. A3:A2050, see Fig. 6.4.1. There is nothing special about this set, other than that it has nine essentially baseline-separated Gaussian peaks of varying widths. You can of course make your own signal instead, e.g., the earlier data set containing four Lorentzians.
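The commutativity just verified on the spreadsheet follows directly from the fact that multiplication of the transforms is commutative. A minimal sketch (Python with NumPy; illustrative only, and the normalization of the filter by its sum is an assumption standing in for ConvolveFT's division by the average, from which it differs only by a constant factor):

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.random(256)                                      # an arbitrary "input" column
t = np.exp(-0.5 * ((np.arange(256) - 128) / 10.0) ** 2)  # a Gaussian "filter" column
t /= t.sum()  # normalized by its sum here; dividing by the average instead
              # would differ only by a constant factor of N

def convolve_ft(x, y):
    # forward transform both, multiply, inverse transform, in one line
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

# Because multiplication of the transforms is commutative, so is convolution:
assert np.allclose(convolve_ft(s, t), convolve_ft(t, s))
```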
Fig. 6.4.1: The test function used in exercise 6.4.2. The open circles show the individual data points; the lines merely connect adjacent points.

Exercise 6.4.2 (continued):
(5) In cell D1 place a value for the standard deviation of the transfer function, such as 10, and in cell C3 deposit the instruction =EXP(-0.5*((A3)/$D$1)^2)+EXP(-0.5*((A3-2048)/$D$1)^2). Copy this instruction down to row 2050 as well. This is a single Gaussian peak centered at zero (with its wrap-around counterpart centered at 2048), and will represent our transfer function. Here we have assumed a Gaussian transfer function, but its precise form is less important than its characteristic width; again, feel free to use another function instead.
(6) Highlight A3:C2050, and call the custom macro ConvolveFT. Plot the result, which has appeared in D3:D2050. It should resemble Fig. 6.4.2b.
(7) Repeat this with different values in D1, such as 0.1, 1, and 100, and plot the resulting curves.
(8) Save this spreadsheet for use in exercise 6.5.2.

Figure 6.4.2 illustrates how measurement instruments or other sources of broadening can distort a signal. In Fig. 6.4.2c the rightmost peak has almost disappeared, while Fig. 6.4.2d distorts all peaks and hides the narrower ones in a single broad shoulder. We see that convolution can even wash out some of the qualitative features of a signal. Such broadening is not restricted to instruments. For example, an atomic absorption line has an inherent width governed by the Heisenberg uncertainty, because the product of the width ΔE of the transition energy and the effective lifetime Δt (whose reciprocal is the sum of the reciprocals of the life times of the two states) cannot be smaller than h/2π, where h is the Planck constant. This yields a Lorentzian line shape rather than the infinitely narrow line one might otherwise expect for a quantum transition.
Fig. 6.4.2: The convoluted test function (which contains Gaussians with standard deviations ranging from 100 down to 1) for (from a to d) increasingly broad transfer functions, with standard deviations of 0.1, 1, 10, and 100 respectively. The gray line in each panel repeats the test function.
6.5 Deconvolution by Fourier transformation

While convolution is useful in understanding instrumental distortion and in instrument design, instrument users are often more interested in correcting for such distortion, i.e., in deconvolution. They may want to compensate for a finite laser pulse width in order to determine the rate constant k of a fast fluorescence decay (i.e., recover curve b from the measured curve d), or reconstruct an actual spectrum by correcting it for the distortion of the nonzero slit width of their instrument (as in going from curve r to curve s). Assuming that the transfer function t is both reproducible and known, deconvolution in principle allows reconstruction of the original, undistorted signal.

Such broadening is not restricted to instruments. For example, the thermal motion of gaseous atoms with respect to the 'laboratory frame' (containing the light source and the detector) causes line broadening, mathematically described as the convolution of the Lorentzian line with a Gaussian distribution due to diffusional motion (W. Voigt, Ann. Phys. 311 (1901) 459). High gas pressure may result in further, so-called collisional broadening. The same applies to molecular spectra, which typically show rotational fine structure in the gas phase. In condensed phases such rotational fine structure is often blurred as the result of interactions with neighboring molecules. The effects of such interactions must, again, be described in terms of convolutions.

Deconvolution based on Fourier transformation works as follows. Using an appropriate input signal, we first determine the transfer function t of the instrument. We then take the measured output of interest, r, which (assuming that the instrumental parameters have not changed) is given by r = s ⊗ t, where s is the sought, distortion-free signal. In other words, we want to find

s = r ⊘ t    (6.5.1)

where ⊘ denotes deconvolution, the inverse operation of convolution ⊗. Fourier transformation of r = s ⊗ t yields R = S × T, from which we obtain

S = R / T    (6.5.2)

The symbol ⊘ suggests the corresponding Fourier-domain division (/), just as ⊗ implies multiplication (×) of the Fourier transforms. Consequently we transform r to R, and t to T, then calculate S = R / T, and inverse transform the latter to find s. In exercises 6.5.1 and 6.5.2 we will illustrate this alternative method of deconvolution.
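The recipe of (6.5.1) and (6.5.2), including the real-arithmetic complex division used in the spreadsheet columns of exercise 6.5.1 below, can be sketched as follows (Python with NumPy; an illustration with arbitrary Gaussians, not the book's data):

```python
import numpy as np

n = np.arange(256)
s = np.exp(-0.5 * ((n - 100) / 6.0) ** 2)            # an undistorted "signal"
t = np.exp(-0.5 * ((n - 128) / 1.5) ** 2)
t /= t.sum()                                          # normalized transfer function

r = np.real(np.fft.ifft(np.fft.fft(s) * np.fft.fft(t)))   # the measured r = s (x) t
R, T = np.fft.fft(r), np.fft.fft(t)

# the spreadsheet's real-arithmetic complex division (a+jb)/(c+jd):
a, b, c, d = R.real, R.imag, T.real, T.imag
S = (a * c + b * d) / (c ** 2 + d ** 2) + 1j * (b * c - a * d) / (c ** 2 + d ** 2)
assert np.allclose(S, R / T)                          # identical to the direct quotient

s_back = np.real(np.fft.ifft(S))                      # inverse transform to recover s
assert np.allclose(s_back, s)                         # noise-free: essentially exact
```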
Finally we note that convolution, like multiplication, is always commutative: a × b = b × a and a ⊗ b = b ⊗ a, but that deconvolution, like division, is not: a / b ≠ b / a and, likewise, a ⊘ b ≠ b ⊘ a.

Exercise 6.5.1:
(1) Modify, or copy and then modify, spreadsheet exercise 6.4.1 as follows.
(2) In new columns copy the data from columns A (for time) and D (for output), i.e., as instructed under exercise 6.4.1 under point (2), but replace the input from column B by the output from column D. Then proceed as before by highlighting them plus a third blank column, and by calling ForwardFT. Label the resulting columns (containing the Fourier-transformed output) freq, a, and b.
(3) Calculate the average of the filter function with =AVERAGE().
(4) Again copy the column for time (from column A), and next to it calculate the filter value (from column C) divided by the just-computed average.
(5) Highlight these two new columns, together with a blank column, and again call ForwardFT. Label the resulting columns (with the Fourier-transformed filter) freq, c, and d.
(6) Now that you have Fourier-transformed both the distorted output signal and the filter, instead of multiplying them we must divide the two. Because they are both complex quantities, we have (a+jb)/(c+jd) = (a+jb)(c−jd)/(c²+d²) = (ac+bd)/(c²+d²) + j(bc−ad)/(c²+d²). Therefore, in the columns where you had earlier calculated (ac−bd), now compute (ac+bd)/(c²+d²) and, instead of (bc+ad), now compute (bc−ad)/(c²+d²).
(7) In yet another column copy the frequency (from one of the columns labeled freq), and in the next columns calculate the complex division (ac+bd)/(c²+d²) and (bc−ad)/(c²+d²) respectively, using values from the columns labeled a through d.
(8) All that still remains to be done is the inverse Fourier transformation. Highlight the just-made three columns (one for frequency, and two containing the real and imaginary parts of the quotient), call InverseFT, and compare your result with the original signal in column B. You should recover the original exponential with very little distortion as the real result, together with a negligible imaginary result.

DeconvolveFT condenses all this into a single macro operation. (You will encounter two input boxes, one offering to apply an adjustable von Hann/Tukey window and, if you decline that, a second to zero out high frequencies. Deny both by approving the default 0, i.e., by clicking on OK or pressing Enter.) Note that deconvolution does not commute, but treats r and t differently.

Exercise 6.5.2:
(1) Use the spreadsheet of exercise 6.4.2, i.e., as instructed under exercise 6.5.1 under point (2), use copies of the data from columns A (time), D (output), and C (filter).
(2) Highlight these three columns, call the custom macro DeconvolveFT, and plot your result.

Exercise 6.5.3:
(1) We continue with the spreadsheet of exercise 6.5.2. In column E copy column A (i.e., in E3 place the instruction =A3, and copy this down to row 2050), and in columns F and G copy columns D and C respectively.
(2) Highlight E3:G2050, call DeconvolveFT, and plot your result. If you had obtained data resembling Fig. 6.4.2d, you will now find results similar to those in Fig. 6.5.1d. Do this for the various transfer functions used in exercise 6.4.2.

Comparison of Fig. 6.5.1 with Fig. 6.4.1 shows that deconvolution has almost completely restored the original test function in panel b, and recovered sizable parts of the narrower peaks in panel c. However, in Fig. 6.5.1d all peaks but the first remain strongly distorted. Admittedly, it is a tall order to want to recover peaks with a standard deviation of 1 when they were first convolved with a 100 times broader peak. Still, since exercise 6.5.3 deals with noise-free, synthetic data, there must be a reason why recovery of the original signal is so poor, even in the complete absence of noise. It looks as if deconvolution cannot recover the information blurred by the earlier convolution when the characteristic width (expressed, e.g., as its standard deviation) of the convolving and deconvolving function is larger by about an order of magnitude than that of the feature involved. Exercise 6.5.4 indicates where the shoe pinches: apparently, the distortion is primarily associated with truncation errors in the transfer function t.

Exercise 6.5.4:
(1) It is easiest to add to the spreadsheet of exercise 6.5.3 three more columns, one to copy time or #, the next to copy the signal, and the third to copy the transfer function, and in them deposit instructions that copy the data in columns A through C. The only difference is that, in the third column, instead of an instruction such as =B10 in the tenth row, you now use =INT(16*B10)/16. This will make the transfer function exactly expressible in terms of binary numbers, as does the simple rectangular transfer function. (There is nothing special about the number 16; other integer powers of 2 work equally well.)
(2) Call ConvolveFT, and plot the result. The curve you get is not very much different from what you found in Fig. 6.4.2d for a transfer function with a standard deviation of 100.
(3) Now use three more columns, one to copy #, the next to copy the result you just found, and the third to copy the binarized transfer function.
(4) Call DeconvolveFT, plot the deconvolved result (H3:H2050) vs. either E3:E2050 or A3:A2050, and enter the result in the graph made under point (2). Now your result is quite different from that in Fig. 6.5.1d: even for a transfer function with a standard deviation of 100, deconvolution now recovers the original signal, as illustrated in Fig. 6.5.2.
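The role of truncation errors in t can also be mimicked outside the spreadsheet: convolving and deconvolving with the identically binarized transfer function recovers the signal essentially exactly, whereas a slight mismatch between the two leaves a much larger residue. (Python/NumPy sketch, illustrative only; the parameters are arbitrary and far smaller than those of exercise 6.5.4.)

```python
import numpy as np

fft, ifft = np.fft.fft, np.fft.ifft
n = np.arange(256)
s = np.exp(-0.5 * ((n - 100) / 6.0) ** 2)    # test signal
t = np.exp(-0.5 * ((n - 128) / 1.0) ** 2)    # transfer function
t_bin = np.trunc(16 * t) / 16                # "binarized" t, cf. =INT(16*B10)/16

conv = lambda x, y: np.real(ifft(fft(x) * fft(y / y.sum())))
deconv = lambda x, y: np.real(ifft(fft(x) / fft(y / y.sum())))

matched = deconv(conv(s, t_bin), t_bin)      # identical (binarized) t in both steps
mismatched = deconv(conv(s, t), t_bin)       # t differs slightly between the steps

err_matched = np.max(np.abs(matched - s))
err_mismatched = np.max(np.abs(mismatched - s))
assert err_matched < 1e-8 < err_mismatched   # identical binarization recovers s far better
```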
Fig. 6.5.1: The deconvolved test function for (from a to d) increasingly broad transfer functions, with standard deviations of (a) 0.1, (b) 1, (c) 10, and (d) 100. The thick gray line in panel d repeats the test function; the thin black line in d is a reminder of how convolution had distorted it.
0.5. Advanced Excel for scientific data analysis a ·'.4 0.2 de Levie. . Clearly. (b) Deconvolution with the same (binarized) transfer function (black points) recovers the original data (gray).2 0.2 0. . • e I Q 0.4 0. (c) Normal convolution followed by deconvolution with the binarized transfer function (shown here as black points) yields results that (because ofthe mismatch in t) are slightly worse than those shown in Fig. ·.) The same applies when binarization is used only in the convolution step.8 0. 6. (aJ The original function (gray). ·•• .6 0..' ••• e e e.1d..2d for otherwise identical conditions.5. 6. 6.0 0.6 0.4. • 512 1024 1536 8 b I .0 512 1024 1536 8 Fig. or when different binarizations are used.0 0. as shown in Fig.2: Using a binarized transfer function allows total recovery.6 0.2 0. and its convolution (thin black curve) with a binarized transfer function (as shown here) is no different from that obtained with a nonbinarized one.1 • • • .4 0.8 0. identical binarization must be used for both convolution and deconvolution. even when the transfer function has a standard deviation of 100.. (The original data are again shown in gray.8 0.304 R. .2 0.2 0.
While one could exploit what we have just found in Fig. 6.5.2 for cryptographic encoding and decoding, we cannot use it in experimental science because, unlike the situation in a synthetic example, in data analysis we seldom have sufficient control over the distorting process. Moreover, experimental data contain irreproducible noise far in excess of truncation noise. For all practical purposes we therefore have to live with the rather unsatisfactory results of Fig. 6.5.1. These are summarized in table 6.5.1, which lists the standard deviations s_s of the nine Gaussian peaks in Fig. 6.4.1, and the corresponding s_t values of the four Gaussians used to represent the transfer function in the convolutions and subsequent deconvolutions in Fig. 6.5.1a-d. Practical signal recovery by Fourier transform deconvolution of noise-free signals is possible only for signals with characteristic widths not much smaller than that of the distorting transfer function t.
                s_s = 100   50   30   20   10    5    3    2    1
 (a) s_t = 0.1:        •    •    •    •    •     •    •    •    •
 (b) s_t = 1:          •    •    •    •    •     •    •    •    ®
 (c) s_t = 10:         •    •    •    •    ®     ○    ○    ○    ○
 (d) s_t = 100:        ®    ○    ○    ○    ○     ○    ○    ○    ○

Table 6.5.1: A semigraphical display of signal recovery after Fourier transform convolution and deconvolution, as a function of the standard deviations s_s of the Gaussian signal peaks and s_t of the equally Gaussian transfer function. Quality of recovery is indicated crudely in the style of Consumer Reports, with solid black circles (•) indicating excellent recovery, and open circles (○) poor recovery. Encircled solid circles (®) identify where s_t = s_s.
Now for the bad news. Convolution is an integration and, as such, attenuates noise: smoothing blurs many details. On the other hand its inverse operation, deconvolution, is akin to a differentiation, and tends to accentuate noise. We already saw the extreme sensitivity of the method to truncation errors in t, and below we will illustrate the effect of Gaussian noise added to r. Similar effects are observed by truncating or rounding the data for r to a limited number of digits, similar to what happens when analog signals are digitized. Truncation and rounding follow a triangular rather than a Gaussian distribution, with welldefined limits, but that detail is of minor importance here. For noisy signals, deconvolution usually trades lower signal distortion for much enhanced noise. That may not be a problem if the enhanced
noise can be removed subsequently, e.g., by fitting the data to a mathematical model expression, as was done by Hüfner & Wertheim, Phys. Rev. B 11 (1975) 678. Otherwise we may have to use filtering to reduce the noise, as illustrated in exercise 6.5.5.
Exercise 6.5.5:
(1) This will be a fairly wide spreadsheet, which (including a few empty 'spacer' columns) will take up more than a full alphabet. It is therefore best to start with a fresh sheet. In column A, under the heading #, deposit the numbers 0 (1) 2047.
(2) In column B, labeled s, generate the function s, or copy it from another worksheet. Reminder: to copy a value from, say, cell B2 of Sheet1 to cell C3 of Sheet2 in the same workbook, place in cell C3 of Sheet2 the instruction =Sheet1!B2. To copy from Book1 Sheet1 cell B2 to Book2 Sheet2 cell C3, use ='[Book1]Sheet1'!B2 in the receiving cell.
(3) In column C generate or copy a transfer function t.
(4) Highlight the data in these three columns, and call ConvolveFT. This will yield r in column D.
(5) In column F generate Gaussian (or other) noise, e.g., with Tools => Data Analysis => Random Number Generation.
(6) In column H copy the numbers from column A. In column I copy r from column D, and add to it a ('noise amplitude') multiplier times noise from column F. Figure 6.5.3a illustrates what you would get if you used for input the same data as shown in Fig. 6.4.2c, plus noise with a standard deviation ('noise amplitude') of 0.1. We now have set up the problem.
(7) Highlight the data in columns H through J (the latter being empty) and call ForwardFT. This will deposit the corresponding frequencies in column K, and the real and imaginary components R′ and R″ in columns L and M respectively.
(8) In column O calculate log M² = log[(R′)² + (R″)²], and plot it versus frequency (in column K).
(9) Find approximate functions for the signal (in column P) and the noise (in column Q). For the data shown in Fig. 6.5.3b we have used (and shown) log S² = 2 − 125|f| and log N² = −4.6. In column R then compute the Wiener filter as
10^log(S²) / (10^log(S²) + 10^log(N²)) = 10^(2−125|f|) / (10^(2−125|f|) + 10^(−4.6))
(10) In column T copy the frequency from column K, in column U calculate the product of R′ (from column L) and the Wiener filter (from column R), and in column V place the corresponding product of R″ (from column M) and the Wiener filter.
(11) Highlight the data in columns T through V, and call InverseFT. This will produce the numbers 0 through 2047 in column W, and the filtered real and imaginary components of r in columns X and Y respectively. The data in column X are shown in Fig. 6.5.3c; those in column Y reflect computational imperfections, and should therefore be quite small.
(12) In column AA copy the numbers 0 (1) 2047, in column AB the data from column X, and in column AC the transfer function t from column C.
(13) Highlight the data in columns AA through AC, and call DeconvolveFT. Column AD will now contain the deconvolved data, see Fig. 6.5.3d.
Fig. 6.5.3: Wiener filtering followed by deconvolution of a noisy signal. (a) The undistorted signal s (gray) and its convolution r with a Gaussian of standard deviation 10 (displayed in Fig. 6.4.2c), to which was added Gaussian noise of zero mean and standard deviation 0.1 (black circles). (b) The central part of the power spectrum of the noisy r. This plot is used for the visual estimates of the parameters of the Wiener filter. (c) The resulting, smoothed r (black curve) with the corresponding noise-free curve (gray). (d) Upon deconvolving the smoothed data we obtain the final result (black curve), with the corresponding noise-free curve shown in gray.
Fig. 6.5.4: Wiener filtering followed by deconvolution of a noisy signal. All data are as in Fig. 6.5.3, except that the original convoluting function had a standard deviation of 100 (see Fig. 6.4.2d), and the noise amplitude (the standard deviation of the added Gaussian noise) was only 0.02.
The above procedure is rather laborious. Its tedium could be reduced somewhat by constructing a custom macro, which in this case would require two parts: one to generate the power spectrum, log[(R′)² + (R″)²], the second (after operator intervention to distinguish between signal and noise, and to approximate both components in terms of mathematical
functions) to finish the process. We will leave this as an exercise to the interested reader. What is worse than its tedium is that even the small amount of noise left after Wiener filtering interferes with the deconvolution which, after all this effort, often produces only a relatively minor correction for the original distortion.
Fig. 6.5.5: The use of an adjustable von Hann filter for the deconvolution of the noisy trace illustrated in Fig. 6.5.3a. The input signal r is shown in gray, its deconvolution in black. The convolving and deconvolving t is a Gaussian curve of standard deviation 10; the signal has added Gaussian noise of standard deviation 0.1. The s-value used is noted in each panel.
Fig. 6.5.6: The use of an adjustable von Hann filter for the deconvolution of the noisy trace illustrated in Fig. 6.5.4a. The input signal r is shown in gray, its deconvolution in black. The convolving and deconvolving t is a Gaussian of standard deviation 100; the signal has added Gaussian noise of standard deviation 0.02. The s-value used is noted in each panel (e.g., s = 15000 and s = 18000). When s is too small, the result oscillates; when it is too large, there is no deconvolution.
This is illustrated in Fig. 6.5.4, where we have used the same spreadsheet for the more strongly distorted case of Fig. 6.4.2d. Once the spreadsheet is set up and properly labeled (it helps to color-code the data blocks to be highlighted for macro use), repeated operation is fairly easy, even though it still involves four macros (ConvolveFT, ForwardFT, InverseFT, and DeconvolveFT) and making the necessary adjustments in the parameter estimates S² and/or N² of the Wiener filter. Nonetheless, despite Wiener filtering, the method often tolerates very little noise, lest it yield wildly oscillating results.
When the effort involved in Wiener filtering is not warranted, some nonspecific, 'general' filtering can be had with two filters included in the custom macro DeconvolveFT: an adjustable von Hann filter or, when this is rejected, a sharp frequency cutoff filter. The application of the von Hann filter is illustrated in exercise 6.5.6 and in Figs. 6.5.5 and 6.5.6. By increasing the value of the filter parameter s one can make the von Hann filter arbitrarily narrow, in which case it approaches a delta function; the deconvolution macro then merely reproduces its input. Similar results can be obtained with the sharper frequency cutoff filter, but its abruptness tends to enhance oscillations in the result.
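By way of illustration, a raised-cosine taper of adjustable width in the frequency domain might look as follows; the exact shape used by DeconvolveFT, and the scaling of its parameter s, are not documented here, so both are assumptions of this sketch (Python with NumPy):

```python
import numpy as np

def von_hann(N, s):
    # Raised-cosine taper in the frequency domain; larger s narrows the window,
    # so that for very large s it approaches a delta function at zero frequency.
    f = np.fft.fftfreq(N)                              # frequencies in cycles per sample
    return 0.5 * (1 + np.cos(np.pi * np.clip(2 * s * np.abs(f), 0, 1)))

w = von_hann(1024, s=4)
assert w[0] == 1.0                                     # passes dc unchanged
assert np.all((w >= 0) & (w <= 1))                     # attenuates, never amplifies
assert w[512] == 0.0                                   # blocks the highest frequency
```

Such a taper would multiply R before the division by T, suppressing the high-frequency bins where noise amplification is worst.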
Exercise 6.5.6:
(1) This will be a short continuation of exercise 6.5.5. In column AF copy the numbers from column A, in column AG copy the noisy test function r+n from column I, and in column AH copy the transfer function t from column C.
(2) Highlight the data in columns AF through AH, and call DeconvolveFT. Select an appropriate filter parameter s; some filter curves are displayed in Fig. 5.6.1. You will find your answer in column AI. Figure 6.5.6 shows that, when the noise is too large, this approach does not work, but merely causes the result to oscillate around the input curve.
6.6 Iterative van Cittert deconvolution
An alternative, relatively robust approach to deconvolution that, for single data sets, is often easier to implement than deconvolution via Fourier transformation, can be based on the rather general principle of 'operating in reverse'. In exercise 4.0.1 we illustrated this for a number, by computing a cube root iteratively when only knowing how to calculate a cube. In the present case we try to find the unknown function that, when subjected to convolution, will yield the observed result. This approach was first applied to deconvolution by van Cittert et al. (Z. Phys. 65 (1930) 547; 69 (1931) 298; 79 (1932) 722; 81 (1933) 428) and, with modern computers, has become much more practical. The idea is as follows. Say that we have a measured spectrum r, and the transfer function t with which it was convolved, and for which operation we want to correct r. In other words, we seek the undistorted spectrum s = r ⊘ t, given the experimentally measured functions r and t. To this end we consider r the zeroth-order approximation s₀ to s, and convolve it with t to form q₁ = s₀ ⊗ t = r ⊗ t. This obviously goes the wrong way: q₁ is even more distorted than r. But we now assume that we can get a better approximation to s by adding to s₀ the difference between r and q₁, i.e., that s₁ = s₀ + (r − q₁) = 2r − q₁ will be a closer approximation to s than s₀.
Repeating this process, we compute q₂ = s₁ ⊗ t, then add the difference (r − q₂) to s₁ and obtain s₂ = s₁ + (r − q₂) = 3r − q₁ − q₂, etc. In general, after n such steps, we will have sₙ = (n+1)r − Σᵢ₌₁ⁿ qᵢ. Van Cittert et al. already studied the convergence behavior of this method, but it is a fairly complicated matter, for which you may want to consult P. B. Crilly's chapter 5 in Deconvolution of Images and Spectra, P. A. Jansson, ed., Academic Press 1997, and the references therein for recent results. Exercise 6.6.1 illustrates this approach.
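The recursion, and its closed form sₙ = (n+1)r − Σqᵢ, are easily tried numerically (Python/NumPy sketch, illustrative only; the 64-point signal and transfer function are arbitrary stand-ins, not the book's data):

```python
import numpy as np

n = np.arange(64)
s = 0.9 * np.exp(-0.5 * ((n - 32.0) / 2.0) ** 2)             # "true" signal, unknown in practice
t = np.exp(-0.125 * n ** 2) + np.exp(-0.125 * (64 - n) ** 2)  # wrap-around transfer function
t /= t.sum()

conv = lambda x: np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(t)))
r = conv(s)                                  # the measured, distorted spectrum

s_est, q_sum = r.copy(), np.zeros_like(r)
for i in range(1, 21):                       # twenty van Cittert iterations
    q = conv(s_est)                          # q_i = s_{i-1} convolved with t
    q_sum += q
    s_est = s_est + (r - q)                  # s_i = s_{i-1} + (r - q_i)
    # check the closed form: s_n = (n+1) r - sum_{i=1}^{n} q_i
    assert np.allclose(s_est, (i + 1) * r - q_sum)

assert np.linalg.norm(s_est - s) < np.linalg.norm(r - s)   # closer to s than r is
```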
Exercise 6.6.1:
(1) In a new spreadsheet, enter in row 1 the following column headings: #, s, t, r, leave a column blank, then #, r, t, q1, blank column, #, s1, t, q2, blank, #, s2, t, q3, blank, etc.
(2) In the column under #, say A3:A18, deposit the number sequence 0 (1) 15.
(3) In cell B3 then use the instruction =0.9*EXP(-0.5*(A3-8)^2) to calculate a signal, for which we here use a simple Gaussian with amplitude 0.9, standard deviation 1, centered at 8. Copy this instruction down to row 18.
(4) Place =EXP(-0.25*(A3)^2)+EXP(-0.25*(16-A3)^2) in cell C3. This instruction again uses wrap-around to avoid a phase shift, by exploiting the fact that the Fourier transform assumes a cyclic repeat of the signal. (Incidentally, this only works for a symmetrical transfer function.) The amplitude of the transfer signal is immaterial, since ConvolveFT will normalize it anyway.
(5) Highlight A3:C18, and call ConvolveFT to compute r in column D.
(6) Copy #, r, and t into the next columns, e.g., with the instructions =A3, =D3, and =C3 in cells F3 through H3, to be copied down to row 18.
(7) Highlight F3:H18, call ConvolveFT, and thus calculate q1 in column I.
(8) Copy # to column K, and t to M, and in L calculate s1 as s1 = 2r - q1.
(9) Convolve s1 with t to obtain q2 in column N.
(10) Repeat the process: copy # and t into columns P and R, and in column Q calculate s2 = s1 + (r - q2). Then convolve to find q3, and so on.
Figure 6.6.1 illustrates that s1 is indeed a better approximation to s than is r, s2 is better than s1, etc. For noise-free curves the method usually converges onto s, albeit slowly, even though convergence cannot be taken for granted for arbitrary transfer functions. The first few iterations are usually the most effective, and are easily performed on the spreadsheet. We illustrate this here with 28 synthetic Gaussian peaks that crudely mimic those of Fig. 9 in chapter 7 by P. B. Crilly, W. E. Blass & G. W. Halsey in Deconvolution of Images and Spectra, P. A. Jansson, ed., Academic Press 1997. In order to make the exercise more realistic, we will add some noise to r.
Ch. 6: Convolution, deconvolution, and time-frequency analysis
Fig. 6.6.1: The van Cittert deconvolution method uses only convolutions to achieve its goal by working backwards, iteratively. (a) An assumed Gaussian signal s (gray) is convoluted with a Gaussian transfer function t (not shown here) to yield a measured result r (heavy black curve), which will be our starting function. (b) Convolution of r with t produces an even more broadened curve, q1, shown as a thin line with open circles. (c) The difference between q1 and r is then used to generate a better approximation s1 of s, shown as a slightly thicker line with solid circles. The process is then repeated (d). Three cycles of this iterative procedure are shown, with panel (e) showing q2 and s2, and (f) illustrating q3 and s3. The latter is certainly much closer to s than r (shown as a thick black curve). For numerical details see exercise 6.6.1.
Exercise 6.6.2:
(1) In column A of a new spreadsheet place the numbers 0 (1) 511, and in column B generate a synthetic signal s using a number of Gaussian peaks of the form a exp[-b(x-c)^2]. In the examples shown in Figs. 6.6.2 through 6.6.7 we have used b = 0.1 throughout (i.e., a standard deviation 1/√(2b) = √5 ≈ 2.2), and the parameters a and c as listed in table 6.6.1.
Fig. 6.6.2: Top panel: the mock undistorted 'spectrum' s, repeated in gray in the other panels. Middle panel: the convolution of s with a single Gaussian t, plus noise, simulating a measured spectrum. Bottom panel: q1 as obtained by convolution of r with t.
a = 0.5, 0.5, 0.6, 0.2, 0.25, 0.15, 0.6, 0.6, 0.25, 0.15, 0.5, 0.6, 0.4, 0.2
c = 28, 37, 49, 73, 91, 110, 127, 142, 172, 178, 205, 212, 216, 238
a = 0.2, 0.25, 0.2, 0.7, 0.6, 0.57, 0.3, 0.03, 0.6, 0.4, 0.35, 0.6, 0.6, 0.07
c = 248, 262, 293, 310, 320, 329, 341, 361, 379, 385, 390, 433, 469, 496

Table 6.6.1: The parameters used in exercise 6.6.2 for the synthetic 'spectrum'.

(2) Column C for the transfer function t should again contain a simple Gaussian, split so that it has its maximum at the beginning of the data set, and its other half at the end of that set. In other words, for the 512-point data set used, the formula to be used is exp[-bt t^2] + exp[-bt (t-512)^2] if t runs from 0 to 511. In our example we have used bt = 0.03125, for a standard deviation of 4.
(3) Highlight the data in columns A:C, identify them (e.g., by giving them a light background color), and call ConvolveFT. This will place the function r = s ⊗ t in column D.
(4) In the next column, E, deposit Gaussian noise of zero mean and unit standard deviation.
(5) In column F repeat the order numbers from column A, in column G copy r plus a fraction of the noise from column E (in our example we have used 0.02) to make the noisy measured signal rn, and in column H repeat t. The data for rn in column G will be our point of departure.
(6) Highlight (and provide background color to) the data in columns F:H, and call ConvolveFT to compute q1 = rn ⊗ t, which will appear in column I.
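The wrap-around splitting of t in step (2), and the normalization that ConvolveFT applies, can be checked numerically. The sketch below is illustrative Python, not the ConvolveFT macro, and the single test peak (an arbitrary stand-in for the 28-peak spectrum) is placed at 100; it confirms that such a split, unit-sum kernel broadens a peak without shifting it (no phase shift), and preserves its area (the calibration).

```python
import math

def cconv(a, b):
    # circular convolution with the kernel b normalized to unit sum,
    # as ConvolveFT does
    n, stot = len(a), sum(b)
    return [sum(a[(i - j) % n] * b[j] for j in range(n)) / stot
            for i in range(n)]

n, bt = 512, 0.03125
# transfer function split between the start and the end of the record
t = [math.exp(-bt * i * i) + math.exp(-bt * (n - i) ** 2) for i in range(n)]
# one test peak (a = 0.5, b = 0.1, i.e. sd of about 2.2) centered at 100
s = [0.5 * math.exp(-0.1 * (i - 100) ** 2) for i in range(n)]
r = cconv(s, t)
```

Because the kernel is symmetric about index 0 (thanks to the wrap-around), the convolved peak stays centered at 100; a kernel centered mid-record would have shifted it by half the record length.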
Fig. 6.6.3: Successive estimates of s by iterative deconvolution of the noisy simulated spectrum r shown in the middle panel of Fig. 6.6.2.
(7) In column J again copy the numbers from column A, in column K calculate s1 = 2rn - q1, and in column L copy t from column C.
(8) Highlight (and color) the data in columns J:L, and call ConvolveFT to compute (in column M) the function q2 = s1 ⊗ t.
(9) In column N copy the numbers from column A, in column O calculate s2 = s1 + (rn - q2), and in column P copy t from column C.
(10) Repeat the instructions in (8) and (9) to calculate first q3 = s2 ⊗ t, then s3 = s2 + (rn - q3), etc.
(11) Plot your results. If you have used the numerical values listed, your results should resemble those in Figs. 6.6.2 and 6.6.3.
(12) If you now want to modify the signal s and/or the transfer function t, just change them in column B and/or C, then highlight the color-coded areas one by one, going from left to right, and call ConvolveFT. In the same way you can change the noise level in column G and then call ConvolveFT starting with columns F:H.
Figure 6.6.3 illustrates the usual problem with deconvolution: noise. While the signal 'grows into' the peaks with each successive iteration, the noise also grows, but faster. To understand why this happens, consider that s1 = rn + (rn - q1), and that q1 is much smoother than rn, see Fig. 6.6.2. We now write rn = r0 + n, where r0 represents r in the absence of noise, and n is the noise. Let the convolution reduce the noise n in rn to αn in q1, where |α| « 1. Then s1 = 2rn - q1 = (2r0 - q1) + (2 - α) n,
which has almost twice as much noise as r, since |α| « 1. The same argument applies to subsequent iteration stages, so that the noise increases almost linearly with the number of iterations, while si creeps up on s at the steadily decreasing rate of an asymptotic approach. But when the noise grows faster than the signal, the iteration cannot converge, and the process ultimately becomes oscillatory, completely obliterating the signal.
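Because the iteration is linear, the fate of the noise can be watched in isolation. In the illustrative Python sketch below, a deterministic alternating-sign perturbation stands in for random noise; for this perturbation the attenuation factor α of the convolution is essentially zero, so the argument above predicts that its amplitude after k iterations should be close to (k+1) times the input amplitude, i.e., almost linear growth.

```python
import math

def cconv(a, b):
    n, stot = len(a), sum(b)
    return [sum(a[(i - j) % n] * b[j] for j in range(n)) / stot
            for i in range(n)]

def van_cittert(r, t, iterations):
    s = list(r)
    for _ in range(iterations):
        q = cconv(s, t)
        s = [s[i] + (r[i] - q[i]) for i in range(len(r))]
    return s

n = 32
t = [math.exp(-0.25 * i * i) + math.exp(-0.25 * (n - i) ** 2) for i in range(n)]
# feed in pure 'noise': a +/-0.01 alternation, for which the smoothing
# factor alpha of the convolution is essentially zero
noise = [0.01 * (-1) ** i for i in range(n)]
amp = {k: max(abs(x) for x in van_cittert(noise, t, k)) for k in (1, 5, 10)}
# amp[k] comes out close to (k + 1) * 0.01, confirming the near-linear growth
```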
Exercise 6.6.2 (continued):
(13) In any iterative process we need to have a termination criterion, otherwise the process could go on indefinitely. Since our starting function is rn, and the procedure is based on trying to match rn with qi, calculate the sum of squares of the residuals between the data in columns G and I, G and M, G and Q, etc., using the instruction =SUMXMY2(function1,function2). These numbers duly decrease upon successive iterations. This suggests that you might be able to use this SSR to determine when to stop the iteration: whenever it becomes smaller than a given value, or starts to go up.
(14) But now try to use the same instruction to determine the sum of squares of the residuals between s (in column B) and successive versions of si (in columns K, O, etc.). This is not realistic, because we normally have no access to s which, after all, is the function we seek. Still, in this simulation, it is instructive to take a look: in the presence of sufficient noise, as in our numerical example, this SSR increases with successive iterations. Therefore, using the sum of squares of the differences between rn and qi as suggested under point (13) can be misleading; yet in practice we have no other option when using this method.
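The stopping rule of point (13) can be sketched as a loop that monitors SSR between rn and qi each pass. This is an illustrative Python stand-in for SUMXMY2 and for the macro's stopping logic, run here on a noise-free example so that the SSR values duly decrease.

```python
import math

def cconv(a, b):
    n, stot = len(a), sum(b)
    return [sum(a[(i - j) % n] * b[j] for j in range(n)) / stot
            for i in range(n)]

def ssr(u, v):
    # the equivalent of Excel's SUMXMY2: sum of squares of the differences
    return sum((x - y) ** 2 for x, y in zip(u, v))

def van_cittert_with_stop(r, t, tol=1e-6, max_iter=500):
    # iterate, and stop once SSR(r, q) drops below tol or starts to go up
    s, history = list(r), []
    for _ in range(max_iter):
        q = cconv(s, t)
        v = ssr(r, q)
        history.append(v)
        if v < tol or (len(history) > 1 and v >= history[-2]):
            break
        s = [s[i] + (r[i] - q[i]) for i in range(len(r))]
    return s, history

n = 16
s_true = [0.9 * math.exp(-0.5 * (i - 8) ** 2) for i in range(n)]
t = [math.exp(-0.25 * i * i) + math.exp(-0.25 * (n - i) ** 2) for i in range(n)]
r = cconv(s_true, t)
s_fit, history = van_cittert_with_stop(r, t)
```

As point (14) warns, a decreasing SSR between rn and qi does not guarantee that si is actually approaching s when noise is present; the criterion is used only because nothing better is available.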
A possible way to get around noise is to incorporate smoothing at every iteration step, as was done, e.g., by Herget et al., J. Opt. Soc. Am. 52 (1962) 1113. Now the spreadsheet becomes much more complicated, and we therefore use a custom macro to take care of the busywork. DeconvolveIt performs the convolutions interspersed with smoothing steps, using a moving least-squares parabola of variable length to keep the noise in check. Since the data must be equidistant for the Fourier-transform convolution, we can use the method of section 3.15.
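The idea behind DeconvolveIt, interspersing every iteration with a moving least-squares parabola, can be mimicked as follows. This is an illustrative Python sketch, not the macro itself; the 5-point weights (-3, 12, 17, 12, -3)/35 are the standard least-squares (Savitzky-Golay) coefficients for a quadratic through five equidistant points, and the alternating perturbation is a deterministic stand-in for noise.

```python
import math

def cconv(a, b):
    n, stot = len(a), sum(b)
    return [sum(a[(i - j) % n] * b[j] for j in range(n)) / stot
            for i in range(n)]

def smooth5(x):
    # moving least-squares parabola over 5 points
    w, n = [-3, 12, 17, 12, -3], len(x)
    return [sum(w[k] * x[(i + k - 2) % n] for k in range(5)) / 35.0
            for i in range(n)]

def van_cittert(r, t, iterations, smoothing=False):
    s = list(r)
    for _ in range(iterations):
        q = cconv(s, t)
        s = [s[i] + (r[i] - q[i]) for i in range(len(r))]
        if smoothing:
            s = smooth5(s)  # the smoothing step interspersed with each iteration
    return s

n = 32
s_true = [0.9 * math.exp(-0.5 * (i - 16) ** 2) for i in range(n)]
t = [math.exp(-0.25 * i * i) + math.exp(-0.25 * (n - i) ** 2) for i in range(n)]
# a deterministic +/-0.05 alternation stands in for measurement noise
rn = [x + 0.05 * (-1) ** i for i, x in enumerate(cconv(s_true, t))]
plain = van_cittert(rn, t, 10)
damped = van_cittert(rn, t, 10, smoothing=True)
```

With this 'noise', ten plain iterations amplify the perturbation roughly eleven-fold, while the smoothed version keeps it in check, at the price of some bias (peak broadening) from the repeated smoothing.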
Exercise 6.6.2 (continued):
(15) In the next three columns copy the data from columns A (the numbers representing the independent variable), G (for rn), and C (for t), then highlight the data in these three new columns, and call DeconvolveIt0. Accept the default no-filter value 0, and let the macro run. You can follow its progress by observing its intermediate results, displayed in the left-hand corner of the bar below the spreadsheet. After your patience has grown thin, interrupt the program with the Escape key (Esc), terminate it, and plot. You will find that the result oscillates wildly, and shows no inclination to converge.
(16) Again highlight the data in those three columns, and call DeconvolveIt0, but now use different lengths of the parabolic filter. If the noise level is not too
high, you should be able to find a filter length that will generate a useful deconvolution. Again, without a filter it will not converge if you have noise comparable to that in Fig. 6.6.2.
(17) For comparison, also try DeconvolveFT. Again, by using different filter parameters, you should be able to find a workable result with the built-in, adjustable von Hann filter.

Fig. 6.6.4: Deconvolution of a noisy signal by the iterative macro DeconvolveIt0, which required 28 iterations (top panel) and, for comparison (bottom panel), the results of deconvolving with DeconvolveFT, using the adjustable von Hann filter with s = 23.

Neither method would yield a convergent result without filtering. Both methods have some flexibility, and the nature of the filters is obviously quite different. For comparable noise, the iterative method takes more time but comes out ahead of the direct one in terms of signal recovery: it clearly outperforms direct deconvolution in this example. Instead of relying on the smoothing filters that are built into these macros, one can use Wiener filtering followed by filter-free deconvolution. That gives a fairer comparison of the two methods, and is illustrated below and in Figs. 6.6.5 and 6.6.6.

Exercise 6.6.2 (continued):
(18) In the next two columns again copy the numbers from column A and from column G (for rn), then highlight the data in these two new columns plus the one to its right (which should be empty), and call ForwardFT. This will create three new columns: one for frequency, one for R', and one for R".
(19) In the next column compute log M^2 = log [(R')^2 + (R")^2], and plot it versus frequency.
(20) Fit simple curves to log(S^2) and log(N^2), such as in the top panel of Fig. 6.6.5, and in the next column compute the Wiener filter 10^log(S^2) / (10^log(S^2) + 10^log(N^2)) which, in our example, translates into 10^(1.8-40|f|) / (10^(1.8-40|f|) + 10^(-5.5)).
(21) In the next column again copy the numbers from column A, and in the next two calculate the products of R' and R" respectively with the data in the Wiener filter column.
(22) Call InverseFT. The second of the resulting columns contains the filtered r, while the third contains only junk data that should all be zero. Replace the latter by copies of t, then highlight these three columns and call DeconvolveIt0, and plot your result. You may have to interrupt it fairly early; if you let it run, it can consume considerable time but most likely make a worse fit. In the example shown in Fig. 6.6.6, the iteration was cut short at 100.
(23) For comparison also run DeconvolveFT.

Fig. 6.6.5: The power spectrum of the noisy input data can be fitted to the simple, approximate expressions log(S^2) = 1.8 - 40|f| and log(N^2) = -5.5.

Fig. 6.6.6: The results of Wiener filtering followed by either iterative deconvolution (top panel) or straight Fourier transform deconvolution (bottom panel).

The two results are compared in Fig. 6.6.6: the iterative deconvolution again holds its own against direct deconvolution. We note that both methods produce negative values, mostly in regions where the signal is small. When these simulated data represent an actual optical spectrum, such negative values would be physically meaningless, and can of course be lopped off if that
makes you feel better. DeconvolveIt1 removes the negative values during each iteration; the systematic bias introduced by doing so is believed to be small, because the correction occurs only in regions of weak signals. Results so obtained with DeconvolveIt1 are shown in Fig. 6.6.7.

At this point it may be useful to look back, in order to compare Fig. 6.6.7 with Fig. 6.6.2 and the data of table 6.6.1. Starting from the left, the two peaks at 28 and 37 are clearly resolved by deconvolution, and the peak at 49 is almost baseline-separated. Likewise, the two overlapping peaks at 172 and 178 are replaced by two that are more clearly resolved than the original ones. And so it goes for the rest of the spectrum: the shapes of existing peaks are often improved, but additional, fabricated peaks appear as well. On the other hand, deconvolution has introduced peaks centered at 5, 82, and 104 that do not occur in the original, plus two satellite peaks, at 162 and 190. It would seem that the Fourier transformation, by looking for sinusoidal components, is the primary source of these extra peaks, and that cutting off their negative portions makes them look more convincingly like peaks rather than processing noise.

Fig. 6.6.7: The same data after Wiener filtering followed by use of DeconvolveIt1.

More efficient iterative deconvolution can often be achieved by introducing a relaxation factor. Such a modification was introduced by P. A. Jansson et al., J. Opt. Soc. Am. 58 (1968) 1665, 60 (1970) 184. One can combine a weighting function that discriminates against negative points with one that removes data above a given limit, say 1; this is implemented in DeconvolveIt2. If the latter macro is used, the data should be scaled to fit appropriately in the range from 0 to 1. We will not pursue this here, as it would carry us too far from our simple goal of illustrating what deconvolution is and does. Instead, in section 6.7 we will explore an alternative approach that can give superior results in those cases to which it is applicable.
The above example involves a fairly typical, relatively easy deconvolution, because the original signal s did not contain any details that were lost completely in the original distortion. In this respect, iterative deconvolution has limitations similar to those of direct deconvolution, as illustrated in Fig. 6.6.8, where we use the noisy test data set of Fig. 6.5.3 after Wiener filtering, and merely replace the final deconvolution step by DeconvolveIt0. Shorter runs were obtained by changing the iteration limit in the DeconvolveIt macro. The results are not significantly different from each other after the first stages, illustrating that the built-in termination criterion may be gross overkill.

Fig. 6.6.8: Iterative deconvolution of a noisy signal after Wiener filtering, with DeconvolveIt0 stopped after 10, 100, and 1,000 iterations, and after 8,132 iterations, where its built-in termination criterion stopped it.
We note that the iteration stopped after 8,132 steps, using as its termination criterion that SSR (for the difference between the input function r and that calculated during each iteration) decreases. It is amusing to look at those SSR values: SSR = 4.1×10^-6 for 10 steps, 2.7×10^-14 for 100 steps, 1.4×10^-20 for 1,000 steps, and only 9.8×10^-25 for 8,000 steps.

The above examples suggest that the iterative method can yield marginally better results, but that there is relatively little gain (and much time to lose) in going beyond the first few iterations. Whether this holds true in general cannot be answered in the abstract, because it will depend on the nature of the signal, on the kind of blurring for which correction is sought (such as amplifier distortion, camera motion, optical aberration, tip profile in scanning probe microscopy), and on the type of noise. The above comparisons all involved triply Gaussian data (based on Gaussian curves, with Gaussian blurring, plus Gaussian noise), yet the resulting deconvolutions are rather similar. Other methods may well yield other outcomes.

An early example of the application of the van Cittert approach was published by S. Hüfner & G. K. Wertheim in Phys. Rev. B 11 (1975) 678. They deconvolved x-ray photoemission spectra of a number of metals in order to correct for the broadening effect of the spectrometer. The resulting (considerably narrower and higher but also much noisier) curves were then fitted by nonlinear least squares to their theoretical line shapes.

Sometimes the signal itself may suggest how best to approach it. If we see a spectrum that exhibits characteristic sine-like sidelobes, we can estimate the width of the pulse that most likely caused it, and use that information for deconvolution. The same approach, in two dimensions, may be applicable to astronomical data that show the equivalent rings around inherently point-like objects such as stars.

6.7 Iterative deconvolution using Solver

The van Cittert deconvolution method is general but quite sensitive to noise. An alternative approach, introduced by Grinvald & Steinberg, Anal. Biochem. 59 (1974) 583, is much less sensitive to noise, and may therefore be most appropriate for classical (near-ultraviolet to infrared) optical spectroscopy and chromatography. It also uses reverse engineering, but requires an analytical (and therefore noise-free) model for the undistorted signal sm, which will be assumed to be describable in terms of one or more model parameters ai, i.e., sm = f(ai). We convolve the model signal sm with the experimental transfer function t to obtain rm = sm ⊗ t, and then use Solver to adjust the model parameters ai
i.2. and in D101 a value for a/. noisy data to try the deconvolution.3. such as 1.03). to remind yourself to keep it clear.g. and result r. and r for the rank number # (which can represent time. (2) In A3 place the value 100. based on a single exponential decay.2 and fig. .. Copy it down to row 303. for a nonmanual program. In cell D97 enter an amplitude value (such as 1) and in cell D98 a value for a rate constant (e.02 in cell Nl 00 and 0. we can either rewrite Solver so that it can accommodate macros. 0.2. Place corresponding labels in colunm C. In cell FI03 use =D103+$0$100*0103 for a noisy response signal rexp. t. on purpose rather large to show the effect ofnoise. and label it in G I as s model.1: (1) Our example will be modeled after exercise 6. and an initial estimate for the rate constant in 198. The latter approach is illustrated in exercise 6. as anticipated by their labels. In a real application these simulated values should of course to be replaced by experimental data. Do not place any numbers or text in G3:Gl02. (9) In Gl03 place a model function. (5) For the transfer function t.2 and Fig. Advanced Excel for scientific data analysis by minimizing the sum of squares of the residuals between rm and the experimental (or simulated) r expo The requirement that Sm be describable as an explicit analytical function makes this method less widely applicable than the van Cittert approach. In row 1 place appropriate labels. in cell C103 place the instruction =EXP (1 * (LN (1+ (AI 03+4$D$1 00) / $D$101») A2) and copy this down to row 303..a/is used here in order to avoid a nonnegative argument in the logarithm for the specific values offtandafsuggested in (4). in A4 the instruction =A3+ 1. (7) Deposit Gaussian ('normal') noise (with mean 0 and standard deviation 1) in Nl 03:0303. 0. e. with accompanying labels in colunm H. as in exercise 6.e. etc). Exercise 6. de Levie. (4) In cell DIOO deposit a value for if. Place a guess value for the amplitude in 197. 
and in cell B 103 insert the instruction =$D$97 *EXP ($D$ 98 *A1 03) for the transferfunction t.7. (3) Go down to row 103. s.3. or (much simpler) perform the convolution using a function rather than a macro. Copy both instructions down to row 303. e.7. original (undistorted) signal s. e.03 in cell 0100. such as =$I$97*EXP ($I$98*A103). 7. The value 4 = 1 + if. filter or transfer function t. with accompanying labels in colunm C.. such as 0. but it also makes it much more immune to noise.g.g. such as 10. Solver cannot respond automatically to the effect of parameter changes that involve macros.322 R. (8) In cell EI03 place the instruction =C103+$N$100*N103 to simulate a noisy transfer function texp.1. which will write the convolution r in D103:D303. (6) Highlight A103:C303 and call the macro Convolve. Because macros do not selfupdate. and supply corresponding scale values. In cells AI:D 1 deposit the colunm labels #.02. 6.. This means that. bright yellow. and copy this down to row 303. wavelength.2.2.. 6.2. You now have a set of simulated.g. but instead fill it with. and copy this down to cell A303.
03. We have used a barebones custom function ConvolO to keep it simple. Then engage SolverAid to find the corresponding uncertainties.0302[ ± 0. Equal to Mig.014 and km = 0. even though it is rather wasteful of spreadsheet real estate. HI 03: H303) and place a corresponding label such as SSR= in HIOO. Denom. replace the instruction in nOl by the value ofN or by =COUNT (EI03: E202) . try to keep the convolving custom function as simple as possible. based on the average value of t. and Set Target Cell to noo. as can be seen in fig. deconvolution. and timefrequency analysis (10) In cell HI03 deposit the function =Convol(Gl03:G202. and in cell 1101 the instruction =SUM (EI03 : E202). N) Dim i As Integer Dim Sum As Double Dim Array3 As Variant ReDimArray3(1 To 2 * N) For i = 1 To N Array3 (i) = Array2 (N + Next i 323 1  i) Sum = 0 For i = 1 To N Sum = Sum + Arrayl(i .03 1 of the fit in r.1.0005 3 . and especially avoid IF statements which tend to slow Solver down. but selected here for clarity of illustration).7. with a standard deviation Sy = 0. (11) In cell HIOI place the label Denom=. 6.Ch. This summation normalizes the convolution.015 ± 0. By Changing Cells 197:198. 6: Convolution.100) and copy it all the way to row 303.02 (unrealistically far off. and yields nonoscillatory results. (12) Go to the VBAmodule and enter the following code forthis function. Use of a model function Sm keeps noise down. We started with amplitude a = 1 and rate constant k = 0.79.7. 6. $ 1$1 01. used as initial guess values am = 1.2 and km = 0. and a correlation coefficient fak between am and km of 0.1. . Function Convol(Arrayl. and then found am = 1. This exercise demonstrates the principle of the method. (15) Compare your results with those in fig. If that is not desired. In practice. (14) Call Solver.$E$103: $E$2 02. In cell HI label the column as r model. Array2.N + 1) Next i Convol = * Array3(i) Sum / Denom End Function (13) In cell BOO deposit the function =SUMXMY2 (Fl 03: F303. 
which shows them before and after using Solver.
.._ _UJI Fig. e..2 . texp Grinvald & Steinberg emphasized the use of properly weighted least squares based..g..0 0.''''..>< .l.0 0. and one might also want to compute and plot the autocorrelation function of the residuals as an aid to discriminate between various model assumptions.. Grinvald & Steinberg gave as examples Yo = 0.'''' 1.6 0.26 + (1/2) e. the result rexp obtained by convolving s with the (noisefree) transfer function and then adding noise (large open circles)..S + 0....__ __i.tl8 andY1 t tI6 7 = 0...0034 even thoughY2 and Y3 use quite different models.. the assmned model function sm (line) and the resulting function rm after convolving Sm with texp (heavy line).. 0.I_ _""l.. One should of course be alert to the possibility that different models may yield experimentally indistinguishable results.75 e. Especially in the latter case.6 O. and can also handle multiexponential fits. The top panel shows the situation just before calling Solver.5 + 0.4 ~~~ !..7..2 0. orY2 = (1/2) etl2 .!4._ _ _ =l.2 '"  """" . the bottom panel that after Solver has been used...ti5 .1: The assumed signal s (gray band).324 R. de Levie.2 0. AdvancedExcelfor scientific data analysis This method is not restricted to single exponentials..2 x...."".1.0025. that never differ by more than 0.0 . 1.. 1.s + (1/3) eti6 for which the maximal difference is always less than 0.25 e. 6.2 0...__~UL.46 andY3= (1/3) etl2 + (1/3) etl3 .8 0.75 etl5 .4 ~~~iii 0... the noisy transfer function (line with small solid points).''..25 e.. .8 0.0 0. the residuals may be dominated by experimental noise. on Poissonian counting statistics when the spectrum is obtained by single photon counting..\L.
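The essence of exercise 6.7.1, convolving an analytical model with the transfer function and letting an optimizer adjust the model parameters until SSR is minimal, is independent of the spreadsheet. In the illustrative Python sketch below, a simple alternating ternary line search stands in for Solver; the single-exponential model, the kernel-sum-normalized causal convolution, and the values a = 1, k = 0.03 follow the exercise, but everything else (point counts, search brackets, sweep counts) is an arbitrary assumption.

```python
import math

def convol(sig, ker):
    # causal convolution, normalized by the sum of the kernel (cf. Denom)
    den = sum(ker)
    return [sum(sig[i - j] * ker[j] for j in range(min(i + 1, len(ker)))) / den
            for i in range(len(sig))]

npts, nker = 80, 40
transfer = [math.exp(-0.1 * j) for j in range(nker)]   # the transfer function t
a_true, k_true = 1.0, 0.03
r_exp = convol([a_true * math.exp(-k_true * i) for i in range(npts)], transfer)

def ssr(a, k):
    # sum of squared residuals between the convolved model r_m and r_exp
    r_m = convol([a * math.exp(-k * i) for i in range(npts)], transfer)
    return sum((x - y) ** 2 for x, y in zip(r_m, r_exp))

def line_min(f, lo, hi, iters=30):
    # ternary search for the minimum of a unimodal one-dimensional function
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

a_fit, k_fit = 1.2, 0.02     # deliberately poor starting values, as in the exercise
for _ in range(12):          # alternate the two parameters, Solver-style
    a_fit = line_min(lambda a: ssr(a, k_fit), 0.5, 2.0)
    k_fit = line_min(lambda k: ssr(a_fit, k), 0.005, 0.1)
```

Because the model is convolved before comparison with the data, the noise in the measured transfer function never gets amplified the way it does in the van Cittert scheme; the only quantities adjusted are the two model parameters.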
and we will now apply this to deconvolution. upon analytical Fourier transformation. yield (6.8. 6: Convolution. can be described as a sum of Gaussian curves. deconvolution. Below we will see how this approach can be carried to its logical conclusion by using a noisefree transfer function t as well. which (when not carefully compensated in the deconvolution routine) can again lead to distortion and loss of detail. We therefore start with the mathematical functions (6. noisefree analytical functions.Ch.8.3) and (6. this approach may still work when earlierdescribed methods fail.)lbi} (6. the most critical part of the procedure being the initial fitting of Gaussians to the experimental functions rand t. Noise only affects the result insofar as it limits the proper assignment of the Gaussians gr and gt. We will first deconvolve two single Npoint Gaussians. The approach we will take here substitutes deconvolving r with t by instead deconvolving Lgr with gt.1) and gl = a l exp{Y2 [(tc. The method described in section 6.8. t = gt· We have already seen in chapter 4 how we can use Solver to fit complicated functions in terms of sums of Gaussians.2) which. wavelength. this greatly reduces the effect of noise on the deconvolution. say a spectrum or a chromatogram. Because gr and gt are fitted. and by then performing the deconvolution algebraically. wavenumber.4) . r = Lgr• while the transfer function t is given by a single Gaussian.8.7 avoids this problem by fitting the undistorted signal s to a noisefree analytical function. we will here consider only a relatively simple case in which a measured result r ofthe convolution.8 Deconvolution byparameterization Many approaches to deconvolution are quite sensitive to noise. gr and gt that are both functions of a common parameter t which can represent elution time. and timejrequency analysis 325 6. Even though it has an even more limited applicability. To illustrate the basic idea. and may therefore require filtering. 
Again we use bold lowercase symbols to indicate timedependent functions rather than constants. etc. The actual calculation is straightforward.
b/)f 2] so that the original.9) b$ = (b/ .6) atb.8.r2 (b/ . Note that the constants b in (6.7) (6.r (6.8. and subtractive in deconvolution./ albd exp[2nj f(c.8. bl .8. once we have characterized the two Gaussian functionss gr andgt in terms of the constants an br . With Gaussian peaks it is therefore easy to predict how much convolution will broaden them and. r r exp bt2 ) .8.8) (6. in which case we will want to maintain that calibration by deconvolving with a function that has been scaled to have unit average.r(b. and Ct respectively.. are additive in convolution.Cl)] exp[2.8. (6. undistorted signal is given by g$ = g. the corresponding variances.326 Ro de Levie.r(b: bt2 ) = a.8. how much deconvolution can possibly sharpen them. Advanced Excelfor scientific data analysis respectively.byN at bt bs & (6. Equation (6.t + c )2 2(b.b/)'h and Cs = C'C l In other words. Cr and at. b/ = b/ + b?. If we simply set it to zero.1) and (6.8. Typically the experimental response r to be corrected by deconvolution is calibrated. _b?)2 =as exp{Y2[(t where as = arbrN cs) / hs]2} (6.b. Gs = Gr / G1 (6.8.8.~2.11) Moreover.8) shows that their squares.2.2 = b} . where we have used Euler's relation From these we obtain by division eO/x = cos(x) j sin(x).2) are simply standard deviations. we can simply calculate the deconvoluted Gaussian gs. the value of CI is usually arbitrary. In the case ofa Gaussiangt that implies that we should use N af = J2i hi 2. 0 gl = abN atbt~2. conversely.5) = [arb.10) so that (6.8.8.7) reduces to (6.(t  C .9) becomes . b.
Ch. 6: Convolution, deconvolution, and time-frequency analysis 327

When r must be expressed as a sum of Gaussians, the same approach can be used, because then R = Σ Gr and T = Gt, so that the quotient S = R/T = Σ Gr/Gt (6.8.12), (6.8.13) is again a sum of terms, each of which transforms back into a Gaussian. Exercise 6.8.1 illustrates this procedure for the deconvolution of the data shown in Fig. 6.8.1.

Exercise 6.8.1:
(1) Retrieve the spreadsheet used in exercise 6.3.2, or repeat that exercise.
(2) In cells E20:H20 deposit column headings for texp, tmodel, rexp, and rmodel, and in N20 and O20 place headings for noise n.
(3) Generate Gaussian noise of zero mean and unit standard deviation in N22:O321.
(4) Place appropriate noise amplitudes for t and r in cells C19 and D19 respectively.
(5) In H2:H16 place the labels ar1=, br1=, cr1=, ar2=, br2=, cr2=, ar3=, br3=, cr3=, ar4=, br4=, cr4=, at=, bt=, and ct=. Alternatively you can copy them from F2:F16.
(6) In cell G22 deposit =$I$14*EXP(-0.5*((A22-$I$16)/$I$15)^2), and copy this instruction down to row 321. Likewise, in E22:E321 compute the function t with added noise (with, e.g., the instruction =C22+$C$19*N22 in cell E22). Similarly compute a noisy version of r in column F, using noise from column O. Plot these noisy versions of t and r, as in Fig. 6.8.1c.
(7) Place numerical values in I14:I16 so that the resulting curve approximately fits curve texp, and enter this curve in the graph.
(8) In cell F19 calculate SSR for t as =SUMXMY2(E22:E321,G22:G321).
(9) Call Solver, and let it minimize SSR in F19 by adjusting the guessed parameter values in I14:I16.
(10) Likewise, in cell H22 place the instruction =$I$2*EXP(-0.5*((A22-$I$4)/$I$3)^2)+ ... +$I$11*EXP(-0.5*((A22-$I$13)/$I$12)^2), and copy this instruction down to row 321.
(11) Compute SSR for r as =SUMXMY2(F22:F321,H22:H321).
(12) Use the curve made under point (9) to guess numerical values for ar1 through cr4 in I2:I13 so that the resulting curve approximately fits the data rexp, and enter this curve in the just-made plot.
(13) Call Solver to refine these values by minimizing SSR in cell H19. Do this adjustment groupwise: first let Solver adjust I2:I4, then I5:I7, then I2:I7, then I8:I10, then I11:I13, then I8:I13, and finally I2:I13. (You may first want to color the data in G2:G16 white, so that you will not be tempted to look at the data originally taken for the simulation of s. When you are done fitting the data, change their color back to black or whatever.) The graph might now resemble Fig. 6.8.1d.
(14) In K2:K13 copy the labels as1=, bs1=, cs1=, as2=, ..., cs4= from F2:F13, then modify them.
(15) In L4 calculate cs1 = cr1 − ct.
2 1. 6.... (19) In cell H22 compute the reconstituted signal Srecov with the instruction =$L$2 *EXP (0. and compare it with Fig. Advanced Excel for scientific data analysis O .....8.~ o Fig.0 0. (18) Copy the block L2:L4 to L5.' o O. Bottom panel: the same as in panel b after adding random noise... . L8...~f~. (20) Plot this curve.1a...~~~.I.O M ..b... a repeat from Fig. 6.b(2) .' o O. 0...8. 0. or =SQRT (H2 A2$H$14 A2) . 0. 6.. de Levie. 5* ( (A22..328 R...4 0. (16) In L3 compute bs ' = V (b:. 0.1f (21) In this graph also display the function s used as the starting point of this simulation from B22:B321.l_ _ _ _ _ _ __ __ __ _ _ _ _ _.2 0.0 0.4.2 • 50 100 150 200 250 300 O..6 50 100 150 200 250 300 b 0.4 0..6 a 0. showing in panel a the original simulated spectrum s and the distorting transfer function t.._ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _. and Lll...$L$13) / $L$12) A2) ... (J 7) In L2 calculate asl = arl b rl I bsl ."'...~ .2 1... and copy this down to row 32l.... and in panel b its convolution leading to the result r.. with =H1 * H2 / J2.+~_? L..2 0.....5* ( (A22$L$4) / $L$3) A2) + +$1$11 *EXP (0 .2 . 6.la.8.2.4 0.6 50 100 150 200 250 300 c • 0..c: Top and middle: replicas from Fig.
This method can indeed reconstitute most features of the original curve, at least in a favorable case such as shown in Fig. 6.8.1f, where all peaks are Gaussian, and can be identified as such in r despite the noise.

Fig. 6.8.1d,e,f: The manually adjusted curves (with the parameters selected 'by eye') through the noisy data (gray curves in panel d), the same after Solver has refined the parameter estimates (panel e), and the resulting deconvoluted signal s (open circles in panel f). Panel f also displays, as a thick gray line, the original curve of the simulated function s repeated from panel a.

Exercise 6.8.1 continued:
(22) Call SolverAid, enter the Solver-determined parameters in I2:I13, the location (H19) of SSR for r, and the column (H22:H321) in which r was calculated. Let SolverAid display the covariance matrix in N2:Y13. It will also deposit the standard deviations of the individual parameters in J2:J13.
(23) Once more call SolverAid, this time to find the uncertainty estimates for t. Therefore enter the location (I14:I16) of the Solver-determined parameters, the location (F19) of SSR for t, and that (G22:G321) of the column where t was computed. Place the covariance matrix in Z14:AB16, so that it shares its main diagonal with that in N2:Y13.
(24) Finally, call Propagation, and give it I2:I16 as input parameters, N2:AB16 as covariance matrix, and L2 as function. It will then place the corresponding standard deviation in M2. Repeat this for the other 11 results in column L. (Sorry, Propagation handles only one parameter at a time.)

The data in Table 6.8.1 illustrate the results obtained. These indicate satisfactory agreement between the parameters used to simulate s and those recovered after convolution, noise addition, and deconvolution: for all 12 coefficients of sfound, the recovered value of s is within two standard deviations of that used in the simulation of Fig. 6.8.1a. The standard deviations depend, of course, on the amount of noise added to the test functions, or present in the experimental data.

Table 6.8.1: Some numerical results from exercise 6.8.1, including the precision estimates generated by SolverAid and Propagation. The column labeled staken lists the values used for simulating the data in Fig. 6.8.1a, the columns tguessed and rguessed contain the initial guess values shown in Fig. 6.8.1d, and the next two columns the values obtained by Solver for the parameters, and those obtained by SolverAid for their standard deviations. Column sfound displays the deconvolved signal s as computed from rfound and tfound, and column sst.dev. the corresponding uncertainty estimates. The added noise was Gaussian with zero mean and standard deviations of 0.04 and 0.03 for r and t respectively.
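The Solver/SolverAid workflow used above — fit a sum of Gaussians by least squares, then read standard deviations off the covariance matrix — can be mimicked outside Excel. Below is a hedged sketch in Python, with scipy's curve_fit standing in for Solver and its covariance output standing in for SolverAid; the two-peak model, parameter values, and noise amplitude are invented for the demonstration, not taken from the book.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(7)
t = np.arange(0.0, 100.0, 1.0)

def two_gaussians(t, a1, b1, c1, a2, b2, c2):
    # sum of two Gaussians, each of the form a*exp{-1/2*[(t-c)/b]^2}
    return (a1 * np.exp(-0.5 * ((t - c1) / b1) ** 2)
            + a2 * np.exp(-0.5 * ((t - c2) / b2) ** 2))

true = np.array([1.0, 5.0, 30.0, 0.6, 8.0, 60.0])   # the 'taken' values
y = two_gaussians(t, *true) + 0.02 * rng.standard_normal(t.size)

guess = [0.8, 4.0, 28.0, 0.5, 10.0, 63.0]           # parameters selected 'by eye'
popt, pcov = curve_fit(two_gaussians, t, y, p0=guess)
sigma = np.sqrt(np.diag(pcov))                      # SolverAid-style standard deviations

for name, p, s in zip(["a1", "b1", "c1", "a2", "b2", "c2"], popt, sigma):
    print(f"{name} = {p:.3f} +/- {s:.3f}")
```

As in Table 6.8.1, the recovered parameters should agree with the 'taken' ones to within a few standard deviations.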
This approach works, even with quite noisy signals, and does not lead to oscillatory instabilities. This method, like that of section 6.7, can yield estimates of the standard deviations of the deconvolved signal s, and all baseline-separated peaks and peak aggregates can be treated individually. However, its applicability depends critically on how well one can represent both the measured result R and the transfer function T in terms of functions with relatively simple Fourier transforms, so that the inverse Fourier transform of their quotient R/T can be expressed in analytical form.

6.9 Time-frequency analysis

Fourier transformation presumes a steady state, because it considers the data set as one unit of an infinitely repeating sequence of identical units. Yet there are many phenomena with frequency content that are not stationary, such as speech and music. We here address how, other than by ear, we can analyze and visualize sound (or any equivalent, non-auditory signal) as a function of time and frequency.

Music is an interesting example because its common form of notation, musical script, is really a graph of frequency (notes) as a function of time, complete with grid lines for both time (the vertical lines identifying the various measures) and frequency (the horizontal lines of the staff). It even has explicit time notation (for the lengths of notes and rests) and the corresponding scale factors (tempo indicators and/or metronome settings). Musical script is, of course, a set of instructions for the performer.

Time-frequency or Gabor transformation (D. Gabor, J. Inst. Elect. Engin. 93 (1946) 429) is an analysis in which a sliding time window moves along the data, and in each window a Fourier transformation is applied to obtain its frequency content. It is an inherently imprecise approach, because the product of the resolutions in time and frequency is subject to the uncertainty relationship discussed in section 5.5. (That uncertainty is intrinsic to the problem, and independent of the use of Fourier transformation or any other specific analysis method.) The uncertainty can be minimized with a Gaussian window function, which we will therefore use. As a practical matter, we will exploit the fast Fourier transformation algorithm, and therefore require that the data are equidistant in time, as they usually are when a time-dependent signal is sampled.

The Gabor transform macro uses a Gaussian window function of N contiguous data points (with N = 2^n, where n is a positive integer) on a data set containing M data points, where M > N. It starts with the first N data points in the set, multiplies these by the window function, and then performs a Fourier transformation on that product. It then moves the window function over by one point, and repeats this process M − N + 1 times until it has reached the end of the data set. When the data set is so large that it would result in more than 250 columns (and therefore might exceed the 256-column width of the Excel spreadsheet), the macro will automatically move the window function each time by several data points, and the user can further restrict the size of the output file.

The results are returned to the spreadsheet as a function of time and frequency, listing the frequency in its first column, and the rank number of the first data point used in each window in its top row. Inclusion of these parameters makes it easy to generate a labeled 3D plot as well as a surface map. The result can be plotted as either a 3D plot or a map of the absolute magnitude of the sound as a function of time and frequency; such a plot or map is called a sonogram.

Exercise 6.9.1:
(1) Start a new spreadsheet. Leave the top 10 rows for graphs, and the next 4 rows for constants and column headings.
(2) Starting in cell A15 of column A, deposit time t in constant increments Δt, such as t = 0 (1) 1000.
(3) In column B deposit a trial function, e.g., in cell B15 with the instruction =(SIN($B$11*A15))/(EXP(-0.1*(A15-200))+EXP(0.003*(A15-200))), which has as frequency the value specified in B11 divided by 2π. Its amplitude, given by 1/{exp[−0.1(t−200)] + exp[0.003(t−200)]}, quickly rises just before t = 200, and then slowly decays, somewhat like a note played on a piano. Copy this instruction down.
(4) Plot the trial function, see Fig. 6.9.1.
(5) Call the macro Gabor, and in its successive input boxes enter the time increments (here: 1), the location of the input data (here: B15:B1015), and the (optional) integer to restrict the number of samples to be analyzed (which you can leave at its default value of 5).
(6) The macro will now generate a data array.
(7) Make a 3D plot of the result, and also a surface map with Mapper. You may notice that the 3D map for a sizable array is slow to rotate, and that its presence slows down the operation of the spreadsheet whenever it must be redrawn on the screen.
(8) Obviously, for such a simple trial function, you need not go through all this trouble. If the 250-column limit presents a problem, modify the macro so that it stores rather than displays the data, or uses rows instead of columns, since the spreadsheet contains many more rows than columns.
(9) Now add some harmonics, as in a chord, e.g., for the harmonics of a major chord, such as CEGC. Extend the instruction in cells B15:B1015 to include three additional terms, identical to the first one except that their frequencies are specified by cells C11, D11, and E11 respectively.
(10) In C11 deposit the instruction =B11*2^(3/12), in D11 the instruction =C11*2^(4/12), and in E11 the instruction =D11*2^(5/12). On the Western, 'well-tempered' musical scale, all half-notes differ in frequency by a factor of 2^(1/12).
(11) The resulting signal is not so transparent any more, see Fig. 6.9.2. Enter this curve in the just-made plot.
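The macro's sliding Gaussian window can be imitated in a few lines outside Excel. The sketch below (Python/NumPy, not the book's VBA macro; the window length, Gaussian window width, and step size are arbitrary choices of this illustration) applies a Gaussian-windowed FFT to the trial 'piano note' of exercise 6.9.1 and locates its dominant frequency, which should be the value stored in B11 divided by 2π:

```python
import numpy as np

t = np.arange(1001.0)
f0 = 0.5                                   # the value stored in cell B11
signal = np.sin(f0 * t) / (np.exp(-0.1 * (t - 200)) + np.exp(0.003 * (t - 200)))

N = 256                                    # window length, N = 2^n
step = 5                                   # shift the window 5 points at a time
window = np.exp(-0.5 * ((np.arange(N) - (N - 1) / 2) / (N / 8)) ** 2)

starts = range(0, signal.size - N + 1, step)
sonogram = np.array([np.abs(np.fft.rfft(signal[i:i + N] * window))
                     for i in starts])     # rows: window position; columns: frequency

freqs = np.fft.rfftfreq(N, d=1.0)          # frequency axis, in cycles per time unit
peak = freqs[sonogram.sum(axis=0).argmax()]
print(peak, f0 / (2 * np.pi))              # dominant frequency is close to f0/(2*pi)
```

Plotting `sonogram` as an image with window position along one axis and `freqs` along the other gives the sonogram maps discussed below.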
(12) Repeat the process of Gabor transformation and mapping, again with the value 0.5 in cell B11; the resulting Gabor transform map should now look similar to that of Fig. 6.9.3b.
(13) The surface map reveals very clearly the four different notes, and their time courses. The notes appear to start at about t = 100, whereas they really start only around t = 200. This time distortion results from the use of a Gaussian filter in the Gabor transformation macro.
(14) Modify the instruction in cells B15:B1015 to correspond with a broken chord, in which the various notes start one after the other, say at t = 200, 300, 400, and 500 respectively, instead of starting at the same time but at different frequencies. Figure 6.9.4 illustrates such a signal, and Fig. 6.9.5 its sonogram.

Fig. 6.9.1: The test function used, with the value 0.5 in cell B11.
Fig. 6.9.2: The extended test function used, again with the value 0.5 in cell B11.
Fig. 6.9.3: Sonograms (i.e., surface maps of the Gabor transforms) of the functions shown in (a) Fig. 6.9.1 and (b) Fig. 6.9.2.
Fig. 6.9.4: The extended test function for a broken major chord.
Fig. 6.9.5: The sonogram of the broken chord shown in Fig. 6.9.4.

The different signal frequencies, and their time courses, are clearly displayed. With such complicated signals we can readily appreciate the advantages of the Gabor transform and its representation as a 3D graph or surface map. This will become even more obvious when we consider more realistic musical signals, which may include short (staccato)
and drawn-out (legato) notes, will have harmonics (characteristic for the musical instrument used), and may also exhibit gradually varying frequencies, as in a glissando.

The sonogram exhibits the three basic attributes of sound: time, frequency (pitch, tone-height), and amplitude (intensity, loudness, volume). In some respects it mimics musical notation, in that it uses the horizontal axis for time (indicating the duration of the various notes), while the vertical axis shows their pitch. In addition it displays their harmonics. In musical notation, amplitude (loudness) must be indicated separately, whereas the sonogram displays it in 3D or as a color or grayscale map. We will analyze a real signal in the next section.

6.10 The echolocation pulse of a bat

Bats orient themselves at night by sending out short sound bursts of varying amplitude and frequency, and by analyzing the reflected sound. The echolocation pulses are short, so that they do not overlap with the reflected signals. A digitized echolocation pulse of a large brown bat (Eptesicus fuscus) can be downloaded from www.dsp.rice.edu/software/TFA/RGK/BAT/batsig.sig, courtesy of Curtis Condon, Ken White, and Al Feng of the Beckman Center at the University of Illinois, and can also be obtained by e-mail from richb@rice.edu. The recorded pulse contains 400 equidistant data points taken at 7 μs intervals, and therefore covers a total time of less than 3 ms duration.

Exercise 6.10.1:
(1) Start a new spreadsheet, leaving the top rows for graphs. Import the bat data, and plot them.
(2) Apply the Gabor transform, and then map the results.

The signal in Fig. 6.10.1 starts out at about 30 kHz, descends to about 20 kHz, and after about 0.5 ms is joined by a second descending signal at its double frequency. The signal also contains weak higher harmonics at the triple and quadruple frequencies. The uncertainty relation causes some vagueness, most noticeable in the rather fuzzy onset of the pulse in Fig. 6.10.1. This can be reduced by using a shorter data set for the Fourier transform analysis, at the cost of a correspondingly larger uncertainty in the frequency scale. The Gabor transform and its visualization make this much more transparent than the original data set. More subtle details can be discerned by using a full color palette, as with Mapper1 through Mapper3. The grayscale of Fig. 6.10.1 and the two-color background of the back cover of this book illustrate what you might obtain; the front cover is left-right reversed.
Fig. 6.10.1: The top of a spreadsheet for Gabor analysis of a bat chirp. Top graph: the echolocation signal as a function of time, in ms. Bottom graph: the corresponding sonogram: frequency (in Hz) vs. time (in start-of-sequence number). A two-color (red & black) version of this sonogram can be found on the cover.
6.11 Summary

This chapter illustrates several applications of Fourier transformation: convolution, deconvolution, and time-frequency analysis. We have belabored deconvolution because it is the counterpart of least squares analysis. Where the latter tries to minimize the effects of random fluctuations, deconvolution addresses a particular (but equally ubiquitous) type of systematic distortion inherent in all physical measurements. Because of the practical importance of deconvolution (even though it is often underplayed in discussions of scientific data analysis, perhaps because it is considered too difficult) it has here been given rather extensive coverage, and several tools have been provided to make it more readily accessible and user-friendly.

Some convolutions, such as those in sections 6.2 and 6.3, exhibit a clear sense of directionality, while this is not the case in other examples. The distinction is due to the form of the transfer function used: asymmetrical transfer functions affect the signal asymmetrically, with a consequent, asymmetric action, while symmetrical transfer functions affect it symmetrically.

Analogous to cryptography, the transfer function is the key to the distortion, and to efforts to correct for it. Without knowledge of the transfer function, deconvolution is usually not possible. An essential part of knowing your instrument is, therefore, knowing its transfer function, or that of some calibration feature in the data set. Still, deconvolution is an imperfect tool: whatever is lost to distortion to below the level of random noise is seldom retrievable. The prudent approach is, therefore, to design experiments with minimal noise and minimal distortion, and then to use deconvolution to reduce the effects of any remaining distortion.

When direct or Fourier-transform deconvolution is not practicable, iterative approaches based on using convolution in reverse are often still possible, such as those in sections 6.4 through 6.6. In favorable cases, more robust (i.e., noise-resistant) methods can be based on using theoretical models, as in section 6.8 and in Fig. 6.8.1, or on combining nonlinear least squares with Fourier transformation, see sections 6.6 and 6.7.

Incidentally, you may have noticed that least squares and Fourier transformation can often make a very powerful combination, as demonstrated earlier in the analysis of the tides in section 5.10 and, in the present chapter, in sections 6.6 and 6.7.

*****
Visual information is primarily steady state: lighting conditions usually change slowly, and most objects around us are stationary. Even moving objects can often be represented as a sequence of stationary states, as in a movie or on television. It is therefore not surprising that spectroscopy is predominantly a steady-state method, and that the fast Fourier transform is its principal transformation tool. On the other hand, sound is experienced primarily as a time-dependent phenomenon: we hear steps, voices, music, and tend to ignore constant background noises: the leaves rustling in the wind, the humming of fluorescent lamps, the sound of a refrigerator or of a fan moving air, the constant drone of car traffic near a highway. To analyze time-dependent phenomena we use time-frequency analysis, as described in sections 6.9 and 6.10. Note that a sonogram can be made continuously as the sound evolves, lagging only slightly because of the need to perform a single Fourier transformation.

6.12 For further reading

Many additional applications to electrical engineering and signal processing of the direct spreadsheet methods used in sections 6.1 and 6.3 are described by S. C. Bloch in his book SSP, the Spreadsheet Signal Processor, Prentice Hall 1992. The deconvolution of large data sets with the van Cittert algorithm is discussed in W. E. Blass and G. W. Halsey, Deconvolution of Absorption Spectra, Academic Press 1981, and in Deconvolution of Images and Spectra, edited by P. A. Jansson, Academic Press 1984 and 1997.
Chapter 7

Numerical integration of ordinary differential equations

It is almost always easier to describe a complex physical system in terms of a set of differential equations than it is to solve them. Unfortunately, only relatively few of the differential equations encountered in science and technology have known solutions; many of these pertain to idealized geometries, such as that of the proverbial spherical cow, or to equations with constant coefficients. If a closed-form solution does not exist, one can either simplify the problem while hoping to retain its most essential features, or use numerical integration. If airplane design had depended on closed-form solutions of aerodynamic equations, pigs might have evolved wings before humans would have flown in craft heavier than air.

In principle, numerical integration can provide a solution to any desired accuracy for any properly specified set of differential equations. In practice, this is a large field of expertise, which often requires specialized hardware for complex problems such as the design of automobiles, camera lenses, or computer chips. However, for more mundane scientific tasks a spreadsheet may well fit the bill, which is why we will now illustrate how numerical integration can be performed in Excel. While a short chapter cannot begin to do justice to this topic, it may at least give you an idea of what is possible.

In this chapter we will illustrate the numerical integration of ordinary differential equations, restricting the discussion to so-called one-point methods, for equations that have initial or boundary conditions that are fully specified at one point in time or space. In that case, their solution can proceed in stepwise fashion from that starting moment or boundary. Ordinary differential equations only contain derivatives with respect to a single variable, such as time or distance,
while partial differential equations have derivatives with respect to several such parameters, and are therefore more complicated.
Starting with section 8.3 we will use custom functions, written in VBA, the acronym for Visual BASIC for Applications, and we will use dimensioning to make sure that double precision remains in force, so that we need not worry about truncation and round-off errors. Readers unfamiliar with computer code may first want to read the introductory four sections of chapter 8 before delving into the present subject.

In this chapter we primarily consider a particular type of errors, viz. the algorithmic deviations caused by replacing a differential equation by an approximation thereof. These can lead to systematic bias, i.e., to inaccuracies that are by far the dominant errors in the context of this chapter. Just for the record: among the errors not considered here are those caused by the finite representation of numbers in a computer, especially when taking differences between almost identical numbers. Those latter errors typically lead to imprecision, but the double precision of the spreadsheet usually keeps them at bay; other computational errors are seldom significant.

Below we will first consider two sequential first-order rate processes in series, such as occur in a two-step (mother/daughter) radioactive decay, or in two successive first-order chemical reactions. Since a closed-form solution is available, we can determine the errors involved in the simulation by comparison with the exact solution. We will use that exact solution, plus Gaussian noise, to simulate an 'experimental' data set. We will then use Solver to adjust the parameters of the numerical integration to fit the experimental data, and we will compare the thus found rate parameters with their correct values.

7.1 The explicit Euler method

When an ordinary differential equation is fully specified by initial conditions at one point in time or space, it is possible to start from that point, and to work systematically from there. In its simplest form, when the differential equation is of first order, it describes the slope of the function F(t), such as dF(t)/dt. Starting from the initial value F0 at t = t0 and its slope (dF(t)/dt) at t = t0, we can compute F(t0+Δt), as long as the interval Δt is small enough that we may consider the slope dF(t)/dt essentially constant over that interval. We then repeat this process by considering the value of F(t0+Δt) and the corresponding slope (dF(t)/dt) at t = t0+Δt to advance to t = t0+2Δt, and so on, repeatedly using the same single-step method. In a spreadsheet, the time intervals are shown explicitly; typically the interval Δt is kept constant, but it need not be constant throughout the calculation, and might instead depend on some other criterion, such as the absolute magnitude of the slope at each point.
and will merely indicate how to use a few of the many useful algorithms; this chapter cannot replace entire tomes written on this topic. There are many other worthwhile methods, and even there we will use only a few relatively simple ones. Below we first consider the conceptually simplest method, published by Euler as early as 1768, while the next section will describe a very useful yet still rather simple modification. We will then consider how to use custom functions to make the calculation more efficient, and how to accommodate extreme parameter values. In sections 7.5 and 7.6 we will use the more sophisticated Runge-Kutta method, while section 7.7 illustrates the application of these techniques to a somewhat more complex system of equations that can lead to oscillatory behavior; for this case no analytical solutions are known.

Here, and in the next four sections, we will consider two sequential, irreversible, first-order reactions, schematically represented by

A → B → C    (7.1.1)

which we will assume to be described by the differential equations

da/dt = −k1 a    (7.1.2)
db/dt = k1 a − k2 b    (7.1.3)
dc/dt = k2 b    (7.1.4)

where the concentrations of species A, B, and C are denoted by a, b, and c respectively, and the rate constants by k1 and k2. We will simulate this reaction sequence for the initial conditions

a(t=0) = a0,  b(t=0) = 0,  c(t=0) = 0    (7.1.5)

The simplest (and crudest) approach to simulating such a set of equations is merely to replace the differential quotients by the corresponding difference quotients,

Δa/Δt ≈ da/dt = −k1 a    (7.1.6)
Δb/Δt ≈ db/dt = k1 a − k2 b    (7.1.7)
Δc/Δt ≈ dc/dt = k2 b    (7.1.8)
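In a language other than the spreadsheet the same difference scheme takes only a few lines. A minimal Python sketch (the values of k1, k2, a0, and Δt are arbitrary illustrations, not taken from the book) that advances (7.1.6) through (7.1.8) step by step:

```python
# Explicit Euler integration of A -> B -> C, eqs. (7.1.6)-(7.1.8)
k1, k2 = 1.0, 0.5          # rate constants (illustrative values)
dt, nsteps = 0.1, 100      # time increment and number of steps
a, b, c = 1.0, 0.0, 0.0    # initial conditions: a0 = 1, b0 = c0 = 0

for _ in range(nsteps):
    da = -k1 * a * dt              # eq. (7.1.6)
    db = (k1 * a - k2 * b) * dt    # eq. (7.1.7)
    dc = k2 * b * dt               # eq. (7.1.8)
    a, b, c = a + da, b + db, c + dc

print(a, b, c, a + b + c)  # da+db+dc = 0, so a+b+c is conserved (up to rounding)
```

Note that each new value is computed from the concentrations at the *previous* time step, exactly as in the spreadsheet columns below.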
and to compute successive changes Δa, Δb, and Δc from the resulting, approximate relations

Δa ≈ −k1 a Δt    (7.1.9)
or
a(t+Δt) = a(t) + Δa ≈ a(t) − k1 a(t) Δt = a(t) (1 − k1 Δt)    (7.1.10)

Δb ≈ (k1 a − k2 b) Δt    (7.1.11)
or
b(t+Δt) = b(t) + Δb ≈ b(t) + a(t) k1 Δt − b(t) k2 Δt = a(t) k1 Δt + b(t) (1 − k2 Δt)    (7.1.12)

Δc ≈ k2 b Δt    (7.1.13)
so that
c(t+Δt) = c(t) + Δc ≈ c(t) + b(t) k2 Δt    (7.1.14)

That is precisely what is done in the explicit Euler method: we start from the initial conditions at time t, and then calculate the concentrations of a, b, and c at time t + Δt by moving a distance Δt in the direction given by the slopes defined by the right-hand sides of (7.1.6) through (7.1.8). Once we have found the concentrations at t + Δt, we use that solution to compute the value at t + 2Δt, and so on, one step at a time, just the way we walk. Typically we start the process at t = 0; note that we have arbitrarily called zero the starting time of the simulation (or of the corresponding experiment), just as we might do when resetting a timer or stopwatch. Eventually (except in some chaotic systems) we will reach a time when, for all practical purposes, either equilibrium or a cyclically repeating state is obtained, at which point the calculation can be stopped.

Exercise 7.1.1:
(1) Start a new spreadsheet. At its top deposit labels and numerical values for the initial conditions a0, b0, and c0, in cells A1:B3. Do the same for the rate constants k1 and k2, and for the interval Δt, in C1:D3.
(2) In column A start at t = 0, and then use constant increments Δt. In the present example we will use column A for time, columns B through D for the exact solutions, and columns F through H for our simulation; we use columns E and I merely as spacers. In columns B through D compute a through c, using their exact solutions

a = a0 e^(−k1 t)    (7.1.15)
b = a0 k1 (e^(−k1 t) − e^(−k2 t)) / (k2 − k1)    (7.1.16)
c = (a0 + b0 + c0) − (a + b)    (7.1.17)

so that the instructions in, e.g., cells B8 and B9 might read =$B$1 and =$B$1*EXP(-$D$1*$A9) respectively.
(3) In the row for t = 0, also deposit the initial values for a0, b0, and c0 in columns F, G, and H respectively. In lower rows of those same columns, compute subsequent concentrations from their immediate predecessors using (7.1.10), (7.1.12), and (7.1.14) respectively.
(4) In column J show the differences between the simulated and exact results for a, and do the same in columns K and L for those in b and c.
(5) Plot your results. Your spreadsheet may now look like that in Figs. 7.1.1 and 7.1.3, and the graphs like those in Figs. 7.1.2 and 7.1.4.

The comparison of the simulated and exact curves in Fig. 7.1.2 shows that the simulation indeed yields the correct overall behavior, while focusing on their differences, in Fig. 7.1.4, indicates that the agreement is only semi-quantitative. If maximum deviations of the order of a few percent are acceptable, stop right here; if not, read on.

In order to simulate a somewhat realistic data analysis, we create a make-believe data set from the exact theory with added Gaussian noise (to be replaced by experimental data if available), then use the numerical simulation to approximate the theory, and finally use Solver to adjust the latter to find the best-fitting concentration and rate parameters.

Exercise 7.1.1 (continued):
(6) Use Tools => Data Analysis => Random Number Generation to add three columns of Gaussian ('normal') noise of zero mean and unit amplitude in some out-of-sight columns, as in N8:P108.
(7) Insert four new columns: highlight the column labels E:H, right-click, and in the resulting menu click on Insert. In block J1:K3 place another set of labels and numerical values for k1, k2, and a0. Make the numbers in cells K1:K3 somewhat different from those in D1, D2, and B1, and then make the necessary adjustments in the instructions in columns J through L so that these will now refer to the constants in K1:K3.
(8) Temporarily set the value of sn in G1 equal to zero, with a label in F1. In cell F8 place the instruction =B8+$G$1*R8, and copy this to the entire block F8:H108. Verify that this leaves the results unaffected. Then put a nonzero value of sn (typically between 0.001 and 0.1) back in G1.
(9) Introduce the 'experimental' data of columns F:H into the concentration-time plot. Now the plot will show both the added noise and any misfit caused by changing the values in K1:K3.
(10) In order to provide Solver with criteria to gauge its progress towards a best-fitting solution, in cell N1 place the instruction =SUMXMY2(F8:F108,J8:J108), then copy this instruction to cells O1 and P1.
We use colunms E and I merely as spacers.3.17) _ek.1 and 7.001 and 0.12). ifnot. and then make the necessary adjustments in the instructions in colunms J through L so that these will now refer to the constants in KI :K3. 7: Numerical integration ofordinary differential equations b = a k1 (ek" o C 343 (7. In block Jl:K3 place another set oflabels and numerical values for k1' kb and ao. In cell F8 place the instruction = B8+$K$1 *R8.1 unaffected.1. (5) Plot your results. 7. Make the numbers in cells Kl :K3 somewhat different from those in DI. we create a makebelieve data set from the exact theory with added Gaussian noise (to be replaced by experimental data if available).1.1) back in Gl. (7) Insert four new colunms: highlight the colunm labels E:H. then use the numerical simulation to approximate the theory.1. The comparison of the simulated and exact curves in Fig.1 (continued): (6) Use Tools => Data Analysis => Random Number Generation to add three colunms of Gaussian ('nonnal') noise ofzero mean and unit amplitude in some outofsightcolumns. and B1. read on. and copy this to the entire blockF8:Hl 08. In lower rows of those same colunms. JIOS). (7. while focusing on their differences in Fig.4. (3) In the row for t = 0.1.J8. and the graphs like those in Figs.) / (k1 k2) = (ao + bo +co) (a+b) so that the instructions in. 7. Now the plot will show show both the added noise and any misfit caused by changing the values in Kl :K3. (9) Introduce the 'experimental' data of colunms F:H into the concentrationtime plot. 7. with a label in Fl. Your spreadsheet may now look like that in Figs. Then put a nonzero value ofs. 7.1.1.1. Add the corresponding noise amplitude Sn in G I.1.4 indicates that the agreement is only semiquantitative. Exercise 7.1. and (7. (4) In colunm J show the differences between the simulated and exact results for a. In order to simulate a somewhat realistic data analysis. and do the same in colunms K and L for those in band c. 
e.2 shows that the simulation indeed yields the correct overall behavior. and H respectively. .1.14) respectively. stop right here.Ch..10).16) (7.g. 7. Ifmaximum deviations of the order ofa few percent are acceptable. compute subsequent concentrations from their immediate predecessors using (7. (10) In order to provide Solver with criteria to gauge its progress towards a bestfitting solution.1. VerifY that this leaves the results in Fig. and in the resulting menu click on Insert. cells B8 and B9 might read =$B$l and =$B$I*EXP($D$I* $A9) respectively. bo. also deposit the initial values for ao. and Co in colunms F. as in N8:PI08. rightclick. (8) Temporarily set the value ofsn in Gl equal to zero. (typically between 0.1.1. then copy this instruction to cells 01 and PI. G.2 and 7. D2. in cell Nl place the instruction =S(]MXMY2(F8:FlOB.. and finally use Solver to adjust the latter to find the bestfitting concentration and rate parameters.
Fig. 7.1.1: The top left-hand corner of the spreadsheet of exercise 7.1.1, with a0 = 1, b0 = 0, c0 = 0, and Δt = 0.1.

Fig. 7.1.2: The concentrations a, b, and c as a function of time t. Broad gray curves: exact solution; solid black curves: simulation.

(11) When we can monitor all three concentrations as a function of time t, a proper criterion for optimizing the adjustable parameters in K1:K3 with Solver might be the sum of the quantities computed in N1:P1. Therefore, calculate the sum of the squares of all the residuals in O2 as =N1+O1+P1.
Fig. 7.1.3: Columns F through L of the spreadsheet of exercise 7.1.1.

Fig. 7.1.4: The differences ε between the results from numerical integration and the exact solution, emphasized by using a greatly enlarged vertical scale.

(12) Call Solver, and instruct it to minimize O2 while adjusting K1:K3. As your finishing touch, call SolverAid and find the associated uncertainties. Figure 7.1.5 illustrates what you might get in the presence of a fair amount of noise.
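What Solver does here can be mimicked in miniature. The sketch below (Python, illustrative only; a crude one-dimensional grid search stands in for Solver, noise-free 'data' for the simulated measurements, and a plain sum of squares for =SUMXMY2) fits the explicit-Euler model for a to exact data. Note that the best-fitting rate constant is not 1 but about 0.95: the algorithmic error of the crude integration is absorbed into the fitted parameter.

```python
import math

k1_true, dt, nstep = 1.0, 0.1, 100
# noise-free 'measurements' of a at t = dt, 2*dt, ...
data = [math.exp(-k1_true * dt * i) for i in range(1, nstep + 1)]

def ssr(k):
    # sum of squared residuals between the explicit-Euler model and the data,
    # the role played by =SUMXMY2(...) on the spreadsheet
    a, s = 1.0, 0.0
    for i in range(nstep):
        a += dt * (-k * a)          # explicit Euler step for da/dt = -k a
        s += (a - data[i]) ** 2
    return s

# crude stand-in for Solver: scan k on a fine grid from 0.8 to 1.2
k_best = min((0.8 + 0.001 * j for j in range(401)), key=ssr)
print(k_best)
```

Because the Euler factor (1 - k dt) with k ≈ 0.95 matches exp(-dt) almost exactly, the fit is excellent yet biased, which is why the recovered parameters below differ slightly from the assumed ones.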
Fig. 7.1.5: Fitting simulated noisy data (open circles) to the model A → B → C. The found best-fitting curves are shown as thin black lines; the gray bands are the theoretical curves in the absence of noise. Assumed data: a0 = 1, k1 = 1, k2 = 0.5, sn = 0.03.

(13) For smaller values of sn the recovered parameters will be closer to those assumed in B1, D1, and D2, and the standard deviations will be smaller; for larger values of sn the opposite will be true.
(14) Save the spreadsheet for later use.
(15) Using the explicit Euler method and Solver, with SolverAid for the corresponding standard deviations, yields a0 = 0.994 ± 0.042, k1 = 0.973 ± 0.070, and k2 = 0.497 ± 0.055, fairly close to the assumed values of a0 = 1, k1 = 1, and k2 = 0.5. Here they are given with one extra, subscripted guard digit that is not significant, but merely guards against systematic round-off errors in possible uses of these numbers in subsequent computations. A different simulation would also yield slightly different results, because (simulated or real) noise is never quite the same. Therefore, don't overspecify the found parameters: they are only estimates.
(16) Keep in mind that the parameters you find will be correlated rather than independent, i.e., changes in one may affect the values of the others. If such parameters are to be used in subsequent calculations, the covariances between a0, k1, and k2 should also be obtained from SolverAid, in the form of the corresponding covariance matrix.
(17) If you could only monitor the concentration b of species B as a function of time t, the proper criterion for optimizing Solver would instead be the sum of squares of residuals as calculated in O1. In that case you would obtain slightly different results. If you had only measurements on c, you would instead use the sum of the squares of the residuals in P1.
(18) Incidentally, measurements of a alone would not be as useful: since a does not depend on k2, such measurements can only yield a0 and k1. If you tried, you would see that Solver in that case wouldn't change the value of k2 from its initial
guess, and that SolverAid would therefore, somewhat misleadingly, assign it a standard deviation of zero.
(19) Save the spreadsheet.

7.2 The semi-implicit Euler method

The procedure illustrated above uses the initial concentrations to compute the behavior during the interval Δt. We do not know how those concentrations are going to change; but instead of assuming that a remains constant during that interval, we will now approximate its change as linear over a sufficiently small interval Δt. This leads to the semi-implicit Euler method, in which we replace, say, the concentration a during the interval from tn to tn+1 by its average value (an + an+1)/2 = an + (an+1 - an)/2 = an + Δa/2. Upon replacing the concentrations a, b, and c in (7.1.2) through (7.1.4) by their initial values plus half their anticipated changes we have

Δa/Δt ≈ -k1 (a + Δa/2)    (7.2.1)

Δb/Δt ≈ k1 (a + Δa/2) - k2 (b + Δb/2)    (7.2.2)

Δc/Δt ≈ k2 (b + Δb/2)    (7.2.3)

Equations such as these cannot be used directly, but must first be solved for the concentration changes Δa, Δb, and Δc, from which we obtain

Δa ≈ -k1 Δt a(t) / (1 + k1 Δt/2)    (7.2.4)

a(t+Δt) = a(t) + Δa ≈ a(t) (1 - k1 Δt/2) / (1 + k1 Δt/2)    (7.2.5)

Δb ≈ [k1 (a + Δa/2) - k2 b] Δt / (1 + k2 Δt/2)    (7.2.6)

b(t+Δt) = b(t) + Δb ≈ k1 Δt a(t) / [(1 + k1 Δt/2) (1 + k2 Δt/2)] + b(t) (1 - k2 Δt/2) / (1 + k2 Δt/2)    (7.2.7)

We need not compute c, because it follows directly from the mass balance (7.1.17); still, for the sake of completeness, it is listed here as

Δc ≈ k2 Δt (b + Δb/2)    (7.2.8)

c(t+Δt) = c(t) + Δc    (7.2.9)
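The gain from this half-step averaging is easy to quantify for a alone. The following sketch (Python, illustrative; the exercise implements the same recursions on the spreadsheet) steps a with both the explicit factor (1 - k1 Δt) and the semi-implicit factor (1 - k1 Δt/2)/(1 + k1 Δt/2), and compares each with exp(-k1 t):

```python
import math

k1, dt, nstep = 1.0, 0.1, 100
f_semi = (1 - k1 * dt / 2) / (1 + k1 * dt / 2)   # semi-implicit step factor for a
a_exp, a_semi = 1.0, 1.0
err_exp = err_semi = 0.0
for i in range(1, nstep + 1):
    a_exp *= 1 - k1 * dt      # explicit Euler
    a_semi *= f_semi          # semi-implicit Euler
    a_exact = math.exp(-k1 * dt * i)
    err_exp = max(err_exp, abs(a_exp - a_exact))
    err_semi = max(err_semi, abs(a_semi - a_exact))
print(err_exp, err_semi)
```

For Δt = 0.1 the worst explicit error is about 0.02, while the semi-implicit error stays in the low 10⁻⁴ range: the order-of-magnitude improvement visible in comparing Figs. 7.1.4 and 7.2.1.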
It is only semi-implicit because (an + an+1)/2 combines half of the known term an with half of the next, still unknown one, an+1. This accounts for the 'implicit' in the name of the semi-implicit Euler method.

You may wonder what constitutes a fully implicit Euler method. Instead of the average value of the slope (as in the semi-implicit method), or its initial value (as in the explicit method), it uses the final value of the slope to evaluate the new value of F(t). Since that is just as lopsided as using the initial value, the implicit Euler method has an inaccuracy proportional to Δt, comparable to that of the explicit Euler method, and inferior to the inaccuracy ∝ (Δt)² of the semi-implicit method. For linear systems, the semi-implicit method is the best one can do and still retain an absolutely stable solution.

Exercise 7.2.1:
(1) Copy the spreadsheet of exercise 7.1.1 to a new page of the same workbook. In columns J and K of this copy, change the instructions to incorporate (7.2.5) and (7.2.7) respectively. In column L you can use either (7.1.17) or (7.2.9).
(2) Click on the curves in your equivalent of Fig. 7.1.4, then redirect their definitions in the formula box to the current worksheet. In doing so, be careful not to alter the general format of the argument: (sheetname!Xn:Xm, sheetname!Yn:Ym, p), where Xn:Xm and Yn:Ym specify the ranges, and p defines the relative precedence of the curves, with the highest number being shown on top of the other curves. All you need to change in the argument is the sheetname, which you find on the tab at the bottom of the spreadsheet. What happens in your equivalent of Fig. 7.1.2 is immaterial, because any differences are too small to be visible on this scale.
(3) The improvement of Fig. 7.2.1 over the results shown in Fig. 7.1.4 is immediate and dramatic: for the same step size (Δt = 0.1) and noise amplitude (sn = 0.03), the errors are now more than an order of magnitude smaller. Here the explicit method clearly shows its bias.
(4) Repeat the analysis of the simulated data set with added Gaussian noise. For the same noisy data as used in Fig. 7.1.5 we now find a0 ≈ 0.998, k1 ≈ 0.997, and k2 ≈ 0.503, a much better overall fit to the assumed values of a0 = 1, k1 = 1, and k2 = 0.5 than obtained earlier.
(5) As suggested by comparing Figs. 7.1.4 and 7.2.1, the improvement is more obvious for data that contain less noise. For Δt = 0.1 and the same Gaussian noise but now with sn = 0.01, the results of the explicit and semi-implicit Euler methods would be a0 = 0.944 ± 0.023 and a0 = 0.9954 ± 0.0096 respectively.

In our example, the implicit Euler method would read

Δa/Δt ≈ -k1 (a + Δa)    (7.2.10)

Δb/Δt ≈ k1 (a + Δa) - k2 (b + Δb)    (7.2.11)
from which we would obtain

Δa = -k1 Δt a(t) / (1 + k1 Δt)    (7.2.12)

a(t+Δt) = a(t) + Δa = a(t) / (1 + k1 Δt)    (7.2.13)

Δb = [k1 Δt a(t) / (1 + k1 Δt) - k2 Δt b(t)] / (1 + k2 Δt)    (7.2.14)

b(t+Δt) = b(t) + Δb = k1 Δt a(t) / [(1 + k1 Δt) (1 + k2 Δt)] + b(t) / (1 + k2 Δt)    (7.2.15)

Upon comparing these expressions with their explicit and semi-implicit counterparts, we verify that the semi-implicit method is indeed the average of the explicit and implicit Euler methods. It combines the absolute stability of the implicit method with an accuracy that is higher than that of either the explicit or implicit Euler method.

Fig. 7.2.1: The differences ε between the numerically integrated and exact solutions for the semi-implicit Euler method with Δt = 0.1.
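That averaging can be seen directly in the single-step multipliers for a. A minimal numerical check (Python, illustrative): the semi-implicit factor agrees with the mean of the explicit and implicit factors through second order in Δt (their difference is of order (Δt)³), while it differs from either individual factor already in second order.

```python
k, h = 1.0, 0.1
f_explicit = 1 - k * h                        # single-step factor, explicit Euler
f_implicit = 1 / (1 + k * h)                  # single-step factor, implicit Euler
f_semi = (1 - k * h / 2) / (1 + k * h / 2)    # single-step factor, semi-implicit Euler
avg = (f_explicit + f_implicit) / 2
print(f_semi, avg)
```

With k Δt = 0.1 the semi-implicit factor 0.90476... and the average 0.90454... agree to within about 2×10⁻⁴, i.e., to order (Δt)³.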
The semi-implicit Euler method is therefore often the method of choice for solving simple problems involving ordinary differential equations.

Equations (7.2.6) through (7.2.8) clearly show that the simulation is based on replacing the differential quotients dy/dt by difference quotients Δy/Δt, a substitution that should become increasingly accurate as Δt becomes smaller. You can readily verify that the simulation errors shown in Figs. 7.1.4 and 7.2.1 indeed stem from the step size Δt: upon reducing Δt by a factor of ten, the concentration differences in Fig. 7.1.4 also become smaller by an order of magnitude, and those in Fig. 7.2.1 by two orders of magnitude. Unfortunately, in order to cover the same total time (in the above example: from t = 0 to t = 10), we would then have to lengthen the columns tenfold, to 1000 rows, and further reductions in Δt would make the columns even longer. This will quickly lead to impracticably long columns. Moreover, it may be undesirable to lengthen the columns, because we may only have experimental data at given intervals Δt.

Keep in mind also that the successful fitting of simulated, noisy data can be somewhat misleading, since a generous amount of noise may mask many inadequacies of the model. For fitting data with a high signal-to-noise ratio we may therefore need to improve the algorithm, as we will do below. However, we can go a long way with the Euler methods by using the spreadsheet more intelligently.

7.3 Using custom functions

As indicated in the previous paragraph, below we will indicate how we can improve the accuracy of our simulation without increasing the column length.

Exercise 7.3.1:
(1) Return to the spreadsheet of exercise 7.2.1, and set the values in K1:K3 back to the corresponding values in B1:D3.
(2) For the concentration a an elegant solution exists that does not require an increased column length. We saw in (7.2.5) that a(t+Δt) = a(t) (1 - k1 Δt/2) / (1 + k1 Δt/2). Upon applying this n times with an n times smaller interval Δt/n we find

a(t+Δt) = a(t) {[1 - k1 Δt/(2n)] / [1 + k1 Δt/(2n)]}^n

so that we can replace the instruction in cell J9 for a by, e.g., =J8*((1-$K$1*$D$3/20)/(1+$K$1*$D$3/20))^10 for n = 10. Copy this down through row 108, and observe its effect. Then change the value of n in these instructions from 10 to, say, 1000. Try it: this will improve the precision of the simulated a-values another two orders of magnitude without lengthening the columns. Unfortunately, this trick does not work for the other concentrations, because (7.2.7) and (7.2.9) do not have such a simple recursivity.
For those more general cases we will need some spreadsheet magic: Excel allows us to incorporate so-called user-defined or custom functions. These have much in common with small macros (to be discussed at length in chapter 8), except that they apply only to a numerical value in a single spreadsheet cell. On the other hand, custom functions update automatically, which in the present context is a significant advantage. Below we will use custom functions to compute the concentrations a, b, and c to higher accuracies by reducing the step size while keeping constant the number of spreadsheet cells used in the simulation. If writing computer code is new to you, you may first want to read sections 8.1 through 8.4 of the next chapter before continuing here.

Exercise 7.3.1 (continued):
(3) Return to the spreadsheet, and press Alt+F11 (on the Mac: Opt+F11). You will see a Microsoft Visual Basic screen appear, with its own menu bar. On that bar, select Insert => Module if the display does not show a white writing area to the right of the Project column; if a page already exists, just move to the end of any text on it. Then enter (type, or copy from SampleMacros) the following instructions:

'semi-implicit Euler method for A
Function siEulerA(k1, oldT1, oldT2, n, oldA) As Double
Dim A As Double
Dim f As Double, step As Double
Dim i As Integer
n = CInt(n)
A = oldA
step = (oldT2 - oldT1) / n
f = (1 - k1 * step / 2) / (1 + k1 * step / 2)
For i = 1 To n
A = A * f
Next i
siEulerA = A
End Function

(4) A short explanation is in order. The top line, starting with an apostrophe, contains a comment that will be ignored by the spreadsheet but reminds the user of the purpose of the function. The next line specifies the name by which we can call this function, the parameters it will use (in exactly the same order as used in the function argument, within the brackets following the function name), and (optionally) its precision; the last line identifies its end.
(5) The next three lines define the types of constants used in the function. In general these lines are optional though very useful; they are mandatory if you use Option Explicit, an option that, when used, is listed at the very top of your module. On the other hand, do not specify the dimensions of parameters (such as k1, oldT1, etc.) that are imported through the function argument.
(6) The sixth line (optional as well) makes sure that the method will work even if a non-integer value for n is used by mistake, by converting it to an integer with the instruction CInt (for convert to integer). This does not imply that the function is now immune to entry errors: using zero for n would certainly trip up the function when it tries to divide by 0 in the next line of code, and using a negative number, or a letter, would also give problems. We insert this line here merely to illustrate how you can make a function somewhat less error-prone by anticipating possible mistakes.
(7) Line 7 sets the concentration parameter A equal to the value of oldA imported through the function argument.
(8) The calculation starts in earnest on line 8 by defining the new step size, in order to n times repeat the computation of A for a time interval step that is n times smaller than the data spacing oldT2 - oldT1. By letting oldT1 and oldT2 refer to relative addresses of cells containing t in successive rows of the spreadsheet, the time intervals in the spreadsheet need not be equidistant. Alternatively we can make the step size constant throughout the calculation by referring to absolute addresses for oldT1 and oldT2 respectively.
(9) Lines 10 through 12 contain the action part of the function. The equal sign in line 11 functions as an assignment: the line A = A * f should be read as if it were written as A <= A * f, i.e., as "replace A by A * f." In other words, the line is executed from right to left: the computer first evaluates the expression to the right of the equal sign, and then assigns that value to the variable to the left of the equal sign.
(10) Incidentally, you will have noticed that a number of words you have entered (Function, Dim, As Double, As Integer, etc.) are displayed in blue after you have entered the line on which they appear. These are terms the Visual Basic editor recognizes as instruction keywords, and seeing them in color therefore assures you that your instructions are being read.
(11) We calculate the value of f separately on line 9, rather than use, e.g., A = A * (1 - k1 * step / 2) / (1 + k1 * step / 2) directly in line 11, because line 9 is executed only once, whereas in line 11 the same calculation would be repeated n times. It is in such loops that we should be most careful to avoid busy work, because it can noticeably slow down the computation. Note that the line specifying f must follow the definition of step, because it uses its value, which otherwise would not be defined.
(12) Finally, the output of the function is defined in its penultimate line.
(13) Now enter the corresponding instructions for siEulerB, or copy the instructions for siEulerA and then correct and amend that copy. For your convenience, the changes between the two sets of instructions are shown below in boldface.

'semi-implicit Euler method for B
Function siEulerB(k1, k2, oldT1, oldT2, n, _
  oldA, oldB) As Double
Dim A As Double, B As Double
Dim f As Double, fA As Double, fB As Double
Dim step As Double
Dim i As Integer
n = CInt(n)
A = oldA
B = oldB
step = (oldT2 - oldT1) / n
f = (1 - k1 * step / 2) / (1 + k1 * step / 2)
fA = k1 * step / ((1 + k1 * step / 2) * (1 + k2 * step / 2))
fB = (1 - k2 * step / 2) / (1 + k2 * step / 2)
For i = 1 To n
B = A * fA + B * fB
A = A * f
Next i
siEulerB = B
End Function

(14) Note the use of a space followed by an underscore at the end of the second line, in order to indicate a line continuation. This allows us to break up a long instruction so that it will be visible on the monitor screen (or the printed page) while being interpreted by the computer as a single line. There can be no text on that line beyond the continuation sign.
(15) In order to use the functions you have just entered, exit the editor with Alt+F11 (Mac: Opt+F11), which toggles you back to the spreadsheet. On the spreadsheet, in cell F2 place the label n=, and in G2 its value, which should be a positive integer larger than 0.
(16) Replace the instruction in J9 by =siEulerA($K$1,$A8,$A9,$G$2,J8), and copy this instruction down to row 108. Likewise replace the instruction in K9 by =siEulerB($K$1,$K$2,$A8,$A9,$G$2,J8,K8), and copy it down to row 108 as well.
(17) Convert the instructions in columns N through P to the corresponding logarithms, so that you need not change the scale of the graph every time you change the value of n.
(18) Run the spreadsheet with Δt = 0.1 and various values for n, such as 1, 10, and 100, and see what happens with the concentration differences in columns N through P.
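A line-by-line rendering of siEulerB in another language makes it easy to verify the claimed accuracy for the coupled a, b pair. The sketch below is such a rendering in Python (illustrative only — on the spreadsheet the VBA function above is what you use; for convenience this version advances and returns both a and b):

```python
import math

def si_euler_b(k1, k2, old_t1, old_t2, n, old_a, old_b):
    # Python equivalent of the VBA custom function siEulerB:
    # n semi-implicit substeps across one cell-to-cell interval
    a, b = old_a, old_b
    step = (old_t2 - old_t1) / n
    f = (1 - k1 * step / 2) / (1 + k1 * step / 2)
    fa = k1 * step / ((1 + k1 * step / 2) * (1 + k2 * step / 2))
    fb = (1 - k2 * step / 2) / (1 + k2 * step / 2)
    for _ in range(n):
        b = a * fa + b * fb
        a = a * f
    return a, b

k1, k2, dt = 1.0, 0.5, 0.1
a, b, worst = 1.0, 0.0, 0.0
for i in range(1, 101):
    a, b = si_euler_b(k1, k2, (i - 1) * dt, i * dt, 100, a, b)
    t = i * dt
    b_exact = k1 * (math.exp(-k1 * t) - math.exp(-k2 * t)) / (k2 - k1)
    worst = max(worst, abs(b - b_exact))
print(worst)
```

With n = 100 the worst deviation of b from the exact solution is far below 10⁻⁶, in line with the ten-thousandfold error reduction described next.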
With n = 100, the plot of the concentration errors should look like Fig. 7.3.1. By using one-hundred times smaller steps, the error in the semi-implicit Euler method has been reduced ten-thousandfold. And that is precisely where you want to be: the computation should not add any inaccuracies to your experimental results. For almost any realistic noise the accuracy of your results will now be limited by that noise, rather than by inadequacies in the model.
(19) Try Δt = 0.1 with n = 1000. Depending on the speed of your computer, the computation may now take its sweet time (after all, in each of the 100 cells you make the For ... Next loop do 1000 complete calculations), but you get rewarded with absolute errors that are all smaller than 5×10⁻¹⁰. That will be good enough for almost any experiment.
(20) Reset the values in K1:K3 to new guess values, and rerun Solver and SolverAid.
(21) Go back to exercise 7.1.1 and write the corresponding functions eEulerA and eEulerB for the explicit case. Then try them out, and see how they run. For the same Δt = 0.1, what value of n do you need in order to get the errors down to the same order of magnitude as those shown in Fig. 7.3.1?
(22) Save the spreadsheet for further use in section 7.4.
Fig. 7.3.1: The differences ε between the numerically integrated and exact solutions for the semi-implicit Euler method with Δt = 0.1 and n = 100, i.e., for an actual step size of 0.001.

7.4 Extreme parameter values

The reaction scheme we have adopted here has two rate constants, k1 and k2. In our examples we have so far assumed that k1 and k2 are of a similar order of magnitude, but that is not necessarily the case. For purely mathematical solutions, the particular values of the rate constants make no difference. (There is a trivial exception to this statement, because setting k2 equal to k1 makes (7.1.16) equal to 0/0. This complication can readily be avoided by making the difference between k2 and k1 negligibly small rather than zero.) By contrast, in a numerical simulation the specific values of the rate constants often do matter. For instance, when the decay of B to C is very much faster than its generation from A, i.e., when k2 » k1, the simulation may fail, because it is based on k1Δt « 1, whereas k2Δt might then be much greater than 1. And when this is accommodated by taking more steps, the computation may become far too slow to be practicable. However, in such an extreme case there is often a good approximation that can be used instead: when k2 » k1, the concentration b will be small and, to a good approximation, will be given by the steady state approximation db/dt ≈ 0, which upon substitution into (7.1.3) yields b ≈ k1 a / k2. Below we will consider what to do in such a situation. We split the case into two parts: when k2 is not much larger than k1, we make sure that the step size Δt is appropriately decreased; for much larger values of k2 we use the steady-state approximation. We can
set the switchover point such that the answers provided by the two methods coincide to within the desired accuracy.

Exercise 7.4.1:
(1) Return to the spreadsheet of exercise 7.3.1, and test how far you can increase the value of k2 (while keeping k1 constant at, say, k1 = 1) before the program starts to fail. Such failure is most readily seen in the plot of log ε vs. t, where ε is the inaccuracy obtained by comparison with the exact solution.
(2) Add the function siEulerBB listed below to the VBA module. The changes with respect to siEulerB are shown in bold.

'semi-implicit Euler method for B, modified so that
'it will switch automatically to the steady state
'approximation when k2/k1 becomes larger than a
'given value, here called crit (for criterium)
Function siEulerBB(k1, k2, crit, oldT1, oldT2, n, _
  oldA, oldB) As Double
Dim A As Double, B As Double
Dim f As Double, fA As Double, fB As Double
Dim step As Double
Dim i As Long, m As Long
n = CLng(n)
A = oldA
B = oldB
step = (oldT2 - oldT1) / n
f = (1 - k1 * step / 2) / (1 + k1 * step / 2)
If k2 / k1 > crit Then
  For i = 1 To n
    A = A * f
  Next i
  'The steady state approximation
  B = k1 * A / k2
End If
If (k2 / k1 > 1 And k2 / k1 <= crit) Then
  m = CLng(Sqr(k2 / k1))
  n = m * n
  step = step / m
  f = (1 - k1 * step / 2) / (1 + k1 * step / 2)
End If
If k2 / k1 <= crit Then
  fA = k1 * step / ((1 + k1 * step / 2) * (1 + k2 * step / 2))
  fB = (1 - k2 * step / 2) / (1 + k2 * step / 2)
  For i = 1 To n
    B = A * fA + B * fB
    A = A * f
  Next i
End If
siEulerBB = B
End Function
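How good is the steady-state branch that siEulerBB switches to? A quick check (Python, illustrative) compares b ≈ k1 a/k2 with the exact solution for k2/k1 = 1000, i.e., for a ratio just beyond a crit of 1000:

```python
import math

k1, k2, a0, t = 1.0, 1000.0, 1.0, 1.0
a = a0 * math.exp(-k1 * t)
b_exact = a0 * k1 * (math.exp(-k1 * t) - math.exp(-k2 * t)) / (k2 - k1)
b_steady = k1 * a / k2     # the steady-state branch used when k2/k1 > crit
rel = abs(b_steady - b_exact) / b_exact
print(rel)
```

The relative deviation is of order k1/k2, here about 0.1%, which is why a switchover near crit = 1000 keeps the two branches consistent to within roughly that accuracy.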
(3) The change from Integer to Long in the dimension statements reflects the fact that single-precision integers can only count up to 2^15 - 1 = 32 767, which may not suffice for the product of n times crit.
(4) For extremely large k2 values the function uses the steady-state approximation b = k1 a / k2.
(5) The first If ... Then statement singles out the case k2/k1 > crit; it calculates a in the usual way, and then finds b with the steady state approximation. The second If ... Then statement increases the value of n when 1 < k2/k1 <= crit; in this way we make sure that the step size is appropriately decreased when k2 exceeds k1 but the steady-state approximation does not yet apply. The final If ... Then condition contains code similar to that in siEulerB, except that the value of n will now depend on whether the second If ... Then condition was met. Otherwise, the calculation is as before.
(6) Enter a label and value for crit; store the value for crit in G3. You might try an initial value for crit of 1000.
(7) Replace siEulerB by siEulerBB (including the added variable crit in its argument), and make sure that it yields the same results as before for values of k2 not exceeding the product of k1 and crit. For ease of testing, again set the values in K1:K3 back to those in B1:D3.
(8) Now test whether the function yields satisfactory results for k2 > crit × k1. This is most readily done by examining the plot of log ε vs. t just before and after k2 crosses the value of crit × k1. You must of course change the values of k2 in both D2 and K2.
(9) Implement and test an equivalent extension to include large k2 values for the explicit Euler method, by creating the corresponding function eEulerBB.
(10) Save the spreadsheet.

When k2 « k1, we essentially have two decoupled reactions: first A decays to B, which, in turn, reacts to form C, though at a much more leisurely pace. The time scale of the simulation was chosen such that k1Δt « 1, a time scale that is far more detailed than needed for the slow decay of B into C, and staying with constant time intervals Δt then becomes very inefficient. In that case it is practical to change the data spacing after the concentration of A has decayed to near-zero. This is why it is convenient to let the function calculate the step size in every cell, depending on the local change in t between successive rows: it allows you to go slowly where needed, and fast where possible, just the way you would drive your car: slow near pedestrian crossings or in bad weather, fast on the open highway on a clear day with little traffic.

7.5 The explicit Runge-Kutta method

The combination of the semi-implicit Euler method with the increased efficiency (within a given column length) of custom functions can integrate virtually every ordinary differential equation to any desired accuracy. It will serve to solve almost any problem simple enough to be done on a spreadsheet, and almost always will produce model curves more than adequate for comparison with experimental data.
We will here illustrate the most popular. n. and three for the corresponding errors e. 7: Numerical integration ofordinary differential equations 357 Still.1 to accommodate additional columns: three for a through c. and on the value and availability of your time.1.Ch. the explicit fourthorder RungeKutta method uses the relations 1 Yn+' =Yn +(K. How to balance personal time versus computer time depends.. When applied to a single ordinary firstorder differential equation dt such as encountered in (7.5.5.5.8) by making small steps from t to t + M with the slopes specified by the differential equations at time t. and enter the following function: 'explicit fourthorder RungeKutta method for A Function e4RKA(kI. although they tend to take more time to implement. while the Euler methods are conceptually simple.1. In the present section we explore another approach that leads to algorithms that use the computer more efficiently. there are several possible RungeKutta formalisms. on the anticipated amount of use of the computation.6) K2 == t!.1: (1) Extend the spreadsheet used in exercise 7.5. The RungeKutta approach instead uses slopes appropriately averaged over the interval I1t.3) (7. oldA) As Double Dim A As Double.1. they are computationally relatively inefficient. In the explicit Euler method. on the speed of your computer. fourthorder explicit RungeKutta method.t F(yn) (7.5. Even for a given order. we solve equations of the type of(7.5.1. but the resulting differences in performance are too inconsequential to concern us here. +2K2 +2K3 +K4) (7.5.5) (7.1) where K. == t!. oldT2. or some combination of these.2) 6 dy ==F(y) (7. (2) Switch to the module of the Visual Basic Editor.6) through (7. step As Double . of course. This yields a method that needs fewer steps for a given accuracy. oldTI.2).t F(Yn + K 1 /2) K3 =dt F(Yn +K2/2) K4 =I:!.t F(Yn + K 3) Exercise 7. reaches higher accuracies for the same number of steps.4) (7.
5. because the expression for db/dt depends not only on b but also on a.7) the corresponding relations for the explicit fourthorder RungeKutta method are Yn+1 = Yn +!(KYI 6 +2Ky2 + 2Ky3 +K y4 ) (7.15) (7. Zit (7. KA2=step* For b the situation is more complicated.358 R.e.5.5.14) + Kzl 12) (7.5. zn + K: 2 /2) Kz4 = I1t G(Yn + K yJ .5. zn +KzI/2) Ky3 = I1t F(y" + K y2 12.. Dim KA3 As Double.11 ) (7.5.5.oldTl) / For i = 1 To n KAI = step * kl * A KA2 = step * kl * (A KA3 = step * kl * (A KA4 = step * kl * (A A = A + (KAl + 2 * KA2 Next i e4RKA = A End Function n + KAI / + KA2 / + KA3) 2) 2) KA3 + KA4) / 6 + 2 * (3) In the defmition ofKAI you will recognize the/unction F(y) of (7. i.5. de Levie. For an ordinary firstorder differential equation of the form dy = F(y.5.12) (7.9) where KYI =l1t F(Yn' zn) Ky2 =11t F(Yn +KYI/2.16) (7.10) (7. the expressionK2 = b.z) dt (7. zn + K zJ ) which relations are used in the spreadsheet function e4RKB.17) K zJ = I1t G(Yn + KY2 12.5.1) as kl * A.5.8) (7.t x F (yn+ K/2) in (7. . Advanced Excel for scientific data analysis Dim KAI As Double.5.5.13) (7. with A replaced by (A+ KA1I2) . while the variabley is there specified as A. zn + Kz2 12) K y4 = 111 F(Yn + K y3 ' zn + K z3 ) K:l = I1t G(Yn' zn) K:2 =l1t G(Yn + KYI 12.4) is coded in the function statement as kl * (A+KA1/2). Therefore. Dim i As Integer KA2 As Double KA4 As Double n = CInt(n) A = oldA step = (oldT2 .
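e4RKA is easily rendered in Python (illustrative only; the spreadsheet uses the VBA version above), which lets you confirm the fourth-order accuracy for da/dt = -k1 a at Δt = 0.1 with a single step per cell:

```python
import math

def e4rka(k1, old_t1, old_t2, n, old_a):
    # Python equivalent of the VBA custom function e4RKA
    a = old_a
    step = (old_t2 - old_t1) / n
    for _ in range(n):
        ka1 = step * -k1 * a
        ka2 = step * -k1 * (a + ka1 / 2)
        ka3 = step * -k1 * (a + ka2 / 2)
        ka4 = step * -k1 * (a + ka3)
        a += (ka1 + 2 * ka2 + 2 * ka3 + ka4) / 6
    return a

k1, dt = 1.0, 0.1
a, worst = 1.0, 0.0
for i in range(1, 101):
    a = e4rka(k1, (i - 1) * dt, i * dt, 1, a)
    worst = max(worst, abs(a - math.exp(-k1 * dt * i)))
print(worst)
```

Even with n = 1 the worst error stays a few parts in 10⁷, the scale seen in Fig. 7.5.1.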
When the ordinary first-order differential equation has the form

dy/dt = F(t, y, z)    (7.5.19)

the corresponding relations for the explicit fourth-order Runge-Kutta method are

yn+1 = yn + (Ky1 + 2 Ky2 + 2 Ky3 + Ky4) / 6    (7.5.20)

where

Ky1 = Δt F(tn, yn, zn)    (7.5.21)
Ky2 = Δt F(tn + Δt/2, yn + Ky1/2, zn + Kz1/2)    (7.5.22)
Ky3 = Δt F(tn + Δt/2, yn + Ky2/2, zn + Kz2/2)    (7.5.23)
Ky4 = Δt F(tn + Δt, yn + Ky3, zn + Kz3)    (7.5.24)
Kz1 = Δt G(tn, yn, zn)    (7.5.25)
Kz2 = Δt G(tn + Δt/2, yn + Ky1/2, zn + Kz1/2)    (7.5.26)
Kz3 = Δt G(tn + Δt/2, yn + Ky2/2, zn + Kz2/2)    (7.5.27)
Kz4 = Δt G(tn + Δt, yn + Ky3, zn + Kz3)    (7.5.28)

Exercise 7.5.1 (continued):
(4) Add the code for the function e4RKB; as before, those regions that are different from e4RKA are shown in boldface.

'explicit fourth-order Runge-Kutta method for B
Function e4RKB(k1, k2, oldT1, oldT2, n, _
  oldA, oldB) As Double
Dim A As Double, B As Double, step As Double
Dim KA1 As Double, KA2 As Double
Dim KA3 As Double, KA4 As Double
Dim KB1 As Double, KB2 As Double
Dim KB3 As Double, KB4 As Double
Dim i As Integer
n = CInt(n)
A = oldA
B = oldB
step = (oldT2 - oldT1) / n
For i = 1 To n
  KA1 = step * -k1 * A
  KA2 = step * -k1 * (A + KA1 / 2)
  KA3 = step * -k1 * (A + KA2 / 2)
  KA4 = step * -k1 * (A + KA3)
  KB1 = step * (k1 * A - k2 * B)
  KB2 = step * (k1 * (A + KA1 / 2) - k2 * (B + KB1 / 2))
  KB3 = step * (k1 * (A + KA2 / 2) - k2 * (B + KB2 / 2))
  KB4 = step * (k1 * (A + KA3) - k2 * (B + KB3))
  B = B + (KB1 + 2 * KB2 + 2 * KB3 + KB4) / 6
  A = A + (KA1 + 2 * KA2 + 2 * KA3 + KA4) / 6
Next i
e4RKB = B
End Function

(5) Return to the spreadsheet with Alt+F11 (Mac: Opt+F11).
(6) Refer to the numerical values of a0, b0, and c0 in the top cells of the first three new columns.
(7) In the second row of the new column for a place the instruction =e4RKA(), where the addresses within the brackets refer to the parameters listed in the function argument.
(8) Similarly, in the cell to its immediate right, deposit the instruction =e4RKB() with the appropriate arguments. Copy both instructions down.
(9) In the third added column calculate c by difference, based on (7.1.17). And in the next three columns compute the algorithmic errors by comparing the results of the explicit fourth-order Runge-Kutta expressions for a, b, and c with their exact solutions.
(10) You should now find results similar to those shown in Fig. 7.5.1. The algorithmic errors are already quite small for n = 1, i.e., for steps of Δt = 0.1, so that there is hardly a need for using multiple iterations per cell. Multiple steps per cell may still be needed to accommodate values of k2 much larger than k1, see section 7.4.

Fig. 7.5.1: The differences ε between the numerically integrated and exact solutions for the explicit fourth-order Runge-Kutta method with Δt = 0.1 and n = 1.

In this example, the much greater accuracy of the Runge-Kutta method makes it possible to use rather large steps and still have quite acceptable accuracy. Starting from the custom functions given here, the time and effort needed in order to apply the Runge-Kutta method to another set of differential equations are relatively small, since you only need to change the specific formulas in the definitions for the various K values.
Starting from the custom functions given here, the time and effort needed in order to apply the Runge-Kutta method to another set of differential equations are relatively small, since you only need to change the specific formulas in the definitions for the various K-values.

7.6 The Lotka oscillator 1

We will now apply the above methods to the Lotka model of two coupled autocatalytic reactions that, for certain combinations of concentrations and rate parameters, can give rise to steady-state oscillations. The efficiency of many technologically important chemical processes, such as the production of gasoline or of nitrogen-based fertilizer, depends on catalytic processes. Similarly, many biochemical processes depend on nature's catalysts, the enzymes. In a catalytic reaction, the catalyst speeds up (or retards) a reaction without being consumed itself. Formally, such a reaction can be depicted as A -C-> B, which can be written alternatively as A + C -> B + C, where C denotes the catalyst. Of course, in order to affect the reaction rate, C cannot be a mere spectator, but must be involved in the reaction. One typically excludes from such chemical reaction schemes the (often catalytic) effects of macroscopic bodies, such as solid surfaces or water droplets.

In an autocatalytic reaction, the reaction product itself serves as a catalyst. There are many known examples of autocatalytic reactions, e.g., the Landolt clock reaction (Landolt, Ber. Deut. Chem. Ges. 19 (1886) 1317), or the MnO2-catalyzed reduction of permanganate. The simplest example of such a process is the reaction A + B -> 2B. The Lotka oscillator (A. Lotka, J. Am. Chem. Soc. 42 (1920) 1595; Proc. Natl. Acad. Sci. USA 6 (1920) 410) is based on the reaction scheme

A + B -k1-> 2B   (7.6.1)
B + C -k2-> 2C   (7.6.2)
C -k3-> products   (7.6.3)

In order to obtain stationary oscillations, we will assume that the concentration a of A is kept constant, e.g., by using a so-called continuously stirred reactor, so that da/dt = 0, and that the concentrations of the reactants and products are homogeneous throughout the reaction vessel. The corresponding rate expressions for b and c then read
db/dt = k1 a b − k2 b c   (7.6.4)

dc/dt = k2 b c − k3 c   (7.6.5)

Below we will use the explicit Euler method; in section 7.7 we will solve the same problem with the semi-implicit Euler method, and in section 7.8 we will use the explicit fourth-order Runge-Kutta approach. For the explicit Euler method we approximate (7.6.4) and (7.6.5) as

Δb = (k1 a b − k2 b c) Δt   (7.6.6)
Δc = (k2 b c − k3 c) Δt   (7.6.7)

so that

b_n = b_n−1 + Δb = b_n−1 + (k1 a b_n−1 − k2 b_n−1 c_n−1) Δt   (7.6.8)
c_n = c_n−1 + Δc = c_n−1 + (k2 b_n−1 c_n−1 − k3 c_n−1) Δt   (7.6.9)

Exercise 7.6.1:
(1) Start a new spreadsheet. Leave space at its top for a row of figures. Below these place labels for a, k1, k2, k3, and Δt, and their values.
(2) Name these parameters a, kk1, kk2, kk3, and dt. Note that k1 cannot be used as a name because it is a valid cell address; kk1 is fine since Excel has only 256 columns, i.e., the highest column label is IV.
(3) Deposit column headings for time t and for the concentrations b and c.
(4) Fill the column for t with 0 (dt) 10.
(5) For t = 0 deposit the initial values b = 1 and c = 2.
(6) For t = 0.01 deposit the instructions =B21+(kk1*a*B21-kk2*B21*C21)*dt and =C21+(kk2*B21*C21-kk3*C21)*dt for b and c, assuming that B21 and C21 refer to b0 and c0 respectively.
(7) Copy these instructions all the way down to t = 10.
(8) Plot your results; they should look similar to those shown in Fig. 7.6.1.

These results are clearly unsatisfactory, because they do not lead to the steady-state oscillations one should expect when the concentration a is kept constant. Even though the value of Δt used, 0.01, is considerably smaller than 1/k for the largest k-value used, in a cyclic process a small but systematic error can accumulate in successive cycles, thereby quickly leading to quite significant deviations. By reducing Δt we can verify that it was indeed too large, and thereby caused the runaway behavior shown in Fig. 7.6.1. Because the column is already fairly long, we use custom functions.

[Fig. 7.6.1: The top of the spreadsheet for exercise 7.6.1.]

Note that the concentrations b and c are mutually dependent, so that both must be computed inside the For ... Next loop. And because we here have an open system (in which we must continually supply A to keep its concentration a constant), there is no convenient mass balance equation to eliminate either b or c.
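The runaway is not a spreadsheet artifact. This reaction scheme is the classic Lotka-Volterra system, which exactly conserves the quantity V = k2 b − k3 ln b + k2 c − k1 a ln c along true trajectories, while the explicit Euler method systematically increases V, so the computed orbit spirals outward. A Python sketch (mine, not the book's; the parameter values are arbitrary illustrative choices):

```python
import math

def euler_lotka(a, k1, k2, k3, b, c, dt, steps):
    # plain explicit Euler for db/dt = k1*a*b - k2*b*c, dc/dt = k2*b*c - k3*c
    for _ in range(steps):
        db = (k1 * a * b - k2 * b * c) * dt
        dc = (k2 * b * c - k3 * c) * dt
        b, c = b + db, c + dc
    return b, c

def invariant(a, k1, k2, k3, b, c):
    # exactly conserved along true trajectories of this Lotka-Volterra system
    return k2 * b - k3 * math.log(b) + k2 * c - k1 * a * math.log(c)

a, k1, k2, k3 = 1.0, 1.0, 1.0, 1.0   # illustrative values, not the book's
b0, c0 = 0.5, 0.5
v0 = invariant(a, k1, k2, k3, b0, c0)
b1, c1 = euler_lotka(a, k1, k2, k3, b0, c0, 0.05, 400)
v1 = invariant(a, k1, k2, k3, b1, c1)
# v1 > v0: the orbit spirals outward — the runaway seen on the spreadsheet
```

The growth of V per step is of order Δt², which is exactly the "small but systematic error accumulating in successive cycles" described in the text.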
Exercise 7.6.1 (continued):
(9) Replace the cell instructions for b and c by the custom functions shown below, and verify that they indeed work more efficiently. The following custom functions will work:

Function eEb(a, kk1, kk2, kk3, oldb, oldc, dt, n) As Double

Dim b As Double, c As Double
Dim i As Integer

b = oldb
c = oldc
For i = 1 To n
    b = b + (kk1 * a * b - kk2 * b * c) * dt / n
    c = c + (kk2 * b * c - kk3 * c) * dt / n
Next i
eEb = b

End Function

Function eEc(a, kk1, kk2, kk3, oldb, oldc, dt, n) As Double

Dim b As Double, c As Double
Dim i As Integer

b = oldb
c = oldc
For i = 1 To n
    b = b + (kk1 * a * b - kk2 * b * c) * dt / n
    c = c + (kk2 * b * c - kk3 * c) * dt / n
Next i
eEc = c

End Function

For n = 10 this yields a stationary oscillation, see Fig. 7.6.2.

[Fig. 7.6.2: The results for an explicit Euler simulation of the Lotka oscillator with Δt = 0.01 and n = 10, for effective time increments of 0.001.]
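These custom functions translate almost line for line into other languages. Below is a Python transcription of the same subdivided explicit Euler scheme (my own transcription and names, with arbitrary illustrative parameter values), including the doubling-of-n consistency check described in the text:

```python
def eEbc(a, kk1, kk2, kk3, oldb, oldc, dt, n):
    # Python transcription of the book's eEb/eEc custom functions:
    # n Euler substeps of size dt/n. Note that, as in the book's code,
    # b is updated before it is used in the expression for c.
    b, c = oldb, oldc
    for _ in range(n):
        b = b + (kk1 * a * b - kk2 * b * c) * dt / n
        c = c + (kk2 * b * c - kk3 * c) * dt / n
    return b, c

def run(a, kk1, kk2, kk3, b, c, dt, t_end, n):
    # one call per "cell", i.e. per interval of length dt
    for _ in range(round(t_end / dt)):
        b, c = eEbc(a, kk1, kk2, kk3, b, c, dt, n)
    return b, c

# the consistency test suggested in the text: double n until results agree
b10, c10 = run(1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 0.01, 10.0, 10)
b20, c20 = run(1.0, 2.0, 2.0, 1.0, 1.0, 2.0, 0.01, 10.0, 20)
```

If b10 and b20 (and likewise c10 and c20) agree to within your required accuracy, the chosen n is sufficient; otherwise keep doubling it.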
How will you know whether this n-value is sufficient? Copy the numerical values (with Ctrl+c, Edit => Paste Special, Values) to another spot on the spreadsheet, then run the simulation with, say, n = 20, and compare the results. Keep increasing n by, say, factors of 2 until you are satisfied with the consistency of the result.

Exercise 7.6.1 (continued):
(10) Make a phase diagram by plotting c as a function of b.

You can readily see whether you have reached a steady state from a phase diagram in which you plot c as a function of b, as illustrated in Fig. 7.6.3. In such a diagram, the time t is an implicit parameter. In the steady state the phase diagram shows a closed loop, a limit cycle.

[Fig. 7.6.3: The phase diagram, displaying c as a function of b. Thin line: results from the explicit Euler method with Δt = 0.01, n = 1; thick line: same with Δt = 0.01, n = 10, for an effective Δt of 0.001.]

7.7 The Lotka oscillator 2

In a semi-implicit simulation we use

Δb/Δt = k1 a (b + Δb/2) − k2 (b + Δb/2)(c + Δc/2)   (7.7.1)

Δc/Δt = k2 (b + Δb/2)(c + Δc/2) − k3 (c + Δc/2)   (7.7.2)

Upon neglecting terms containing the product Δb Δc this yields

(1/Δt − k1 a/2 + k2 c/2) Δb + (k2 b/2) Δc = k1 a b − k2 b c   (7.7.3)
−(k2 c/2) Δb + (1/Δt − k2 b/2 + k3/2) Δc = k2 b c − k3 c   (7.7.4)

Thus we have two equations, (7.7.3) and (7.7.4), and two unknowns, Δb and Δc, which are most readily obtained by matrix algebra as quotients of 2 × 2 determinants,

Δb = det | k1 a b − k2 b c    k2 b/2                 | / D   (7.7.5)
         | k2 b c − k3 c      1/Δt − k2 b/2 + k3/2   |

Δc = det | 1/Δt − k1 a/2 + k2 c/2    k1 a b − k2 b c | / D   (7.7.6)
         | −k2 c/2                   k2 b c − k3 c   |

where D = det | 1/Δt − k1 a/2 + k2 c/2    k2 b/2                 |
              | −k2 c/2                   1/Δt − k2 b/2 + k3/2   |

from which we can compute b_n as b_n−1 + Δb, and c_n as c_n−1 + Δc.

Exercise 7.7.1:
(1) Implement the semi-implicit Euler method on your spreadsheet, using custom functions based on (7.7.5) and (7.7.6) for b and c respectively, with Δt = 0.01, for a column of 1001 rows. Such results are visually indistinguishable from those obtained with the explicit method supplemented with functions to reduce their effective Δt to 0.001. If desired, we can further improve the numerical accuracy of these results with custom functions that subdivide the interval Δt into smaller steps.

7.8 The Lotka oscillator 3

For applying the fourth-order Runge-Kutta method we combine the Runge-Kutta relations of section 7.5 with (7.6.4) and (7.6.5), so that F = k1 a b − k2 b c, G = k2 b c − k3 c, and a is a constant. Therefore we replace the custom function e4RKB of section 7.5 by

Function eRKb(a, kk1, kk2, kk3, oldb, oldc, dt, n) As Double

Dim b As Double, c As Double, step As Double
Dim KB1 As Double, KB2 As Double
Dim KB3 As Double, KB4 As Double
Dim KC1 As Double, KC2 As Double
Dim KC3 As Double, KC4 As Double
Dim i As Integer

step = dt / n
b = oldb
c = oldc
For i = 1 To n
    KB1 = step * (kk1 * a * b - kk2 * b * c)
    KC1 = step * (kk2 * b * c - kk3 * c)
    KB2 = step * (kk1 * a * (b + KB1 / 2) - kk2 * (b + KB1 / 2) * (c + KC1 / 2))
    KC2 = step * (kk2 * (b + KB1 / 2) * (c + KC1 / 2) - kk3 * (c + KC1 / 2))
    KB3 = step * (kk1 * a * (b + KB2 / 2) - kk2 * (b + KB2 / 2) * (c + KC2 / 2))
    KC3 = step * (kk2 * (b + KB2 / 2) * (c + KC2 / 2) - kk3 * (c + KC2 / 2))
    KB4 = step * (kk1 * a * (b + KB3) - kk2 * (b + KB3) * (c + KC3))
    KC4 = step * (kk2 * (b + KB3) * (c + KC3) - kk3 * (c + KC3))
    b = b + (KB1 + 2 * KB2 + 2 * KB3 + KB4) / 6
    c = c + (KC1 + 2 * KC2 + 2 * KC3 + KC4) / 6
Next i
eRKb = b

End Function

and by the companion function eRKc(a, kk1, kk2, kk3, oldb, oldc, dt, n), which is identical to eRKb except for its name and its last assignment, eRKc = c, so that it returns c instead of b.

Exercise 7.8.1:
(1) Implement the explicit fourth-order Runge-Kutta method on your spreadsheet, using the above custom functions. Judging by the constancy of the various repeat cycles, the Runge-Kutta method is almost as satisfactory as the semi-implicit Euler method under otherwise identical conditions.

The conversion of A into B in reaction (7.6.1) is catalyzed by the reaction product B, while the conversion of B into C is catalyzed by C, through reaction (7.6.2). When the concentration of C increases, it will speed up the decomposition of B, so that the corresponding concentration b will decrease. However, the decrease in b will lead to a decrease in the rate of production both of B, through reaction (7.6.1), and of C, through reaction (7.6.2). In that case, b may recover while, initially, c remains low, in which case the process may become cyclic. Depending on the numerical values of the rate constants involved, the concentrations b and c will reach their maximum values at different times: while both are cyclic, they are out of phase with each other. This is clearly visible in Fig. 7.6.2. It is the interplay between the two autocatalytic reactions that causes the oscillatory behavior. Another way to display this behavior is to plot c versus b, as in the phase diagram of Fig. 7.6.3, which shows a limit cycle. Figures 7.6.2 and 7.6.3 represent the same information in different formats, and such alternative representations are of course readily made on the spreadsheet, with its convenient, built-in graphing capabilities.

7.9 Stability

While the fourth-order Runge-Kutta method leads to higher accuracy, the semi-implicit method can be more stable. We will illustrate this by numerically integrating the differential equation

dy/dx = y^2 + 1   (7.9.1)
with y_x=0 = y0 = 0. For the semi-implicit Euler method we rewrite (7.9.1) as

Δy/Δx = (y + Δy/2)^2 + 1 = y^2 + y Δy + (Δy)^2/4 + 1   (7.9.2)

so that

Δy ≈ (y^2 + 1) / (1/Δx − y)   (7.9.3)

where we have again linearized y^2 by neglecting the term (Δy)^2/4.

Exercise 7.9.1:
(1) Start a new spreadsheet, with space for values of n at the top, and below this a column for x, and four columns for y. Fill the x-column with the numbers 0 (0.01) 3, and the top cells in the y-columns with zeroes for y0.
(2) In the first y-column, implement the semi-implicit Euler method with the command (say, in cell B6, assuming that the value of y0 is placed in cell B5) =B5+(B5^2+1)/(1/(A6-A5)-B5).
(3) In the next y-column compute y with smaller time increments, using a custom function such as the one below, and the instruction =siEulerY(A5,A6,$C$1,C5) in cell C7, where the addresses within the brackets refer to the parameters listed in the function argument.

'semi-implicit Euler method for exercise 7.9.1
Function siEulerY(oldX1, oldX2, n, oldY) As Double

Dim Y As Double, step As Double
Dim i As Integer

n = CInt(n)
Y = oldY
step = (oldX2 - oldX1) / n
For i = 1 To n
    Y = Y + (Y * Y + 1) / ((1 / step) - Y)
Next i
siEulerY = Y

End Function

(4) Place a corresponding value of n in cell C1.
(5) In the next y-column compute y with the explicit fourth-order Runge-Kutta method:

'explicit 4th order Runge-Kutta for exercise 7.9.1
Function e4RKY(oldX1, oldX2, n, oldY) As Double

Dim X As Double, Y As Double, step As Double
Dim k1 As Double, k2 As Double
Dim k3 As Double, k4 As Double
Dim i As Integer

n = CInt(n)
X = oldX1
Y = oldY
step = (oldX2 - oldX1) / n
For i = 1 To n
    k1 = step * (Y ^ 2 + 1)
    k2 = step * (((Y + k1 / 2) ^ 2) + 1)
    k3 = step * (((Y + k2 / 2) ^ 2) + 1)
    k4 = step * (((Y + k3) ^ 2) + 1)
    Y = Y + (k1 + 2 * k2 + 2 * k3 + k4) / 6
    X = X + step
Next i
e4RKY = Y

End Function

(6) Place the value n = 1 in cell D1, and in cell D6 the instruction =e4RKY(A5,A6,$D$1,D5).
(7) Use the same custom function in the last y-column, but this time with n = 10 (in cell E1).
(8) Plot the results for y as a function of x obtained with these two methods.

Figure 7.9.1 illustrates what you will find. The Runge-Kutta method starts out fine, but stops at x = 1.57. It apparently cannot get past the discontinuity in y at x = π/2 ≈ 1.5708, regardless of whether we use Δt = 0.01 or multiple steps (as with n = 10). The semi-implicit Euler method has no problem with the integration, either for a simple one-line instruction and Δt = 0.01, or with a custom function and an effective Δt of 0.01/1000 = 0.00001.

By now you may have recognized the function we have just integrated: it is y = tan(x), which, as you can readily verify, is indeed the solution to (7.9.1) with y0 = 0. Having the exact solution allows us to compute and plot the errors. The results so obtained are shown in Figs. 7.9.2 and 7.9.3, and show that the Runge-Kutta method has far smaller algorithmic errors when it works, but fails completely when it doesn't, while this same hurdle doesn't faze the semi-implicit Euler method. In Fig. 7.9.2 the one-line semi-implicit Euler instruction eventually outperforms the Runge-Kutta method.

It is sometimes suggested that the Runge-Kutta method is all you need for the numerical integration of ordinary differential equations with one-point boundary conditions. We merely illustrate this here for a particular differential equation, but it reflects a rather general property: implicit methods, and even semi-implicit ones, are more stable than explicit methods. When a custom function is used to reduce the effective step size, the semi-implicit Euler method can combine accuracy and reliability: the use of smaller effective step sizes Δt/n makes the corresponding inaccuracies quite acceptable, yielding relative errors smaller than 10^−7, see Fig. 7.9.3. Because of its greater stability, the semi-implicit Euler method is often a better bet, except when you already know the function to be well behaved over the entire range of interest.

[Fig. 7.9.1: The function y found by numerical integration of (7.9.1) with y0 = 0. Broad gray band: results from the explicit fourth-order Runge-Kutta method for Δt = 0.01, which fails for x > π/2. Solid dots: results from the semi-implicit Euler method for Δt = 0.01.]

[Fig. 7.9.2: The (logarithms of the absolute values of the) relative errors ε_r in the numerical simulation. Broad gray band: results from the explicit fourth-order Runge-Kutta method for Δt = 0.01, which fails for x > 1.57. Solid dots: results from the semi-implicit Euler method for Δt = 0.01 with n = 10.]

[Fig. 7.9.3: The (logarithms of the absolute values of the) relative errors ε_r in the numerical simulation. Broad gray band: results from the explicit fourth-order Runge-Kutta method for Δt = 0.01. Solid dots: results from the semi-implicit Euler method for Δt = 0.01 with n = 1000. The cusps in the latter curve correspond to sign changes in ε_r(x).]
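The comparison in this section is easy to reproduce outside Excel. The following Python sketch (mine, not the book's) implements the semi-implicit recipe (7.9.3) and the fourth-order Runge-Kutta method for dy/dx = y² + 1, and checks them against the exact solution y = tan(x); like the spreadsheet version, the semi-implicit scheme marches right past the pole at π/2:

```python
import math

def si_euler(x_end, h):
    # semi-implicit Euler for dy/dx = y^2 + 1, y(0) = 0 — see eq. (7.9.3)
    y = 0.0
    for _ in range(round(x_end / h)):
        y = y + (y * y + 1.0) / (1.0 / h - y)
    return y

def rk4(x_end, h):
    # classical fourth-order Runge-Kutta for the same equation
    y = 0.0
    for _ in range(round(x_end / h)):
        k1 = h * (y ** 2 + 1)
        k2 = h * ((y + k1 / 2) ** 2 + 1)
        k3 = h * ((y + k2 / 2) ** 2 + 1)
        k4 = h * ((y + k3) ** 2 + 1)
        y += (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y

y_rk = rk4(1.0, 0.01)           # well before the pole: very accurate
y_si = si_euler(1.0, 0.0001)    # also accurate, though less so
y_past = si_euler(3.0, 0.01)    # semi-implicit continues past the pole
```

Past the pole, plain Runge-Kutta code like the above overflows, while the semi-implicit recursion overshoots once and then settles onto the negative branch of tan(x), just as the spreadsheet exercise shows.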
7.10 Chaos

Apart from problems caused by the method used for numerical integration, difficulties can be caused by the differential equation itself. This can occur, e.g., when the solution is highly dependent on the precise value of the initial condition. A clear example, taken from section 9.1B of J. R. Rice, Numerical Methods, Software, and Analysis (McGraw-Hill 1983), is the differential equation

dy/dx = 5y − 6 e^−x   (7.10.1)

which has the general solution

y = e^−x + A e^5x   (7.10.2)

For the initial condition y_x=0 = 1 we find A = 0, so that the solution is y = e^−x. However, for y0 = 1 ± ε, equation (7.10.2) yields A = ±ε, with the solution y = e^−x ± ε e^5x. As x increases, the term in e^5x will eventually dominate the solution, no matter how small |ε|, as long as it is not exactly zero. This is illustrated in Fig. 7.10.1: the solution y = e^−x is unstable to any perturbation, and somewhat resembles a needle balancing on its point.

Since in science and technology (as distinct from mathematics) such an initial value is never known exactly, the solution can become uncertain. Moreover, no numerical process handles data with complete precision. Consequently, pathological equations such as (7.10.1) will eventually give problems when integrated numerically with the simple methods described here. A prototypical example of this phenomenon is the weather forecast, which depends on equations that, when integrated over long periods to yield a prediction, turn out to depend critically on the precision of the (often imprecisely known) initial conditions. This is why the predictive power of the farmer's almanac is not much worse than the long-term weather forecast. Figure 7.10.2 illustrates how the semi-implicit Euler and the explicit fourth-order Runge-Kutta methods fare with this equation.
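The sensitivity is easy to demonstrate numerically. In the Python sketch below (mine, not the book's), even a 10^-6 perturbation of the initial value is amplified by roughly e^{5x}, swamping the unperturbed solution long before the integration error itself matters:

```python
import math

def rk4(y0, x_end, h):
    # fourth-order Runge-Kutta for dy/dx = 5y - 6*exp(-x)
    f = lambda x, y: 5.0 * y - 6.0 * math.exp(-x)
    x, y = 0.0, y0
    for _ in range(round(x_end / h)):
        k1 = h * f(x, y)
        k2 = h * f(x + h / 2, y + k1 / 2)
        k3 = h * f(x + h / 2, y + k2 / 2)
        k4 = h * f(x + h, y + k3)
        y += (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

exact = math.exp(-2.0)                 # y = e^{-x} for y0 = 1 exactly
y_good = rk4(1.0, 2.0, 0.001)          # starts on the unstable solution
y_pert = rk4(1.0 + 1e-6, 2.0, 0.001)   # tiny perturbation of y0
# the 1e-6 perturbation is amplified by about e^{5x} = e^{10} ~ 2 x 10^4
```

At x = 2 the perturbed run has already drifted by about 1e-6 · e^{10} ≈ 0.02 from the unperturbed one, while the unperturbed run still tracks e^{−x} closely — the "needle on its point" of the text.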
[Fig. 7.10.1: The solutions for dy/dx = 5y − 6e^−x for various values of the initial condition y0 = 1 + ε. The curve for ε = 0, where y = e^−x, is shown as a thick gray line. Thin black curves above and below that line show y = e^−x + ε e^5x for, from left to right, positive and negative values of ε of decreasing magnitude, from 10^−2 down to 10^−14.]

[Fig. 7.10.2: Numerical integration of dy/dx = 5y − 6e^−x with the semi-implicit Euler method (curves A through C) and the explicit fourth-order Runge-Kutta method (curves D through F), for Δt = 0.1, 0.01, and 0.001 from left to right. Curves B and C used custom functions with n = 100 and n = 10^4, effectively making Δt equal to 0.001 and 0.00001 respectively, while curves D through F used n = 1. All these results are displayed as individual data points; for comparison, thin curves a through f show y = e^−x + ε e^5x for the corresponding values of ε.]
7.11 Summary

In this chapter we have looked in some detail at a few simple methods for the numerical integration of ordinary differential equations with one-point boundary conditions. We have encountered three methods: the explicit and semi-implicit Euler methods, and the most popular of the explicit Runge-Kutta methods. The explicit Euler method is the simplest, both conceptually and in terms of its practical implementation, but it yields rather crude results. The semi-implicit Euler method combines relative simplicity with often quite acceptable stability and accuracy, and is readily implemented in Excel with custom functions. The explicit fourth-order Runge-Kutta method is much more efficient, especially when we again use custom functions to reduce the step size.

There are, of course, many other methods we have not encountered here. These are described in standard textbooks on numerical integration of ordinary differential equations. Since step size is such an important parameter in achieving accuracy, adaptive (i.e., self-correcting) methods have been developed to automatically adjust the step size in order to keep the inaccuracy within prescribed limits, such as when a function changes rapidly. These may well be desirable for fully automated software systems, but are usually unnecessary on spreadsheets, where the intermediate results are directly visible, so that the step size can be adjusted manually. If needed, there is no problem incorporating adaptive step size control along the lines described in, e.g., chapter 15 of the Numerical Recipes. For higher stability, fully implicit methods may be needed (implicit Runge-Kutta methods are described in, e.g., M. K. Jain, Numerical Solution of Differential Equations, Wiley 1979, 1984), but they do not appear to have significant advantages in linear systems, and they become quite complicated in nonlinear cases. Implicit methods are inherently more stable, but are usually unnecessary on spreadsheets.

For more complicated systems or routine uses a spreadsheet may well be too slow and/or too limiting, in which case the user should explore the capabilities of more specialized software. But even then, the transparency and ease of visualizing intermediate results may well make the spreadsheet a useful first stage for exploring the underlying algorithms, usually in more detail than necessary with prepackaged software. An additional advantage of the spreadsheet over dedicated software is that you, my reader, already have it at your fingertips, and that you have direct access to its code and therefore complete control over its operation. A disadvantage is that you must know what you are doing, and that you may have to spend some time coding and troubleshooting. The present chapter is merely meant to illustrate how to approach such problems on a spreadsheet.

In this chapter we have introduced custom functions in order to use the spreadsheet 'real estate' more efficiently. However, such use also has its downside: functions operate invisibly, in the background, and therefore make the spreadsheet operation somewhat less transparent.

In order to demonstrate the properties of the few methods illustrated here, they have first been introduced for differential equations with known solutions. Obviously, that is not where one would use numerical integration, and the example in sections 7.6 through 7.8 illustrates an application where no closed-form solution is available. Why is it that numerical methods can integrate differential equations that have no known algebraic solutions? A main reason is that the most general analytical methods, such as Laplace transformation, are effectively restricted to linear differential equations. This is not a requirement for numerical integration, which can either linearize the nonlinearities, or can incorporate nonlinear methods such as Solver, e.g., to match the simulation to the experimental data.

We have encountered some limitations of explicit methods, which may not be able to get past discontinuities, as in the example of section 7.9. Semi-implicit methods do better there, as they often do with so-called stiff systems of differential equations, which contain parameters of quite different orders of magnitude. Such stiff systems are often encountered in, e.g., chemical kinetics. In section 7.10 we briefly encountered chaos, in which the differential equations are so sensitive to their boundary conditions that numerical integration becomes extremely difficult. Chaotic systems differ, at least in principle, from stochastics, where the phenomena are inherently subject to random effects, although the results may look quite similar. Fortunately, most differential equations of practical importance in the physical sciences are well behaved within the range of their practical application.
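The trade-offs summarized here can be seen side by side on the test equation (7.9.1) of section 7.9. The following Python sketch (mine, not part of the book) integrates dy/dx = y² + 1 with all three methods at the same step size and compares the errors at x = 1 against the exact solution tan(x):

```python
import math

def euler(h, x_end):
    # explicit Euler
    y = 0.0
    for _ in range(round(x_end / h)):
        y += (y * y + 1.0) * h
    return y

def si_euler(h, x_end):
    # semi-implicit Euler, eq. (7.9.3)
    y = 0.0
    for _ in range(round(x_end / h)):
        y += (y * y + 1.0) / (1.0 / h - y)
    return y

def rk4(h, x_end):
    # explicit fourth-order Runge-Kutta
    y = 0.0
    for _ in range(round(x_end / h)):
        k1 = h * (y ** 2 + 1)
        k2 = h * ((y + k1 / 2) ** 2 + 1)
        k3 = h * ((y + k2 / 2) ** 2 + 1)
        k4 = h * ((y + k3) ** 2 + 1)
        y += (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y

h = 0.01
errs = {name: abs(f(h, 1.0) - math.tan(1.0))
        for name, f in [("euler", euler), ("semi-implicit", si_euler), ("rk4", rk4)]}
```

On this well-behaved stretch the expected ordering holds: Runge-Kutta is far more accurate than the semi-implicit Euler method, which in turn beats the explicit Euler method — exactly the accuracy ranking of the summary, with stability considerations deciding between them elsewhere.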
as well as different perspectives. Many useful formulas for numerical integration are listed by P. Davis & I. Polonsky in chapter 25 of M. Abramowitz & I. Stegun, Handbook of Mathematical Functions, NBS 1964, Dover 1965, an extremely useful yet inexpensive book that every scientist should have on his or her desk. The Numerical Recipes by Press et al., Cambridge University Press 1986, another highly recommended book, devotes three full chapters to the numerical integration of differential equations. And a quite extensive collection of methods can be found in M. K. Jain, Numerical Solution of Differential Equations, Wiley 1979.
Chapter 8

Write your own macros

Macros make it possible to extend the already quite considerable range of capabilities of Excel, in order to suit your personal computing needs. Moreover, you can import preexisting higher-language programs into Excel, so that you need not reinvent the wheel. This chapter will demonstrate how to copy spreadsheet data into a macro, manipulate them, and return the result to the spreadsheet. Subsequently it will illustrate writing a few nontrivial macros.

Macros are written in VBA, which stands for Visual Basic for Applications. VBA contains a subset of Visual Basic, plus instructions specifically designed for interacting with its host application, here Excel. Visual Basic is a higher-level language developed (via Borland's TurboBasic and Microsoft's QuickBasic) from the original Dartmouth Basic (for Beginner's All-purpose Symbolic Instruction Code). It has lost its original line numbers, and now is a competent higher-level language, resembling earlier versions of Fortran such as Fortran77. It can be compiled, but as used here is interpreted line by line. Earlier exposure to some computer language (Basic, C, Fortran, etc.) is helpful though not absolutely required.

The computer code of a custom macro resides in a module. In early versions of Excel, modules were like spreadsheets without row and column lines, but starting with Excel 97 the modules are hidden in a Visual Basic Editor, reachable with Alt+F11 (Mac: Opt+F11), as already mentioned in section 1.11, or with Tools => Macro => Visual Basic Editor. In order to open a new module, use Alt+F11 (Mac: Opt+F11), followed by (on the Visual Basic Toolbar) Insert => Module. Thereafter, toggle back and forth between spreadsheet and module with Alt+F11 (Mac: Opt+F11) or, easier yet, use a split screen to place the module next to or below the spreadsheet, so that you can switch between them merely by moving and clicking the mouse. If Option Explicit appears at the top of your macro module, disable it by placing an apostrophe in front of it ('commenting it out'), at
Unfortunately. alphabetical) order. B As Double. VBA does not allow you to use generic dimension statements. Instead.1 Reading the contents o/a cell Start by letting a macro read a number. to be interpreted/rom right to left. so that you can benefit from its typocatching feature. Instead of cell Val ue you could have used any other pa . and is ignored by the computer. A function also has brackets. c. at least by name. in some logical (or. then make it display that number to make sure it got it right. within type. or even before or after them. (2) The editor recognizes the term Sub (for subroutine) and shows it in blue. Open a spreadsheet. (3) The third line does all the work. you can change to suit your own taste).. open a module. which defines it as a comment line. B. A comment is meant for the convenience of the user. sorted by type and. and type the following code: Sub Read() 'Read the cell value cellValue = Selection. Comments can be placed anywhere inside a macro. C As Variant. Reserved words carmot be used as names of variables. as a reserved word. Dimension statements must occur before the dimensioned parameter is used. and identifY the subroutine as a macro. (3) The second line starts with an apostrophe. Despite the (symmetrical) equal sign. here Read. but these are seldom empty. B As Double specifies B as doubleprecision but dimensions A by name only. The computer screen will show it in green (assuming you use the default colors which. and to mean: take the value of the highlighted selection. Then reactivate it. such as sub 1 or mySub. and letting the user know that the editor is working.378 R. like almost everything else in Excel. de Levie. for substantial macros it is best to place all dimension statements at the top of the macro. i. In that case you must dimension every parameter used. except following a line continuation. each parameter must be dimensioned individually: Dim A. 
and preferably (because that makes the computer mn more efficiently) as Dim A As Integer. but variations on them. Advanced Excel for scientific data analysis least until you have read most of this chapter. or even to bundle dimension statements on a single line with a single declaration. and then assign it to the parameter named cellValue. thereby identifying it as a comment. the editor reads it as an assignment. by statements such as Dim A. Comments need not occupy an entire line: any text to the right of an apostrophe is considered a comment.e. The empty brackets following the name are necessary.Value MsgBox "The cell value is " cellValue & End Sub Notes: (1) The first line defmes the start of the macro and its name. can be. otherwise. 8.
Ch. 8: Writeyourown macros
379
rameter name, such as y or unknown or ThisIslt, but not a word the editor recognizes as an instruction, such as value. (Capitals or lower case makes no difference here. Except for literal quotes, VBA is not case sensitive, i.e., it ignores the difference between capitals and lower case letters.) By adding the prefix cell we avoid this problem; any other prefix, suffix, or other name modification such as myValue, thisval ue, or Value3 would also do the trick. Parameter names cannot contain any empty spaces, but we can improve their readability by using capitals, as in cellvalue or Thisls It, orbyusing underscores. (4) The fourth line is our check that the macro indeed reads the highlighted value. Upon execution ofthe macro a message box will appear in the spreadsheet, with the message The cell value is exactly as it appears within the quotation marks in the instruction, followed by its read value. The empty space at the end of the quoted text string separates that text from the subsequent parameter value. The ampersand, &, both separates and ties together (concatenates) the two dissimilar parts of the line: text and contents displayed. The text is helpful as a reminder of what is displayed, but is optional: MsgBox cellValue would also work. (5) The last line specifies the end ofthe macro, and will be shown in blue. Recent versions of Excel write this line for you automatically as soon as you enter a line defining a macro name, so you may not even have to type this line.
In order to run this macro, exit the Visual Basic Editor with Alt+F11 (Mac: Opt+F11), place a number somewhere on the spreadsheet, and enter it. Select Tools => Macro => Macros, and double-click on Read. You should now see a message box that shows the value just entered in the spreadsheet. Change this number, and check that the macro indeed reads it correctly. Verify that it only reads the contents of the active cell, the one identified by a thick border. Now go back to the Visual Basic Editor with Alt+F11 (Mac: Opt+F11), make the changes indicated below in bold, then run the macro. (The editor does not recognize bold characters, which are only meant for you, to identify the changes.)
Sub Read()
'Read & change the cell value
cellValue = Selection.Value
MsgBox "The cell value is " & cellValue
cellValue = cellValue * 7
Selection.Value = cellValue
End Sub
Notes (continued):
(6) The fifth line is, again, an assignment, to be interpreted as: take the old value of cellValue, multiply it by 7, and make that the new value of cellValue. Again the equal sign acts like an arrow pointing from right to left, as in cellValue <= cellValue * 7.
R. de Levie, Advanced Excel for scientific data analysis
(7) The sixth line is again an assignment: it takes the new value of cellValue, and writes it in the highlighted cell of the spreadsheet. This is therefore a writing instruction, to be compared with the reading instruction cellValue = Selection.Value. In both cases, the directionality of execution is from right to left.
Try the macro, and verify that it indeed keeps multiplying the value in the highlighted cell by 7. Play with it by, e.g., changing the fifth line to cellValue = cellValue + 2, or whatever suits your fancy. It is usually undesirable to overwrite spreadsheet data. Therefore, modify the macro as indicated below so that it will write its output below its input, then run this macro.
Sub Read()
'Read & change the cell value
cellValue = Selection.Value
'MsgBox "The cell value is " & cellValue
cellValue = cellValue * 7
Selection.Offset(1, 0).Select
Selection.Value = cellValue
End Sub
Notes (continued):
(8) Verify that you can write the output next to the input cell with Offset(0, 1) instead of Offset(1, 0), and that you can place it anywhere else on the spreadsheet (as long as you stay within its borders) with Offset(n, m), where n and m are integers. A negative n moves the output up, a negative m moves it to the left.
(9) You can now delete the line specifying the message box because, by displaying the answer, the spreadsheet shows the macro to work. Instead of deleting the line containing the message box, you can also comment it out by placing an apostrophe in front of the line, so that the editor will ignore it. That was done here. In this way you can easily reactivate the message box in case you need it during troubleshooting. But don't forget to remove such auxiliary lines in the final, finished version of the macro: the finished painting need not show the underlying sketch.
There is more to a cell than the value it displays: there may be a formula that generates this value, and you may also want to read the cell address. In order to extract all these pieces of information from the spreadsheet, modify the macro as follows:
Sub Read()
'Read the cell address, formula, and value
cellAddress = Selection.Address
MsgBox "The cell address is " & cellAddress
cellFormula = Selection.Formula
MsgBox "The cell formula is " & cellFormula
cellValue = Selection.Value
'MsgBox "The cell value is " & cellValue
cellValue = cellValue * 7
Selection.Offset(1, 0).Select
Selection.Value = cellValue
End Sub
Notes (continued):
(10) Again, Address and Formula are reserved terms recognized by the Visual Basic Editor, and therefore cannot be used without modification as parameter names.
8.2 Reading & manipulating a cell block
Reading the contents of a highlighted cell block or array (terms we will here use interchangeably) is as easy as reading that of a single cell, but using a message box to verify that the array was read correctly may be somewhat more tedious. Open a spreadsheet, open a module, and type the following code:
Sub ReadArray1()
arrayAddress = Selection.Address
MsgBox "The array range is " & arrayAddress
arrayValue = Selection.Value
MsgBox "The value of cell (1,1) is " & arrayValue(1, 1)
MsgBox "The value of cell (5,2) is " & arrayValue(5, 2)
arrayFormula = Selection.Formula
MsgBox "The formula in cell (1,1) is " & arrayFormula(1, 1)
MsgBox "The formula in cell (5,2) is " & arrayFormula(5, 2)
End Sub
Note:
(1) The array elements are always specified as row first, then column. As a mnemonic, use the RC time of an electrical circuit: R(ow) followed by C(olumn). This conforms to the standard way indices are assigned in matrices.
Test the above macro as follows. Return to the spreadsheet with Alt+F11 (Mac: Opt+F11), and deposit, say, 1, 2, 3, 4, and 5 respectively in cells A1:A5, and 6, 7, 8, 9, and 10 in B1:B5. In cell D1 deposit the instruction =sqrt(A1), and copy this instruction to D1:E5. Then highlight D1:E5, and call the macro.
382
R.
de
Levie,
Advanced
Excel
for
scientific
data
analysis
Notes (continued):
(2) In Excel, the instruction for taking the square root of x is sqrt(x), but in VBA it is sqr(x). There are more such incongruencies between the two, because they started off independently, and were subsequently joined. We have listed several of these differences in section 1.16, and occasionally will alert you to them as we encounter them.
Now that we know how to read the data in an array, we will use the macro to modify them. Enter and run the macro shown below:
Sub Cube1()
'Cube all array elements
For Each cell In Selection.Cells
    cell.Value = cell.Value ^ 3
Next cell
End Sub
Notes (continued):
(3) Here you encounter a For ... Next loop, which performs an operation repeatedly until all the cells have been acted on. Note the indentation used to make it easier for the user to identify the instructions inside the loop. The editor will ignore such indentations.
(4) Cell is not a term the Visual Basic Editor knows, but Cells is. See what happens when you replace cell by cells in the above macro.
Also enter and run the following macro:
Sub Cube2()
'Cube all array elements
Dim Array2 As Variant
Dim r As Integer, c As Integer
Array2 = Selection.Value
For r = 1 To 5
    For c = 1 To 2
        Array2(r, c) = Array2(r, c) ^ 3
    Next c
Next r
Selection.Value = Array2
End Sub
Notes (continued):
(5) Array is a recognized term, and therefore cannot be used as a parameter name, hence Array2 or some otherwise modified name. (6) The dimensioning in the two lines following Sub Cube2() is not strictly necessary (as long as Option Explicit is not used), but it is good to start the habit of dimensioning early. (7) Here we use two nested loops: for the first value of r the inner loop is executed until all values of c have been used, then the process is repeated for subsequent values of r.
(8) You can let the Visual Basic Editor find the lower and upper bounds of the ranges with For r = LBound(Array2, 1) To UBound(Array2, 1) and For c = LBound(Array2, 2) To UBound(Array2, 2), in which case you need not specify these bounds when you change the array size.
(9) Another way to leave the array size flexible is to let the macro determine it explicitly, as in

Sub Cube3()
'Cube all array elements
Dim Array3 As Variant
Dim c As Integer, cMax As Integer
Dim r As Integer, rMax As Integer
Array3 = Selection.Value
cMax = Selection.Columns.Count
rMax = Selection.Rows.Count
For r = 1 To rMax
    For c = 1 To cMax
        Array3(r, c) = Array3(r, c) ^ 3
    Next c
Next r
Selection.Value = Array3
End Sub
Test Cube1 versus either Cube2 or Cube3, as follows. Return to the spreadsheet. In cells A1 through A10 as well as in cells B1:B10 deposit the numbers 1 through 10. In cells C1 and D1 deposit the number 1, in cell C2 the instruction =C1+1, and copy this instruction to C2:D10. Highlight A1:A10 and call Cube2 or Cube3, then highlight C1:C10 and apply Cube2 or Cube3 again. Thereafter, highlight B1:B10 and call Cube1, then do the same with D1:D10. The results obtained in columns A through C should be identical, but those in column D will be way off. In fact, the program will almost certainly crash. (When the overflow error message appears, just press End.)

What has happened here? When the contents of cell D2 are cubed to the value 2^3 = 8, the contents of cell D3 are changed to 8 + 1 = 9, and when that is cubed we obtain 9^3 = 729. Then D4 is changed to 729 + 1 = 730, and subsequently cubed to 730^3 = 389,017,000, whereupon D5 is modified to 389,017,001, which is again cubed, and so on. By the time we have reached cell D7 we have exceeded the numerical capacity of the spreadsheet, and the program overflows! The problem with Cube1 is that it does not give us any control over the order in which it operates. In the present example, that order is not what we had intended. Obviously, Cube1 has the appeal of a more compact code. However, the more cumbersome code of Cube2 or Cube3 is more reliable, because
it first internalizes all the input values before it computes any results. Rather pedestrian code may sometimes be preferable to its more 'clever' counterpart!
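The distinction is easy to reproduce outside Excel. The following Python sketch (an illustration under assumed conditions, not the book's VBA) models column D as a chain of cells in which each cell equals the one above it plus 1, so that every write immediately updates the cells below it, exactly the feedback that derails Cube1:

```python
def cube_in_place(d):
    """Cube1-style: write each cube back immediately; the 'formula' cells
    below the one just written then recompute (cell = cell above + 1)."""
    for i in range(len(d)):
        d[i] = d[i] ** 3
        for j in range(i + 1, len(d)):   # dependent cells react at once
            d[j] = d[j - 1] + 1
    return d

def cube_snapshot(d):
    """Cube2/Cube3-style: read the whole block first, then write all results."""
    return [v ** 3 for v in list(d)]

column_c = list(range(1, 11))                  # C1=1, each cell = cell above + 1
print(cube_snapshot(column_c)[:4])             # [1, 8, 27, 64]
column_d = list(range(1, 11))
print(cube_in_place(column_d)[:4])             # [1, 8, 729, 389017000]
```

The snapshot version cubes the values that were present when the macro started, while the in-place version reproduces the runaway sequence 8, 729, 389,017,000 described above (Python's integers merely grow where Excel overflows).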
8.3 Numerical precision
In the module, make a copy of Cube3, then modify this copy as follows:
Sub Root3()
'Take the cube root of all array elements
Dim Array3 As Variant
Dim c As Integer, cMax As Integer
Dim r As Integer, rMax As Integer
Array3 = Selection.Value
cMax = Selection.Columns.Count
rMax = Selection.Rows.Count
For r = 1 To rMax
    For c = 1 To cMax
        Array3(r, c) = Array3(r, c) ^ (1 / 3)
    Next c
Next r
Selection.Value = Array3
End Sub
Return to the spreadsheet in order to test this macro. Make two test arrays containing some simple numbers, such as 1, 2, 3, 4, 5 in A1:A5, and 6, 7, 8, 9, 10 in B1:B5, then copy these data to A7:B11. In D4 enter the instruction =A1-A7, and copy this instruction to D4:E8. Call Cube3 and apply it, three times in succession, to the data in A1:B5. Then call Root3 and apply it, again three times, to the resulting data in that same block. You should of course end up with the original data. Is that what you see in D4:E8?

What is going on now? The answer to the above riddle is that Excel always, automatically, uses double precision, but VBA does not, unless it is specifically told to do so. Therefore, force it to do so, as in
Sub Root3()
'Take the cube root of all array elements
Dim Array3 As Variant
Dim c As Integer, cMax As Integer
Dim r As Integer, rMax As Integer
Dim p As Double
p = 1 / 3
Array3 = Selection.Value
cMax = Selection.Columns.Count
rMax = Selection.Rows.Count
For r = 1 To rMax
    For c = 1 To cMax
        Array3(r, c) = Array3(r, c) ^ p
    Next c
Next r
Selection.Value = Array3
End Sub
Verify that you can now repeatedly cube a function, and subsequently undo it by taking the cube root, without accumulating unacceptably large errors.
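The size of the effect is easy to estimate outside VBA. The Python sketch below (an illustration, not the book's code) stores the exponent 1/3 rounded to IEEE single precision, much as a low-precision VBA parameter would, and compares the resulting cube root with the double-precision one:

```python
import struct

def to_single(x):
    """Round a double-precision number to IEEE single precision,
    mimicking a parameter that was not dimensioned As Double."""
    return struct.unpack('f', struct.pack('f', x))[0]

p_double = 1 / 3
p_single = to_single(1 / 3)          # 0.3333333432674408

exact = 1e6 ** p_double              # cube root of 10^6: essentially 100
sloppy = 1e6 ** p_single             # off already in the 5th decimal
print(abs(exact - 100) < 1e-9)       # True
print(abs(sloppy - 100) > 1e-6)      # True
```

Each cube-and-root cycle applies the exponent error anew, which is why the differences in D4:E8 grow with repetition until p is declared As Double.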
8.4 Communication boxes
VBA uses three types of dialog boxes to facilitate communication between the macro and the user: message boxes, input boxes, and user-defined dialog boxes. We will describe message and input boxes, which are easy to use, and can do anything you need to write your own macros. Dialog boxes are more versatile and professional-looking, and can be more user-friendly. However, they are also considerably more complex to set up (although the latest versions of Excel have reduced that complexity), and for that reason they are not discussed here. For commercial software, however, dialog boxes are clearly the way to go, because they provide the programmer maximal control over data input and output, and the user an overview of all choices made before the macro takes off.
8.4.1 Message boxes
Message boxes can carry a simple message,

MsgBox a

where a represents a value calculated by the macro, or

MsgBox "Well done!"

or they can combine a message with specific information, as in

MsgBox "Excellent! The answer was " & a
In the latter example, note the use of an ampersand, &, to concatenate the two parts, i.e., to separate the textual message (within quotation marks) from the output value as well as to link them in one instruction. Message boxes can also be used for (limited) information flow from the user to the macro, as in
Sub QandA1()
' <Space for questions>
Response = MsgBox("No, the answer was " & 3 & Chr(13) _
    & "Do you want to try again?", vbYesNo)
If Response = vbNo Then End
' <Continue with questions>
End Sub
Notes:
(1) Chr(13) continues on a new line, and is therefore equivalent to a typewriter 'carriage return'. Alternatively you can use the somewhat shorter vbCr. Other often-used characters and/or their alternative abbreviations are:
Chr(9) or vbTab for tab
Chr(10) or vbLf for linefeed
Chr(11) or vbVerticalTab for vertical tab
Chr(12) or vbFormFeed for page break
vbCrLf for carriage return plus linefeed
Chr(149) for a bullet (•)
Chr(150) for an en dash (–)
Chr(151) for an em dash (—)
(2) The line continuation symbol is a space followed by an underscore. It cannot be followed by any text, including comments.
(3) In the above example, the message box will display two buttons, labeled Yes and No respectively, instead of the usual OK button.
(4) The instruction starting with If Response = is a one-line If statement, and therefore does not require an End If. An alternative, more explicit form is
Sub QandA2()
' <Space for questions>
Msg = "No, the answer was 3. Try again?"
Style = vbYesNo
Title = "Quizz"
Response = MsgBox(Msg, Style, Title)
If Response = vbNo Then MsgBox "Sorry, this question netted" & Chr(13) _
    & "you no additional points."
' <Continue with questions>
End Sub

(5) In this case, End If is still not needed, because the compiler considers the continued line starting with If Response = as one line.
8.4.2 Input boxes
Input boxes can transfer numerical information to the macro:
Sub DataInput1()
yourChoice = InputBox("Enter a number", "Number")
MsgBox "The number is " & yourChoice
End Sub
Notes:
(1) Again we use a message box to verify that the input box works.
(2) If you want to use a longer message, and therefore want to break and continue the line starting with yourChoice =, the quotation marks must be closed
and reopened, and the ampersand must be used to concatenate (separate and link) the two parts of the text message, as in
yourChoice = InputBox("After you have finished this, " _
    & "enter a number", "Number")
or you can make it into a narrower, taller box with
yourChoice = InputBox("After you have finished this," _
    & Chr(13) & "enter a number", "Number")
Input boxes can have default values, and can be followed by extensive verification schemes such as
Sub DataInput2()
Message = "Enter an integer between 1 and 100:"
Title = "Integer"
Default = "25"
inputValue = InputBox(Message, Title, Default)
If inputValue < 1 Then
    MsgBox "The selected number is too small."
    End
End If
If inputValue > 100 Then
    MsgBox "The selected number is larger than 100."
    End
End If
If inputValue - Int(inputValue) <> 0 Then
    MsgBox "The selected number is not an integer."
    End
End If
MsgBox "You entered the number " & inputValue
End Sub
The above macro is rather unforgiving of data entry mistakes, because any entry error forces the user to start anew. A friendlier approach gives the user a number of chances to get it right, as in
Sub InputANumber()
Tries = 0
MaxTries = 5
Message = "Enter an integer" & Chr(13) & "between 1 and 100:"
Title = "Integer"
Default = "25"
Do
    myValue = InputBox(Message, Title, Default)
    Tries = Tries + 1
    If Tries > MaxTries Then End
    If myValue < 1 Then MsgBox _
        "The selected number is too small."
    If myValue > 100 Then MsgBox _
        "The selected number is larger than 100."
    If myValue - Int(myValue) <> 0 Then MsgBox _
        "The selected number is not an integer."
Loop Until (myValue >= 1 And myValue <= 100 _
    And myValue - Int(myValue) = 0)
MsgBox "You chose the number " & myValue
End Sub

Notes (continued):
(3) This method allows the user to correct faulty entries without aborting the macro. The maximum number of tries, MaxTries, must be specified beforehand.
(4) Comparison with our simple starting macro, Read(), indicates the large part of a macro that may have to be devoted to data input controls. This often happens, regardless of whether the input is defined by highlighting, or by using an input box. The amount of code devoted to data input is even larger when using a dialog box as the input stage.
(5) The editor will again interpret the above If statements as one-liners, which do not require End If statements. Note that such an End If statement must be written as two words, whereas ElseIf should be written as one word.
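Such retry logic is easiest to check when the validation rules are separated from the dialog itself. Here is a Python sketch of the same checks (the function name and return convention are merely illustrative, not from the book):

```python
def validate_entry(text, lo=1, hi=100):
    """Apply the InputANumber checks to one entry; return (ok, message)."""
    try:
        value = float(text)
    except ValueError:
        return False, "The selected entry is not a number."
    if value < lo:
        return False, "The selected number is too small."
    if value > hi:
        return False, "The selected number is larger than 100."
    if value != int(value):
        return False, "The selected number is not an integer."
    return True, ""

print(validate_entry("25"))       # (True, '')
print(validate_entry("0.5")[1])   # The selected number is too small.
```

A driver loop can then call this function up to MaxTries times, just as the Do ... Loop Until construction does above.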
Convenient data entry of arrays often involves the highlight & click method, in which we read a range, as in
Sub InputARange()
Dim myRange As Range
Set myRange = Application.InputBox(Prompt:="The range is:", Type:=8)
myRange.Select
End Sub
where Type:=8 indicates what input information should be expected. Type:=8 denotes a range, i.e., a cell or block reference rather than, say, a formula (Type:=0), a number (Type:=1), a text string (Type:=2), a logical value (i.e., True or False, Type:=4), an error value (such as #N/A, Type:=16), or an array (Type:=64). For more than one possible type of input, add the various numbers: Type:=9 would accept either a number or a range. In order to verify that we have indeed read the range correctly, we might add the temporary test code
Dim c As Integer, cMax As Integer
Dim r As Integer, rMax As Integer
Dim myValue As Variant
cMax = myRange.Columns.Count
rMax = myRange.Rows.Count
MsgBox "The range is " & myRange.Item(1, 1).Address & ":" _
    & myRange.Item(rMax, cMax).Address
myValue = Selection.Value
For c = 1 To cMax
    For r = 1 To rMax
        MsgBox "Address number " & r & "," & c & " is " _
            & Selection.Item(r, c).Address _
            & " and contains the value " & myValue(r, c)
    Next r
Next c
Notes (continued):
(6) The nested For ... Next loops will display all cell addresses and their corresponding numerical values, one at a time, because the OK button on the message box must be pressed after each display.
(7) The output first lists the contents of the first column, from top to bottom, then that of the second column, etc. If you want the output to read the rows first, just interchange the lines For c = 1 To cMax and For r = 1 To rMax, and similarly Next c and Next r. You are in complete control.
Another way to test for proper reading is to copy the input values to another array, to be printed out elsewhere on the spreadsheet. This method makes comparison of the original and its copy somewhat easier. Here is the resulting macro:
Sub InputARange()
Dim c As Integer, cMax As Integer
Dim r As Integer, rMax As Integer
Dim myValue As Variant, myRange As Range
Set myRange = Application.InputBox(Prompt:="The range is:", Type:=8)
myRange.Select
cMax = myRange.Columns.Count
rMax = myRange.Rows.Count
ReDim myValue(1 To rMax, 1 To cMax)
For c = 1 To cMax
    For r = 1 To rMax
        myValue(r, c) = myRange.Value(r, c)
    Next r
Next c
Selection.Offset(0, cMax + 1).Select
Selection.Value = myValue
Selection.Offset(0, -cMax - 1).Select
End Sub
Test the above macros by placing a small block of arbitrary numbers somewhere on the spreadsheet. Highlight it, then use the macro to read the range addresses and their cell values.
8.5 Case study 1: the propagation of imprecision
So-called 'error' propagation deals with the transmission of experimental imprecision through a calculation. Say that we have a function F(x) which is computed from a single parameter x. We then want to calculate the imprecision ±ΔF in F resulting from the (assumedly known) imprecision ±Δx in the parameter x. For the usual assumption Δx << x we have

\[ \frac{\Delta F}{\Delta x} \approx \frac{dF}{dx} \tag{8.5.1} \]

so that the magnitude of ΔF is given by

\[ |\Delta F| = \left| \frac{dF}{dx} \right| \, |\Delta x| \tag{8.5.2} \]

or, in terms of standard deviations,

\[ s_F = \left| \frac{dF}{dx} \right| \, s_x \tag{8.5.3} \]

Spreadsheets cannot compute the algebraic formula for the derivative dF/dx (as, e.g., Mathematica or Maple can), but they can find its numerical value, which is all we need. We do this by going back to the definition of the differential quotient as

\[ \frac{dF}{dx} = \lim_{\Delta x \to 0} \frac{\Delta F}{\Delta x} = \lim_{\Delta x \to 0} \frac{F(x+\Delta x) - F(x)}{\Delta x} \tag{8.5.4} \]

Therefore we calculate dF/dx by computing the function F twice, once with the original parameter x, and subsequently with that parameter slightly changed from x to x + Δx, using the Excel function Replace. We then divide their difference by the magnitude of that change, Δx. When Δx is sufficiently small (but not so small that Δx itself becomes imprecise), (8.5.4) will calculate the value of dF/dx without requiring any formal differentiation! A value for Δx/x between 10^-6 and 10^-8 satisfies the requirements that Δx << x while Δx is still much larger than the truncation errors of the program, which in Excel are of the order of 10^-14. We will write the macro in two parts: first the data input stage, then the actual calculation. Here we go:
Sub Propagation()
' Read the x-value
Dim XRange As Range
Set XRange = Application.InputBox(Prompt:="The value of x is:", Type:=8)
XRange.Select
XValue = Selection.Value
' Read the corresponding standard deviation s
Set SValue = Application.InputBox(Prompt:="The standard deviation s is:", Type:=8)
' Read the formula F and its value
Dim FRange As Range
Set FRange = Application.InputBox(Prompt:="The formula F is:", Type:=8)
FRange.Select
FValue = Selection.Value
FFormula = Selection.Formula
' Verify that x, s, and F are read correctly
MsgBox "The value of x is " & XValue
MsgBox "The standard deviation in x is " & SValue
MsgBox "The formula has the value " & FValue
MsgBox "The formula reads " & FFormula
End Sub

Note:
(1) Set XRange uses a more elaborate version of the input box.

Test this section. When the input part of the program works correctly, comment out the verification section, and insert the following code on the lines just above End Sub.

' Change x
XRange.Select
NewXValue = XValue * 1.000001
Selection.Replace XValue, NewXValue
MsgBox "The new value for x is " & NewXValue
' Read the resulting change in F
FRange.Select
NewFValue = Selection.Value
MsgBox "The new value for F is " & NewFValue
' Compute the standard deviation SF in F
SFValue = Abs((NewFValue - FValue) * SValue / (XValue * 0.000001))
MsgBox "The standard deviation in F is " & SFValue
' Reset x
XRange.Select
Selection.Replace NewXValue, XValue
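The arithmetic of this finite-difference recipe is easy to check outside Excel. The Python sketch below (an illustration of the same method, not the book's VBA) uses the identical relative step Δx/x = 10^-6:

```python
def propagate(f, x, s_x, rel_step=1e-6):
    """Estimate s_F = |dF/dx| * s_x, taking dF/dx as the finite
    difference (F(x + dx) - F(x)) / dx with dx = x * rel_step."""
    dx = x * rel_step
    dFdx = (f(x + dx) - f(x)) / dx
    return abs(dFdx) * s_x

# Check against the exact result for F(x) = x^3, where dF/dx = 3 x^2:
# at x = 2 with s_x = 0.01 this should give s_F close to 3 * 4 * 0.01 = 0.12.
print(propagate(lambda x: x ** 3, 2.0, 0.01))
```

For F(x) = sqrt(x) the same call should return approximately s_x / (2 sqrt(x)), one of the other test cases suggested below.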
Again, test the program, now with some numbers and equations for which you know how the uncertainty propagates, such as F(x) = x^3 or sqrt(x), for which dF/dx = 3x^2 or 1/(2 sqrt(x)) respectively. (The test will of course use numbers rather than symbols.) Avoid numbers such as 0 or 1 that often yield noninformative answers. When everything works well, delete all message boxes except for the last one. Congratulations: you have now written a useful, nontrivial scientific macro!

The macro Propagation in the MacroBundle is essentially the same as the one you have just written, except that it can also handle functions of multiple parameters, can use either standard deviations or the covariance matrix as input, and will write its results onto the spreadsheet unless that action would risk overwriting valuable data. Those 'extras' make the macro somewhat more complicated, and much longer, but the principle of the method is the same, and the macro retains some semblance of readability by being broken up into smaller sections, each with their own comment headings, just as you would break up a long chapter into paragraphs with subtitles. The macro is shown in section 9.10; see also R. de Levie, Chem. Educ. 6 (2001) 272. See whether you can now follow its logic.

8.6 Case study 2: bisection

Often we need to find the root of a complicated expression y(x) within a well-defined interval of x-values. An example might be to compute the pH of a complicated mixture, such as a 'universal' buffer mixture. The pH of any acid-base mixture can be formulated in general terms. While such a general expression is conceptually and formally quite simple, its explicit form in terms of the proton concentration [H+] will be quite complicated when applied to a multicomponent mixture of monoprotic and polyprotic acids and bases, reflecting the complexity of the chemical equilibria involved. In such cases we can use Solver (or GoalSeek) to find the root, but that often requires some manual guidance. A bisection method, though in principle somewhat less efficient, can often be a preferred alternative because, as long as there is one and only one root in the specified interval, bisection will always find it, reliably and without external assistance.

We start with a set of data y(x) as a function of a monotonically changing variable x in an interval within which y(x) crosses zero once. We will label the first value of y(x) as y_first, and its last value as y_last, where y_first and y_last have different signs so that the product y_first × y_last
must be negative. We then compute y_new for x_new = (x_first + x_last) / 2. If the product y_first × y_new is negative, the root lies in the interval between x_first and x_new, and we replace x_last by x_new; if, on the other hand, y_first × y_new is positive, x_new will replace x_first. Thus we reduce the size of the interval x_last − x_first, and this process can be repeated until the interval is sufficiently small, at which point x_new can be taken as the sought answer.

The macro shown below performs just one step, so that you can see the progress of successive bisections by calling it repeatedly. The macro writes new x-values into the spreadsheet, which subsequently computes the corresponding y-values that are then read back into the macro. Consequently the macro need not 'know' the equation used for y as a function of x, and therefore can be directly applicable to any data set that includes one zero crossing. Since the computer only needs the input data, there is no need for input boxes, and the input process has been reduced to highlighting the data array and calling the macro.

The downside of this approach is that the input block must be organized in a set way so that the macro can interpret the data unambiguously. In the present example the input block should contain two columns, of which the leftmost must contain the function y(x), and the rightmost column the independent variable x, in that order. That assignment was chosen here to be consistent with the data arrangements in the custom least squares macros of the MacroBundle, even though it is somewhat less convenient for making graphs. But then, if you prefer to use input boxes, or a single multi-input dialog box, you can readily modify the macro, and arrange its input format to suit your own taste; by all means change the macro to suit your taste.

Sub RootFinder()
Dim i As Integer, nc As Integer, nr As Integer
Dim Ffirst As Double, Flast As Double
Dim Fnew As Double, Fres As Double
Dim Xdif As Double, Xroot As Double
Dim Xfirst As Double, Xlast As Double, Xnew As Double
Dim FormulaArray As Variant, SpareFormulaArray As Variant
Dim ValueArray As Variant, SpareValueArray As Variant

nc = Selection.Columns.Count
nr = Selection.Rows.Count

' Read the input from the highlighted block
FormulaArray = Selection.Formula
SpareFormulaArray = Selection.Formula
ValueArray = Selection.Value
SpareValueArray = Selection.Value

' Bisection
Ffirst = ValueArray(1, 1)
Flast = ValueArray(nr, 1)
Xfirst = ValueArray(1, 2)
Xlast = ValueArray(nr, 2)
Xdif = Xlast - Xfirst
Xnew = (Xfirst + Xlast) / 2
FormulaArray(2, 2) = Xnew
Selection.Formula = FormulaArray
ValueArray = Selection.Value
Fnew = ValueArray(2, 1)
Selection.Formula = SpareFormulaArray

If Ffirst * Fnew > 0 Then
    For i = 1 To nr
        FormulaArray(i, 2) = Xnew + (Xlast - Xnew) * (i - 1) / (nr - 1)
    Next i
    Selection.Formula = FormulaArray
End If

If Ffirst * Fnew < 0 Then
    For i = 1 To nr
        FormulaArray(i, 2) = Xfirst + (Xnew - Xfirst) * (i - 1) / (nr - 1)
    Next i
    Selection.Formula = FormulaArray
End If

' Display result
MsgBox "Xroot = " & Xnew
End Sub

Note that RootFinder requires a zero crossing, i.e., with both positive and negative values for y(x) bracketing the root at y = 0. Therefore it will not find the root of functions that merely touch the abscissa, such as y = |x − a|, for which y ≥ 0 with a root at x = a. When a function exhibits two or more distinct zero crossings, make the x-increments sufficiently small to separate the roots, and then apply RootFinder separately to x-intervals that each contain only one root.

The version incorporated in the MacroBundle makes a few initial checks, uses a Do loop to automate the process, saves the original data, holds the screen image constant during the calculation, and restores the original data before the screen is finally updated to display the root. In order to facilitate such applications to parts of a data set, the output is provided in a message box. The point of exit of the Do loop depends on the resolution criterion Fres, which for optimal results with a particular problem may have to be changed in value or made absolute rather than relative.
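The interval-halving logic itself is independent of the spreadsheet. The following Python sketch (a generic illustration of the method that RootFinder automates, not the book's VBA) carries the bisection to convergence for a function supplied directly:

```python
def bisect(f, x_first, x_last, tol=1e-10):
    """Bisection on a bracketing interval: f(x_first) and f(x_last)
    must have opposite signs, guaranteeing one zero crossing inside."""
    f_first = f(x_first)
    assert f_first * f(x_last) < 0, "interval must bracket a zero crossing"
    while x_last - x_first > tol:
        x_new = (x_first + x_last) / 2
        if f_first * f(x_new) > 0:       # root lies between x_new and x_last
            x_first, f_first = x_new, f(x_new)
        else:                            # root lies between x_first and x_new
            x_last = x_new
    return (x_first + x_last) / 2

print(round(bisect(lambda x: x * x - 2, 0.0, 2.0), 6))   # 1.414214
```

Each pass halves the interval, so about 50 iterations suffice for full double-precision resolution of any bracketed root, regardless of how complicated f is.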
8.7 Case study 3: Fourier transformation

The next example will illustrate a somewhat different approach to writing macros. Excel already provides a Fourier transformation routine, under Tools => Data Analysis => Fourier Analysis, so why reinvent the wheel? The reason is that, in this case, the wheel is rather wobbly, and not very serviceable: the Excel routine is not properly scaled (transformation followed by inverse transformation does not return the original) and, more importantly, its output is coded as labels, from which the data need to be extracted with =IMREAL() and =IMAGINARY(). You therefore would have to work hard merely to get the output in a form useful for, e.g., subsequent calculations or a graph. While such defects can be corrected or circumvented, it is sometimes easier to avoid them altogether by starting afresh. That is what we will illustrate here.

Fourier transformation is in principle a simple matter: once the input data have been provided, the computer needs no further information. Consequently there is no need for input boxes, and the input process can be reduced to highlighting the data array and then calling the macro, as already illustrated in section 8.6. Such a macro might just contain dataArray = Selection.Value as input statement, Selection.Value = dataArray as output instruction, and have the Fourier transform routine in the middle. You will take the core code from the literature. Left for you to write are, then, the VBA code to read the data into the macro and to return the results to the spreadsheet, together with whatever that routine requires, plus any error checks to prevent the most common operator mistakes. If you leave out the latter, you may be dismayed later to fall into a trap of your own making.

The so-called fast Fourier transformation is most efficient when applied to a number of data points that is an integer power of 2, and most software packages, including this one, are therefore restricted to 2^N data points, where N is a positive integer. There are many places in the literature where a Fourier transform subroutine can be found. Here we will use a particularly convenient source, the Numerical Recipes by W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, published in several versions (such as Fortran, Pascal, or C++) by Cambridge University Press. This book not only gives many useful routines, but also presents a very lucid explanation of their uses and limitations. Get hold of the diskette of Sprott's Numerical Recipes: Routines and Examples in BASIC, Cambridge University Press 1991, and copy it onto your computer. J. C. Sprott has provided these routines in Basic, in a format that you can use as is because it is fully compatible with VBA. Moreover, also get yourself a copy of the Numerical Recipes (preferably the Fortran77 version that was the source for Sprott's Basic programs) for useful (and sometimes quite essential) background information.

You will find that Press et al. describe a subroutine called FOUR1, which requires that the input data are arranged as alternating real and imaginary terms, and returns the answer in the same format. Since such a sequence is unsuitable for graphs, we will start with two input columns, one each for the real and imaginary signal components respectively. Likewise we will use two output columns. We will then use the macro to rearrange the data from two columns to one, as input for the subroutine, and afterwards to decode its single-column output into two columns for the spreadsheet.

8.7.1 A starter macro

We start by putting together a rudimentary yet working Fourier transform macro by providing a simple input statement, a routine taken from the literature, and an output statement. Open a new spreadsheet, and type:

Sub Fourier()
' Read the input
Dim dataArray As Variant
dataArray = Selection.Value
' Determine the array length
Dim r As Integer, rMax As Integer
rMax = Selection.Rows.Count
' Rearrange the input
ReDim Term(1 To 2 * rMax) As Double
For r = 1 To rMax
    Term(2 * r - 1) = dataArray(r, 1)
    Term(2 * r) = dataArray(r, 2)
Next r
' Call the subroutine
Dim iSign As Integer
iSign = 1
Call Four1(Term, 2 * rMax, iSign)
' Rearrange the output
For r = 1 To rMax
    dataArray(r, 1) = Term(2 * r - 1)
    dataArray(r, 2) = Term(2 * r)
Next r
' Write the result
Selection.Value = dataArray
End Sub
dataArray (r. i. but in general it is a good practice. and putting them back into separate columns. Value. this part can be checked out first. it can make the computer run more efficiently. you will have dimensioned all variables. . When in doubt about its proper data type. provided that you type them everywhere in lower case. Therefore. By initially commenting out the subroutine call. For another.g.2. or use its slightly modified form as shown below. Although it is good programming practice to use the dimension statement to specify the data type (e. Value = dataArray. When you see the computerinserted capitals pop up on your screen you will know that the Visual Basic Editor has recognized the name. except in the dimension statement. etc. We already encountered this type of code at the end of section 2. 8: Writeyourown macros Next r . Select in order not to overwrite the input data. Rows. at least dimension the variable by name.e. You need not do so (as long as you do not use Option Explicit). and call the subroutine.Offset(O. (2) The macro reads the entire input array with the simple instruction dataArray = Selection. As Integer or As Variant). one each for rows and columns. Count is a convenient instruction to find the length of the input columns. you do not need do so to get this advantage. Now place the subroutine FOURl after the macro. (7) Offset the highlighted array with Selection. The everalert Visual Basic Editor will then convert any lower case letters to the corresponding capitals in accordance with the dimension statement. (4) An Array has two dimensions. Offset (0.. as done here. because Option Explicit will catch most misspelled variable names. For one thing. (5) SpecifY any as yet undefined parameters that Fourl () may need (here: iSign). 2).Value = dataArray End Sub 397 Notes: (1) Selection. 1mj. where Ren and 1mn are the real and imaginary parts ofthe nth data point. (8) If you have followed the above. 
(9) The typo alert works whenever you use variable names that contain at least one capital.Ch. 1m2. (6) Unscramble the output by taking the alternating real and imaginary components from their single file. 2). Select Selection. Re2. it can alert you to some typos. unless your typo specifies another parameter name you have also dimensioned. and return the result to the spreadsheet with Selection. its the leftmost one. which from now on we will follow in all our examples.. separately..5. 1) refers to the cell at row number r and column 1 of the array. Most typos will be caught this way. (3) The next few lines put the input data in the required format: Rej. Write the output data Selection.
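The column rearrangement described above (notes 3 and 6) is easy to prototype outside the spreadsheet. The following Python sketch, which is not part of the book's VBA macro and whose function names are illustrative only, mirrors the packing of two columns into the alternating Re1, Im1, Re2, Im2, ... sequence that FOUR1 expects, and the decoding of that sequence back into two columns.

```python
def interleave(re, im):
    # Pack two equal-length columns into the alternating
    # Re1, Im1, Re2, Im2, ... layout that FOUR1 expects.
    term = []
    for r, i in zip(re, im):
        term.append(r)
        term.append(i)
    return term

def deinterleave(term):
    # Split the alternating layout back into two columns
    # (every even index is a real part, every odd one imaginary).
    return list(term[0::2]), list(term[1::2])
```

A round trip through both functions returns the original two columns unchanged, which is exactly the behavior the macro's input and output loops must have.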
Sub Four1(Term, nn, iSign)

Dim tr As Double, ti As Double, theta As Double
Dim wtemp As Double, wr As Double, wi As Double
Dim wpr As Double, wpi As Double
Dim i As Integer, istep As Integer, j As Integer
Dim m As Integer, mmax As Integer

j = 1
For i = 1 To nn Step 2
  If j > i Then
    tr = Term(j)
    ti = Term(j + 1)
    Term(j) = Term(i)
    Term(j + 1) = Term(i + 1)
    Term(i) = tr
    Term(i + 1) = ti
  End If
  m = Int(nn / 2)
  While m >= 2 And j > m
    j = j - m
    m = Int(m / 2)
  Wend
  j = j + m
Next i

mmax = 2
While nn > mmax
  istep = 2 * mmax
  theta = -2 * [Pi()] / (iSign * mmax)
  wpr = -2 * Sin(0.5 * theta) ^ 2
  wpi = Sin(theta)
  wr = 1
  wi = 0
  For m = 1 To mmax Step 2
    For i = m To nn Step istep
      j = i + mmax
      tr = wr * Term(j) - wi * Term(j + 1)
      ti = wr * Term(j + 1) + wi * Term(j)
      Term(j) = Term(i) - tr
      Term(j + 1) = Term(i + 1) - ti
      Term(i) = Term(i) + tr
      Term(i + 1) = Term(i + 1) + ti
    Next i
    wtemp = wr
    wr = wr * wpr - wi * wpi + wr
    wi = wi * wpr + wtemp * wpi + wi
  Next m
  mmax = istep
Wend

End Sub
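If you want to check the logic of this subroutine without a spreadsheet at hand, here is a line-by-line Python transcription. It is a test sketch only, not part of the macro: it uses 0-based indexing in place of VBA's 1-based arrays, and assumes the sign convention of the modified version (the minus sign in theta). The function name four1 is kept merely for reference.

```python
import math

def four1(term, isign):
    # Python transcription of FOUR1, for checking only.
    # term is a flat list [re1, im1, re2, im2, ...]; the loop
    # structure is identical to the VBA listing above.
    nn = len(term)                       # twice the number of complex points
    # Bit-reversal reordering of the complex pairs
    j = 0
    for i in range(0, nn, 2):
        if j > i:
            term[j], term[i] = term[i], term[j]
            term[j + 1], term[i + 1] = term[i + 1], term[j + 1]
        m = nn // 2
        while m >= 2 and j >= m:
            j = j - m
            m = m // 2
        j = j + m
    # Danielson-Lanczos butterflies
    mmax = 2
    while nn > mmax:
        istep = 2 * mmax
        theta = -2 * math.pi / (isign * mmax)   # minus sign, as in note (12)
        wpr = -2 * math.sin(0.5 * theta) ** 2
        wpi = math.sin(theta)
        wr, wi = 1.0, 0.0
        for m in range(0, mmax, 2):
            for i in range(m, nn, istep):
                j = i + mmax
                tr = wr * term[j] - wi * term[j + 1]
                ti = wr * term[j + 1] + wi * term[j]
                term[j] = term[i] - tr
                term[j + 1] = term[i + 1] - ti
                term[i] = term[i] + tr
                term[i + 1] = term[i + 1] + ti
            wtemp = wr
            wr = wr * wpr - wi * wpi + wr
            wi = wi * wpr + wtemp * wpi + wi
        mmax = istep
```

Feeding it the 16 points of cos(pi x/8) indeed yields two nonzero points of magnitude 8, in agreement with the scaling discussion of section 8.7.2 below, and applying it a second time with isign = -1 returns the original data multiplied by the number of points.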
(The word Call in the line calling the subroutine is optional, but is highly recommended, because it makes the macro easier to read.)

Notes (continued):

(10) The first (and major) modification here is the dimensioning, especially the use of double precision: VBA uses single precision unless specifically instructed otherwise.

(11) Instead of spelling out the numerical value of pi, we merely invoke the spreadsheet function Pi() by placing it between straight brackets. This convenient trick works for all built-in Excel spreadsheet functions we may want to use in VBA. A more general way to do this is to precede the function by Application., as in theta = 2 * Application.Pi() / (iSign * mmax).

(12) On the same line you may notice a sign change, because the Numerical Recipes use an engineering sign convention for Fourier transformation that is the opposite of the common scientific one used here; see comment (b) below.

Now remove the apostrophe in front of the Call statement, and test the entire macro. As test data you might first use yRe = cos(pi x/8) and yIm = 0 for x = 0 (1) 15; then try yRe = sin(pi x/8) and yIm = 0 for the same x values.

8.7.2 Comments & embellishments

a. Scaling: The Fourier transform of y = cos(pi x/8) for x = 0 (1) 15 should yield two nonzero points, each of magnitude 0.5, at f = +1/8 and -1/8 respectively. Instead you will find two nonzero points of magnitude 8, in the second row and at the bottom of the first output column. If you had carefully read the section on FOUR1 in the Numerical Recipes you would have seen that you must still provide a normalizing factor, viz. the reciprocal of the number of data points transformed. Here that number is 16, and normalization through division by 16 makes these points 8/16 = 0.5. We should build this division into the macro. Keep in mind that any numbers smaller than about 10^-14 are likely to be zeros corrupted by computational (truncation and round-off) errors.

b. Sign: The Fourier transform of y = sin(pi x/8) for x = 0 (1) 15 should also be two points, of magnitude -0.5j and +0.5j (where j = sqrt(-1)) at f = +1/8 and -1/8 respectively. You obtained two nonzero points in the second output column (which contains the imaginary parts of the output), but, upon consulting a standard book on Fourier transformations, you will find that their signs are just the reverse of what they are supposed to be. The cosine is symmetrical around x = 0, while the sine is not, so that the sine test exposes the problem while the cosine test does not. Another look at the Numerical Recipes will show that it uses uncommon definitions of the forward and inverse Fourier transforms, opposite from the usual convention; consequently the problem is in the subroutine, and is easily fixed by placing a minus sign in front of the term iSign in the equation defining theta. This correction was already made in the modified version shown.

c. Driver macros: Now that we have calibrated the macro, it is time to consider some conveniences. We can use the very same macro for inverse transformation if we change the sign of iSign. That can of course be done with an input box, but it is easier to make two small macros, ForwardFT and InverseFT, that set iSign to +1 and -1 respectively, and then call Fourier as a subroutine. Here are such drivers:

Sub ForwardFT()
Dim iSign As Integer
iSign = 1
Call Fourier(iSign)
End Sub

Sub InverseFT()
Dim iSign As Integer
iSign = -1
Call Fourier(iSign)
End Sub

Notes:

(1) The difference between a macro and a subroutine is that a macro does not exchange information through its (therefore empty) argument list, i.e., in the space between the brackets; a subroutine does. In the above example both driver macros tell Fourier to use their particular value of iSign. And because the two driver macros are independent, iSign must be dimensioned twice.

(2) There is slightly more to combining forward and inverse transforms, because (in the common formalism we adopt here) the normalizing factor 1/N only applies to the forward transform. (An alternative is to use normalizing factors 1/sqrt(N) for both the forward and inverse transform.)

(3) The header of the main program must now be changed to Sub Fourier(iSign).

d. Checking the input block dimensions: The Fourier transform subroutine requires that there are 2^N data points, where N is a positive integer. It is convenient to check ahead of time whether the input range indeed contains such a number of data. We can also make sure that there are only 2 columns, and that there are at least 2 rows. The following code, to be placed immediately below the heading of Fourier(iSign), will accomplish all of these goals.

' Check the array width
Dim cMax As Integer
cMax = Selection.Columns.Count
If cMax <> 2 Then
  MsgBox "There must be 2 input columns."
  End
End If

' Check the array length
Dim rMax As Integer, Length As Double
rMax = Selection.Rows.Count
If rMax < 2 Then
  MsgBox "There must be at least 2 rows."
  End
End If
Length = rMax
Do While Length > 1
  Length = Length / 2
Loop
If Length <> 1 Then
  MsgBox "The number of rows must be" & Chr(13) _
    & "an integral power of two."
  End
End If

Notes:

(1) Selection.Columns.Count finds the number of highlighted columns, which should be 2: one for the real component of the function to be transformed, the other for its imaginary component. Otherwise the macro alerts the user of the problem, and the input is rejected.

(2) Similarly, Selection.Rows.Count finds the number of rows of the highlighted array, which should be at least 2.

(3) The next check makes sure that rMax is an integer power of 2. This is accomplished here by using a second variable, Length, that we initially set equal to rMax. We then divide Length repeatedly by 2, until it becomes 1 or smaller. If it does not end up at exactly 1, the number of rows was not an integer power of 2, and the macro alerts the user.

(4) We here divide Length rather than rMax in order to keep the latter intact for further use. Alternatively we could use rMax, and redetermine it the next time we need it. There are often several ways to get the job done.

e. Checking for overwrite: It is convenient to place the output of the macro immediately to the right of the input data. We now make sure that this region does not contain valuable data that would be lost if overwritten, by letting the macro take a quick look at that area to see whether it can be used for output. Here is an example of code that will do this. It can be inserted just after the code discussed under (d), or before the data output.

' Make sure that the output will not overwrite valuable data
Dim outputArray As Variant, z As Variant
Dim c As Integer, r As Integer
Dim n As Integer, answer As Integer
n = 0
Selection.Offset(0, 2).Select
outputArray = Selection.Value
For r = 1 To rMax
  For c = 1 To cMax
    z = outputArray(r, c)
    If IsEmpty(z) Or z = 0 Then
      n = n
    Else
      n = n + 1
    End If
  Next c
Next r
If n > 0 Then
  answer = MsgBox("There are data in the" & Chr(13) _
    & "output space. Can they" & Chr(13) _
    & "be overwritten?", vbYesNo)
  If answer = vbNo Then
    Selection.Offset(0, -2).Select
    End
  End If
End If

Notes:

(1) Selection.Offset(0, 2).Select moves the highlighted area 2 cells to the right, to the place where the output will be deposited.

(2) The next line makes an output array of the same dimensions as the input array. It is used here only to see whatever is in the cells where the output should come.

(3) Now that we have read what is in the output space, the next lines check every cell in that area. Initially, n was set to zero. Whenever a cell in this range is found not to be empty or zero, the variable n is incremented by one, and at the end we see whether it is still zero. If not, the message box is activated, asking the user whether the data in the output array can be overwritten. If the answer is no, the highlighted area switches back, and the macro is ended, to give the user time to move the endangered data out of harm's way.

(4) Since Fourier transformation is often applied to fairly large data files, no niceties such as an alternative output via message boxes are provided, since they would be awkward and time-consuming to use.

f. Converting time to frequency etc.: In the Fourier transformation F(f) = Integral from -infinity to +infinity of f(t) e^(-2 pi j f t) dt, the product of the parameters f and t must be dimensionless, as it is in the usual pairs of time and frequency, or wavelength and wavenumber. Given t, we can therefore let the computer calculate f, and vice versa. It is convenient to extend the block to include the independent variable x, and to provide the corresponding variable in a third output column.

g. Using a zero-centered scale: A related question is that of the scale to be used for the independent variable. Traditionally, t runs from 0 to tmax, while f starts at 0, runs till fmax/2, has a discontinuity, and then continues to run from -fmax/2 to 0. We here adopt more rational scales that are continuous, and that are centered around zero for both t and f. (The final version tolerates both types of input.)
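Such a zero-centered frequency scale is easy to compute. The sketch below is a Python illustration, not part of the book's VBA; the function name is hypothetical. It assumes n points sampled at interval dt, places f = 0 at the center, and uses the spacing 1/(n dt), so that the scale runs continuously from -1/(2 dt) up to (n/2 - 1)/(n dt).

```python
def centered_frequency_scale(n_points, dt):
    # Zero-centered frequency scale for an n-point transform with
    # sampling interval dt: steps of 1/(n*dt), with f = 0 in the
    # middle, avoiding the traditional mid-scale discontinuity.
    df = 1.0 / (n_points * dt)
    return [(k - n_points // 2) * df for k in range(n_points)]
```

For 16 points at unit sampling interval this yields -0.5, -0.4375, ..., 0, ..., +0.4375, a continuous axis suitable for plotting the transform directly.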
h. Dimensioning: By now the macro is getting too long to be typed without errors, and you will be ready to dimension your parameters. Here are some things to keep in mind. In dimensioning several parameters you need not repeat the statement Dim for parameters on one line, but you must repeat the type declaration (as in Dim A As Double, B As Double, C As Double) for each and every parameter, even when these statements are on the same line. If you are unsure about the proper dimension type, just dimension the parameter by name but leave the type unspecified. Even if you do not know their dimensions yet, dimension array variables As Variant, and then ReDimension them as soon as the macro knows the corresponding array size. And make sure to distinguish between arrays that start counting at 0 and those that start at 1. Dimensioning has other benefits: the user can quickly see to what type each parameter belongs, and the computer can operate more efficiently.

i. Modularizing: When you anticipate that a part of the code may also be useful in other macros, it may be convenient to compartmentalize that section of code as a separate subroutine, which can be called by different macros. In the present example, the actual fast Fourier transform algorithm can then be shared with ConvolveFT. On the other hand, ConvolveFT could have used a more efficient fast Fourier transform algorithm that can simultaneously transform two real signals rather than one complex one. This illustrates the trade-off between optimizing code for each specific application and saving development time by using already existing code modules. With today's processor speeds, it seldom makes a perceptible difference unless the macros are very computation-intensive.

j. Suppressing screen display: When a macro recalculates a data point, the entire screen will be updated. Often we will have graphs on the screen in order to visualize the functions, and usually this is precisely what we want. However, for a computation involving many points, as in a Fourier transformation of a large data set, it is convenient to suppress the screen updating until the entire output set has been computed. This is accomplished with the instruction Application.ScreenUpdating = False. Screen updating should be suppressed only after all input and dialog boxes have been used, since suppression blocks the convenience of entering ranges into input box windows by the highlight & click method, a true nuisance of VBA. If desired, the display can be turned back on with Application.ScreenUpdating = True, which can be placed just before the output statement Selection.Value = dataArray.
All of the above embellishments have been incorporated in the Fourier transform macro which you already encountered in chapter 5, and which is fully documented in chapter 10. You will now appreciate the difference between a lean-and-mean macro and a well-thought-out but pot-bellied one. In the latter, more code is usually devoted to embellishments than to the primary function of the macro. One can only hope that the added effort pays off in terms of convenience of use and absence of frustrating program hang-ups.

8.8 Case study 4: specifying a graph

As our next example we will make a macro to generate a graph. Of course you can readily make a graph with the Chart Wizard. But if you need to make many graphs for a report or a paper, and want them to be compatible in size and style, it may be convenient to have a standard format available. You can do this by creating a graph just the way you want it to look, and by then making that the default setting, as described in chapter 1. Alternatively, just as an illustration of how much you can manipulate Excel, we will write a macro to specify a simple graph, so that the mouse can stay home for a while. (For another example see the custom macro Mapper.)

For the sake of the argument, say that you want to plot the function y = 2.4 sqrt(x) for x = 0 (5) 100, where you have placed values for x in cells A7:A27, and computed the corresponding values of y = 2.4 sqrt(x) in B7:B27. First reserve the location of the graph on the spreadsheet with

Sub MakeGraph()

' Create an embedded graph in the cell grid
Dim ch As ChartObject
Dim cw As Double, rh As Double
cw = Columns(1).Width
rh = Rows(1).Height
Set ch = ActiveSheet.ChartObjects. _
  Add(cw * 2, rh * 1, cw * 5, rh * 5)

End Sub

By setting cw = Columns(1).Width and rh = Rows(1).Height we make the graph fit the spreadsheet grid, facilitating its subsequent copying into a Word document. Otherwise we need to specify the placement and dimensions of the graph in points, where 1 point = 1/72 inch, or about 1/3 mm. Now define the data type, and the specific data to be plotted:

Sub MakeGraph()

' Create an embedded graph in the cell grid
Dim ch As ChartObject
Dim cw As Double, rh As Double
cw = Columns(1).Width
rh = Rows(1).Height
Set ch = ActiveSheet.ChartObjects. _
  Add(cw * 3, rh * 4, cw * 8, rh * 16)

' Select the graph type:
ch.Chart.ChartType = xlXYScatter

' Insert data series:
ch.Chart.SeriesCollection.Add Source:=ActiveSheet.Range("A7:B27"), _
  RowCol:=xlColumns, SeriesLabels:=True, CategoryLabels:=True

End Sub

This short macro will give you a graph of the data in A7:B27, with autoscaled axes, at the chosen place on the spreadsheet. Now that the basic choices have been made, the rest is fine-tuning and embellishment. Here are some options.

' Define the data range:
ch.Chart.SeriesCollection.Add Source:=ActiveSheet.Range("A7:B27")

If you want to specify your own axes (X = "Category", Y = "Value"), add instructions such as:

With ch.Chart.Axes(xlCategory)
  .MinimumScale = 0
  .MaximumScale = 100
  .MajorUnit = 20
End With
With ch.Chart.Axes(xlValue)
  .MinimumScale = 0
  .MaximumScale = 25
  .MajorUnit = 5
End With

See what happens when you change the parameter values.

' Define the axis labels:
With ch.Chart.Axes(xlCategory)
  .HasTitle = True
  With .AxisTitle
    .Caption = "time / s"
    .Font.Name = "Times Roman"
    .Font.Size = 12
  End With
End With
With ch.Chart.Axes(xlValue)
  .HasTitle = True
  With .AxisTitle
    .Caption = "signal / A"
    .Font.Size = 12
    .Orientation = xlUpward
  End With
End With

' Specify a graph title:
With ch.Chart
  .HasTitle = True
  With .ChartTitle
    .Caption = "Sample Chart #1"
    .Font.Name = "Times Roman"
    .Font.FontStyle = "Italic"
    .Font.Size = 16
  End With
End With

' Define the points and line in the graph:
With ch.Chart.SeriesCollection(1)
  .MarkerStyle = xlCircle
  .MarkerSize = 7
  .MarkerBackgroundColorIndex = xlNone
  .MarkerForegroundColorIndex = 1
  .Smooth = True
  With .Border
    .ColorIndex = 7
    .Weight = xlHairline
    .LineStyle = xlContinuous
  End With
End With

The colors are 1 = black, 2 = reversed (white on black background), 3 = red, 4 = green, 5 = blue, etc., giving you 16 color options, as in .Font.ColorIndex = 4. Alternatively you can use the RGB system that lets you select any color combination using 256 shades each of red, green, and blue; e.g., pure red would be coded by .Font.Color = RGB(255, 0, 0).

' Do without gridlines:
ch.Chart.Axes(xlValue).HasMajorGridlines = False
ch.Chart.Axes(xlCategory).HasMajorGridlines = False

' If you don't want the legend box:
ch.Chart.Legend.Clear

' Define markers for a second data set:
With ch.Chart.SeriesCollection(2)
  .MarkerStyle = xlTriangle
  .MarkerSize = 5
  .MarkerBackgroundColorIndex = 8
  .MarkerForegroundColorIndex = 5
  .Smooth = True
End With

' Introduce a second data set:
ch.Chart.SeriesCollection.Add Range("C7:C27")

' Add a secondary vertical scale:
ch.Chart.SeriesCollection(2).AxisGroup = xlSecondary
With ch.Chart.Axes(xlValue, xlSecondary)
  .HasTitle = True
  .AxisTitle.Caption = "log conc"
End With
With ch.Chart.Axes(xlValue, xlSecondary).AxisTitle
  With .Font
    .Name = "Times New Roman"
    .FontStyle = "Italic"
    .Size = 12
  End With
  .Orientation = xlUpward
End With

' Define the background color of the graph:
ch.Chart.ChartArea.Interior.ColorIndex = 2
ch.Chart.PlotArea.Interior.ColorIndex = xlNone

' Place tick marks:
ch.Chart.Axes(xlValue).MajorTickMark = xlTickMarkCross
ch.Chart.Axes(xlValue).TickLabelPosition = xlTickLabelPositionNextToAxis

(and do similarly for xlCategory).

' Add a text box and specify its text
' (note that the numerical values are in points):
With ch.Chart.TextBoxes.Add(164, 96, 116, 50)
  .AutoSize = True
  .Text = "K1=3"
End With
With ch.Chart.TextBoxes
  With .Characters(Start:=1, Length:=4).Font
    .Name = "Times New Roman"
    .Size = 12
  End With
  .Characters(Start:=1, Length:=1).Font.FontStyle = "Italic"
  .Characters(Start:=2, Length:=1).Font.Subscript = True
End With
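The packing behind VBA's RGB function can be illustrated with a one-line computation. The Python sketch below is for illustration only (the function name is hypothetical); it reproduces the long-integer value that VBA's RGB(r, g, b) returns, with the red component in the lowest byte.

```python
def vba_rgb(r, g, b):
    # VBA's RGB(r, g, b) packs three 0-255 components into a single
    # long integer as r + 256*g + 65536*b; pure red is therefore 255.
    return r + 256 * g + 65536 * b
```

This is why pure red is simply 255, while white, with all three components at full intensity, comes out as 16777215.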
and so on. This is only a sampler of the many possibilities: you can highlight specific points with different markers and/or colors, add error bars, etc., as illustrated in chapter 11, but you get the idea. Finally, once the macro makes the graph you want, suppress the screen updating during its execution, since this will both clean it up and speed it up. You can do this by placing the following lines at the beginning of the program:

' Suppress screen updating:
Application.ScreenUpdating = False

8.9 Case study 5: sorting through permutations

In various areas of science one encounters the problem of permutation. Say that we have four numbers, 1, 2, 3, and 4. Then we can ask in how many ways we can order them, as in 1234, 1243, ..., 4231, etc., every time using the very same four ingredients but only changing their order. Or we can ask how many possible combinations there are when we disregard their order, and merely delete one or more digits: 1, 12, 123, 124, etc. There are thus several types of possible permutations: those in which exchanging indices makes a difference, and those in which we disregard their order. Here we will illustrate the latter type: the specific two-digit combinations (from among the four numbers 1 through 4, and disregarding their order) are 12, 13, 14, 23, 24, and 34.

A computer is a perfectly suitable tool to sort through this type of problem, and we will here use the example of multivariate least squares analysis to illustrate it. Say that we have a block of data organized in rows and columns, where the columns are suggestively labeled y, x1, x2, x3, etc., for a least squares fit of the y-values to an equation of the type y = a0 + a1 x1 + a2 x2 + a3 x3 + ..., and where we may want to see the effect of setting one or more terms ai equal to zero. Further assume that we want to calculate the standard deviation of the fit, Sy, for each such combination. Below we will illustrate how this might be approached. The same logic is used in the following For...Next loop:

Sub Permute()
Dim c As Integer, c2 As Integer, cMax As Integer
cMax = 4
For c = 1 To cMax
  For c2 = c + 1 To cMax
    MsgBox c & "," & c2
  Next c2
Next c
End Sub
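The nested-loop logic of Permute is easily checked against a library implementation. The following Python sketch (illustrative only, not part of the macro) generates the pairs with the same two nested loops, the inner index always starting one past the outer one, and compares the result with itertools.combinations.

```python
import itertools

def pairs(c_max):
    # Same nested-loop logic as Permute: the inner index starts one
    # past the outer one, so each unordered pair appears exactly once.
    result = []
    for c in range(1, c_max + 1):
        for c2 in range(c + 1, c_max + 1):
            result.append((c, c2))
    return result
```

For c_max = 4 this produces the six combinations 12, 13, 14, 23, 24, and 34 quoted in the text, in the same order as itertools.combinations(range(1, 5), 2).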
Once we have established the principle of using nested For...Next loops to do our bidding, the rest is simple. In the custom macro LSPermute1 we write the data in the y-column into the yArray, and the contents of the various columns x1, x2, x3, etc., into xArray. We then use the subroutine LLSS (shown in section 9.5) to perform the multi-parameter least squares analysis and to return the corresponding Sy values, whereupon we repeat the process for the next permutation.

Sub LSPermute1()

Dim c As Integer, c2 As Integer, c3 As Integer
Dim c4 As Integer, cc As Integer, cMax As Integer
Dim ccMax As Integer, Down As Integer
Dim r As Integer, rMax As Integer
Dim StDevY As Double
Dim inputArray As Variant
Dim xArray, yArray, outputArray

' Read and dimension arrays
inputArray = Selection.Value
cMax = Selection.Columns.Count
rMax = Selection.Rows.Count
ReDim yArray(1 To rMax, 1 To 1) As Double
ReDim outputArray(1 To rMax, 1 To cMax)

' Fill the yArray
For r = 1 To rMax
  yArray(r, 1) = inputArray(r, 1)
Next r

' Initialize the outputArray
For r = 1 To rMax
  For c = 1 To cMax
    outputArray(r, c) = ""
  Next c
Next r
Down = 0

' Write column headings
outputArray(1, 1) = "Indices:"
outputArray(1, 2) = "Sy:"
Selection.Offset(rMax + 1, 0).Select
Selection.Value = outputArray
Down = Down + rMax + 1
Selection.Offset(2, 0).Select
Down = Down + 2

' Compute the output for 1 variable:
ccMax = 2
ReDim xArray(1 To rMax, 1 To ccMax) As Double
For c = 2 To cMax
  For r = 1 To rMax
    xArray(r, 1) = 1
    xArray(r, 2) = inputArray(r, c)
  Next r
  Call LLSS(ccMax, rMax, yArray, xArray, StDevY)
  outputArray(1, 1) = c - 1
  outputArray(1, 2) = StDevY
  Selection.Value = outputArray
  Selection.Offset(1, 0).Select
  Down = Down + 1
Next c
Selection.Offset(1, 0).Select
Down = Down + 1

' Compute the output for 2 variables:
ccMax = 3
ReDim xArray(1 To rMax, 1 To ccMax) As Double
For c = 2 To cMax
  For c2 = c + 1 To cMax
    For r = 1 To rMax
      xArray(r, 1) = 1
      xArray(r, 2) = inputArray(r, c)
      xArray(r, 3) = inputArray(r, c2)
    Next r
    Call LLSS(ccMax, rMax, yArray, xArray, StDevY)
    outputArray(1, 1) = c - 1 & "," & c2 - 1
    outputArray(1, 2) = StDevY
    Selection.Value = outputArray
    Selection.Offset(1, 0).Select
    Down = Down + 1
  Next c2
Next c
Selection.Offset(1, 0).Select
Down = Down + 1

' Compute the output for 3 variables:
ccMax = 4
ReDim xArray(1 To rMax, 1 To ccMax) As Double
For c = 2 To cMax
  For c2 = c + 1 To cMax
    For c3 = c2 + 1 To cMax
      For r = 1 To rMax
        xArray(r, 1) = 1
        xArray(r, 2) = inputArray(r, c)
        xArray(r, 3) = inputArray(r, c2)
        xArray(r, 4) = inputArray(r, c3)
      Next r
      Call LLSS(ccMax, rMax, yArray, xArray, StDevY)
      outputArray(1, 1) = c - 1 & "," & c2 - 1 & "," & c3 - 1
      outputArray(1, 2) = StDevY
      Selection.Value = outputArray
      Selection.Offset(1, 0).Select
      Down = Down + 1
    Next c3
  Next c2
Next c
Selection.Offset(1, 0).Select
Down = Down + 1

' Compute the output for 4 variables:
ccMax = 5
ReDim xArray(1 To rMax, 1 To ccMax) As Double
For c = 2 To cMax
  For c2 = c + 1 To cMax
    For c3 = c2 + 1 To cMax
      For c4 = c3 + 1 To cMax
        For r = 1 To rMax
          xArray(r, 1) = 1
          xArray(r, 2) = inputArray(r, c)
          xArray(r, 3) = inputArray(r, c2)
          xArray(r, 4) = inputArray(r, c3)
          xArray(r, 5) = inputArray(r, c4)
        Next r
        Call LLSS(ccMax, rMax, yArray, xArray, StDevY)
        outputArray(1, 1) = c - 1 & "," & c2 - 1 & "," _
          & c3 - 1 & "," & c4 - 1
        outputArray(1, 2) = StDevY
        Selection.Value = outputArray
        Selection.Offset(1, 0).Select
        Down = Down + 1
      Next c4
    Next c3
  Next c2
Next c

Selection.Offset(-Down, 0).Select

End Sub

Here, as in all the custom least squares macros in the MacroBundle, we assume that the leftmost column in the highlighted block of input data is the dependent variable. This convention is used because the usual least squares analysis involves only one dependent variable y but has no limit on the number of independent variables x. Note, however, that this differs from the Excel convention for making graphs, which assumes that the leftmost column specifies the abscissa (horizontal axis), normally associated with an independent variable. Simple ways around it are (1) to make a copy of the y-column somewhere to the right of the x-column, and use the latter columns for making the graph, (2) to move the y-column by cutting & pasting before making a graph, and immediately thereafter to move it back to where it is required for the custom macro, or (3) to specify the x- and y-columns separately before calling the Chart Wizard or inside it.
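The strategy of LSPermute1, namely to fit y to every subset of the x-columns and report the standard deviation of the fit, can be prototyped compactly outside Excel. The Python sketch below is an illustration under stated assumptions, not the LLSS subroutine itself: it solves the normal equations by Gaussian elimination and returns Sy with N - k degrees of freedom, where k counts the fitted parameters including a0. The function name fit_stdev is hypothetical.

```python
import math

def fit_stdev(y, xcols, subset):
    # Least-squares fit of y = a0 + sum of a_i * x_i over i in subset,
    # returning the standard deviation of the fit, Sy. The normal
    # equations are solved by Gaussian elimination with partial
    # pivoting; degrees of freedom are n - k, k = len(subset) + 1.
    n = len(y)
    cols = [[1.0] * n] + [xcols[i] for i in subset]
    k = len(cols)
    A = [[sum(cols[p][r] * cols[q][r] for r in range(n)) for q in range(k)]
         for p in range(k)]
    b = [sum(cols[p][r] * y[r] for r in range(n)) for p in range(k)]
    for p in range(k):                       # forward elimination
        piv = max(range(p, k), key=lambda q: abs(A[q][p]))
        A[p], A[piv] = A[piv], A[p]
        b[p], b[piv] = b[piv], b[p]
        for q in range(p + 1, k):
            f = A[q][p] / A[p][p]
            for cc in range(p, k):
                A[q][cc] -= f * A[p][cc]
            b[q] -= f * b[p]
    a = [0.0] * k                            # back substitution
    for p in reversed(range(k)):
        a[p] = (b[p] - sum(A[p][cc] * a[cc] for cc in range(p + 1, k))) / A[p][p]
    resid = [y[r] - sum(a[j] * cols[j][r] for j in range(k)) for r in range(n)]
    return math.sqrt(sum(e * e for e in resid) / (n - k))
```

For an exhaustive search, loop over itertools.combinations(range(number_of_x_columns), size) for each subset size, exactly as the nested c, c2, c3, c4 loops of LSPermute1 do.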
At the other end of the macro we have used a quick-and-dirty output, where we merely move the highlighted block down each time we have a line to write. It is quick because it doesn't take much thought to encode it, and dirty because it takes more space than is needed for the actual output data, since it writes each line sequentially rather than computing all results first and then displaying them. It also takes more time to display its results, and wipes out all data that happen to be in its path. Still, when macros are used only occasionally, it is often useful to get a working macro first, and to polish it up later. Section 9.5 contains a cleaned-up version, LSPermute, with more bells and whistles, for up to 6 variables. A quick-and-dirty approach is still on display there in the treatment of the points near the extremes of the data set; a more compact treatment can be based on Gram polynomials.

8.10 Case study 6: raising the bar

Accompanying this book is the MacroBundle, a collection of custom macros that can be downloaded freely from the web site www.oup-usa.org/advancedexcel. Having them available on the internet makes them most widely accessible, even to readers who cannot afford to buy this book, and the web also makes it possible to correct and upgrade them when necessary, something that neither the printed page nor an enclosed compact disk or floppy will allow. Still, for your convenience as well as for the sake of permanence, they are also printed in chapters 9 through 11: web sites can and do disappear, disks get lost, and magnetic and optical formats change. Nothing is permanent, but in printed form the longevity of the material in chapters 9 through 11 is at least linked to that in chapters 1 through 8.

The present section is not about those macros, but about installing them on your machine in an efficient way. When macros are used only occasionally, the standard facilities of Excel are optimal: the custom macros do not usurp monitor 'real estate' but are listed in the macro dialog box accessible with Tools => Macro => Macros. To operate them, one merely double-clicks on them, or single-clicks followed by Run. These installation procedures apply to all custom macros, not just those of the MacroBundle, and will work in Excel 97 and more recent versions, but not in Excel 5 or Excel 95.
something that neither the printed page nor an enclosed compact disk or floppy will allow. for up to 6 variables. The web also makes it possible to correct and upgrade them when necessary. If in doubt about their operation. and dirty because it takes more space than is needed for the actual output data. Nothing is permanent. because it writes each line sequentially rather than computing all results first. It also takes more time to display its results. To operate them. not just those of the MacroBundle. a collection of custom macros that can be downloaded freely from the web site www.3 to include your own creations.
click on them and then click Edit, which will get you to the top of the macro text in the Visual Basic Editor. The user instructions should be there, and after reading them you can get back to the spreadsheet with Alt+F11 (Mac: Opt+F11) and then call the macro into action.

If you need to use the macros frequently, more convenient ways to access them can be provided by embedding them as a menu item in the standard toolbar, or by adding a special toolbar to display them. Here we will illustrate both approaches. The code shown below will roughly halve the number of mouse clicks needed for calling custom macros.

8.10.1 Adding a menu item

As our example we will add a menu item called CustomMacros to the standard toolbar, and give it a submenu listing several macros. The advantage of this approach is that it facilitates access to the custom macros without consuming valuable spreadsheet space, because the standard toolbar has plenty of unused, extra space available, even on a small monitor screen.

Sub InsertMenuM()

Dim MenuN As CommandBarControl, Menu1 As CommandBarControl
Dim Menu20 As CommandBarControl, Menu21 As CommandBarControl
Dim MenuM As CommandBarPopup, Menu2 As CommandBarPopup

' Delete possible earlier menu insertions, to prevent conflicts
On Error Resume Next
CommandBars(1).Controls("&CustomMacros").Delete

' Locate the new menu item between Tools and Data on the Menu bar
Set MenuN = CommandBars(1).FindControl(ID:=30011)
Set MenuM = CommandBars(1).Controls.Add(Type:=msoControlPopup, _
  Before:=MenuN.Index, Temporary:=True)
MenuM.Caption = "&CustomMacros"

' Create a menu item for SolverAid
Set Menu1 = MenuM.Controls.Add(Type:=msoControlButton)
With Menu1
  .Caption = "&SolverAid"
  .OnAction = "SolverAid"
End With

' Create a menu item for LS
Set Menu2 = MenuM.Controls.Add(Type:=msoControlPopup)
Menu2.Caption = "&LS"

' Create submenus for LS0 and LS1 respectively
Set Menu20 = Menu2.Controls.Add(Type:=msoControlButton)
With Menu20
  .Caption = "LS&0"
  .OnAction = "LS0"
End With

Set Menu21 = Menu2.Controls.Add(Type:=msoControlButton)
With Menu21
  .Caption = "LS&1"
  .OnAction = "LS1"
End With

End Sub

Notes: (1) In order to avoid inserting multiple copies, we start with deleting pre-existing versions of the added menu item. However, we will then get an error when the menu bar is in its original form, so we add the dodge On Error Resume Next, which bypasses any errors encountered by the delete instruction. This trick will not work if your spreadsheet is set to Break on All Errors. You can undo this in the Visual Basic Editor under Tools => Options, General tab, by setting the Error Trapping to Break on Unhandled Errors instead.

(2) It is also possible to insert the menu item automatically every time Excel is opened, and to delete it as part of closing Excel down, with the instruction Temporary:=True in Set MenuM. In fact, the above macro already does the latter.

(3) The difference between a Button and a Popup is that the latter refers to further choices: you cannot select a macro LS, but only the specific choices LS0 or LS1. A popup menu shows an arrow.

(4) The ampersand & in the captions is used to indicate that the next letter should be underlined, thereby defining the shortcut hotkeys. You can then select the macro not only by clicking on the menu item, but also (and often faster, since your fingers do not have to leave the keyboard to grab the mouse) with Alt+S or /S for SolverAid, Alt+L+0 or /L0 for LS0, etc.

(5) The OnAction instruction does the actual work of calling the macro when you click on that particular menu item.

(6) When you are done experimenting with this macro, remove the inserted menu item with

Sub RemoveMenuM()
On Error Resume Next
CommandBars(1).Controls("&CustomMacros").Delete
End Sub

The MacroBundle can be configured to use this approach, as can be seen in section 11.3.
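Note (2) points out that the menu can be installed and removed automatically. A minimal sketch of how this might be done, assuming InsertMenuM and RemoveMenuM are stored in a regular module of the same workbook, uses the special macro names Auto_Open and Auto_Close, which Excel runs when that workbook is opened and closed respectively:

```vb
' Sketch only: assumes InsertMenuM and RemoveMenuM (section 8.10.1)
' are present in this workbook. Excel runs a macro named Auto_Open
' when the workbook opens, and Auto_Close when it closes.
Sub Auto_Open()
    Call InsertMenuM     ' put the CustomMacros menu in place
End Sub

Sub Auto_Close()
    Call RemoveMenuM     ' clean up when the workbook closes
End Sub
```

Because InsertMenuM already deletes any earlier copy of the menu before inserting a fresh one, it is safe for Auto_Open to run repeatedly.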
8.10.2 Adding a toolbar

At the expense of occupying a row on your monitor screen, even faster access to the custom macros can be provided by creating a new toolbar to display them. In order to emphasize the similarities between these two approaches, we will here illustrate adding a toolbar for the very same macros used in section 8.10.1.

Sub InsertToolbarM()

Dim TBar As CommandBar
Dim Button1, Button2, Button20, Button21

' Delete earlier version of M, if it exists, to prevent conflicts
On Error Resume Next
CommandBars("M").Delete

' Create a commandbar
Set TBar = CommandBars.Add
With TBar
  .Name = "M"
  .Position = msoBarTop
  .Visible = True
End With

' Create a control button for SolverAid
Set Button1 = CommandBars("M").Controls.Add(Type:=msoControlButton)
With Button1
  .Caption = "&SolverAid"
  .Style = msoButtonCaption
  .OnAction = "SolverAid"
End With

' Create a control button for LS
Set Button2 = CommandBars("M").Controls.Add(Type:=msoControlPopup)
With Button2
  .Caption = "&LS"
  .TooltipText = "Highlight array" & Chr(13) & _
    "before pressing" & Chr(13) & "LS0 or LS1"
  .BeginGroup = True
End With

' Create submenus for LS0 and LS1 respectively
Set Button20 = Button2.Controls.Add(Type:=msoControlButton)
With Button20
  .Caption = "LS&0"
  .Style = msoButtonCaption
  .OnAction = "LS0"
End With

Set Button21 = Button2.Controls.Add(Type:=msoControlButton)
With Button21
  .Caption = "LS&1"
  .Style = msoButtonCaption
  .OnAction = "LS1"
End With

End Sub

Notes: (1) You will notice that this macro mirrors many aspects of InsertMenuM().

(2) Calling for the installation of an already existing toolbar produces an error. We therefore delete a pre-existing toolbar first, using On Error Resume Next to avoid problems in case the added toolbar did not already exist. (Again, this will not work if your spreadsheet is set to Break on All Errors. In that case go to the Visual Basic Editor and, under Tools => Options, General tab, set the Error Trapping to Break on Unhandled Errors.) Alternatively, we could insert the toolbar automatically when Excel is opened, and let it be deleted just before closing Excel, as illustrated in section 11.3.

(3) ToolTips repeat the caption unless you specify them otherwise.

(4) The instruction BeginGroup = True merely inserts a vertical separator in the toolbar.

(5) While it is technically possible to have an inserted menu in the menu bar and an extra toolbar (since they work independently of each other), such overabundance provides no additional benefits.

(6) To remove the added toolbar use

Sub RemoveToolbarM()
On Error Resume Next
CommandBars("M").Delete
End Sub

8.11 Tools for macro writing

8.11.1 Editing tools

Visual Basic distinguishes between properties and methods; likewise, methods only act on specific properties. For every property there is a list of the specific methods you can invoke. The available options are not always obvious, but you can get a listing of them displayed on the Visual Basic Editor screen by typing a period behind a property or method, or an equal sign in an assignment, and waiting a few seconds. The same list will appear when you select Edit => List Properties/Methods (Alt+e+h, /e+h) or Ctrl+J, and you can then choose from that list by double-clicking. A list of constants for a particular property is shown with Edit => List Constants (Alt+e+s, /e+s) or Ctrl+Shift+J.

In order to show the proper syntax for an instruction, type Edit => Quick Info (Alt+e+q, /e+q) or Ctrl+I. The proper arguments are displayed with Edit => Parameter Info (Alt+e+m, /e+m) or Ctrl+Shift+I. This feature is especially useful when you need to check the order in which various arguments should be entered, and which ones should be left undefined but given explicit space, as in MsgBox("Message", , "Label") when we do not want to modify the OK button. In order to finish a word, use Edit => Complete Word (Alt+e+w, /e+w) or Ctrl+Space. If Excel knows more than one possible word ending, it will show a list from which you can choose.

8.11.2 The macro recorder

The Macro Recorder records keystroke sequences, so that repetitive operations can be automated. It records all keyboard and mouse actions in VBA notation, and can therefore be useful when you are writing a macro and have no idea how Excel codes some particular action. Macros can do many more things than you can do on the spreadsheet alone, but as long as what you need falls in the latter category, the Macro Recorder can show you a way to do it. It may not be the optimal way, but it will at least get you going, and you can always improve on it later.

In order to record, go to Tools => Macro => Record New Macro. This will produce a Record Macro dialog box, in which you can specify the name and a short description of the macro you are going to record. All these are optional: the macro will be given a default name and stored in a module. Just click OK and start the Excel operations for which you want to know the VBA code. When you are done, press the Stop Recording button or select Tools => Macro => Stop Recording. Then select Tools => Macro => Macros, find the latest Macro n (they are numbered sequentially), and Edit it. You can now see how Excel codes these instructions.

Say that you don't know how to specify in VBA the color of text, even though you know how to do it in Excel, using the Font Color icon on your formula toolbar or drawing toolbar. The macro recorder lets Excel show you how to do it. In order to start the process, call the macro recorder (with Tools => Macro => Record New Macro). On the spreadsheet, point to a cell, go to the
Font Color icon, select your color, and then stop the macro recorder. Now go to Tools => Macro => Macros, with a single click select the just-made macro, click on Edit, and voila, you see how Excel does its own coding. You may find the instruction Selection.Font.ColorIndex = 3, which you can incorporate into your own macro. This instruction may be accompanied by other, peripheral information, such as the font used, its size, the absence of underlining, etc. Don't be discouraged by such informational overload, just disregard it: since the Macro Recorder did not know what specific information you were looking for, it simply shows you everything.

8.12 Troubleshooting

Writing functions and macros is not trivial, and the rules governing VBA are sufficiently impenetrable to keep it that way. Consequently you may spend at least as much time troubleshooting as you will spend writing code in the first place. Before you shoot, it is helpful to look at the trouble you are aiming at. (If you don't, Murphy or one of his friends will most likely find it for you.) There are several possible sources of trouble. The most important of these is faulty logic, but this you typically find out only while testing or applying the program, and there is little guidance this book can provide to avoid it. Instead we will focus on faulty coding, a potential source of trouble that can show up at various times: after writing individual lines, during compilation, or upon running the function or subroutine.

The Visual Basic Editor will often alert you that a line you just typed contains a syntax error. Using Option Explicit also operates at the single-line level, and can identify many typos, as long as (1) you dimension variable names to contain capitals, (2) you type your code only in lowercase (except for quotations), and (3) you check whether the necessary caps appear magically as you press Enter at the end of each line.

Some errors the Visual Basic Editor only recognizes upon compiling the code. You typically find out about them when you try to run the program, but you can also test for them without leaving the Visual Basic Editor module, with Debug => Compile VBAProject on the Visual Basic Editor menu bar. This is helpful because it localizes the problem.
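The capitalization trick described above can be sketched as follows (the variable name is hypothetical, chosen merely to show the mechanism):

```vb
Option Explicit

Sub CapsCheck()
    Dim StDevY As Double    ' dimensioned with capitals
    ' Type the next line entirely in lowercase, as "stdevy = 3.7".
    ' When you press Enter, the editor recapitalizes it to match
    ' the Dim statement; if it stays lowercase, you have a typo.
    StDevY = 3.7
End Sub
```

With Option Explicit at the top of the module, a genuinely misspelled name will in any case be flagged as an undeclared variable when the code is compiled or run.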
Then there are the run-time errors, often caused by parameter values that make the program divide by zero, take the logarithm or the square root of a negative number, etc. These depend on the input parameters, and can therefore be caught only by extensive testing over a wide range of input parameters.

Excel contains many aids to troubleshoot VBA code. For yours truly the simplest of these are the most useful: whenever a test run fails, the Visual Basic Editor highlights the offending code, and displays a box with an error message. Read (and note down) that message, then click Debug, and (before fixing anything) place the mouse pointer on a variable at or close to where the Visual Basic Editor highlights the code. In recent versions of Excel this will create a yellow ToolTip-like message listing the value of that variable at the moment the procedure bombed, which you write down before moving to the next variable. This will often allow you to identify what went wrong: perhaps an index fell outside its allotted range, a variable value was not specified, etc.

A second, simple method is to insert temporary message boxes at crucial points in the code to see which hurdles are taken, and at the same time report crucial intermediate results. This can be very helpful in programs that allow for various routes, because you can let the program report which choices it has made. However, try not to place such message boxes inside loops, because this will generate as many message boxes as there are repeats in the loop, and every one of them will have to be acknowledged.

The Visual Basic Editor also contains additional troubleshooting tools, such as its Code and Immediate windows, which can be helpful in writing and debugging macros. For these see chapter 3 in G. Hart-Davis, Mastering VBA6, Sybex 1999, and/or chapter 7 in J. Walkenbach, Microsoft Excel 2000 Power Programming with VBA, IDG Books, 1999.

Finally, there are the fool's errors, triggered by entering, say, text where numbers or ranges are expected, etc. These are perhaps the hardest errors to anticipate, because it is difficult to imagine what you (or some other user of your macro) might try to put into an input box at some unguarded moment. Extensive documentation may help to prevent them, but only for those willing to read it. Perhaps the best way to avoid this type of entry errors is to use dialog boxes, a method we have not used here in order to keep matters simple. In this book more emphasis has been given to the ideas behind the various methods, and to applying them appropriately, than to the prevention of fool's errors and the generation of 'industrial-strength' macros.

Now that you know what kinds of trouble you are shooting, back to the main topic.
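A temporary message box of the kind recommended above can be as simple as the following sketch (the variable names and labels are hypothetical, not from a particular MacroBundle macro):

```vb
' Temporary checkpoint: report which route was taken, plus one
' crucial intermediate result; remove these lines once the macro
' works. Keep such boxes out of loops, or you will have to
' acknowledge one box per repeat.
If p = 0 Then
    MsgBox "Checkpoint 2: through-origin branch; SSR = " & SSR
Else
    MsgBox "Checkpoint 2: general-fit branch; SSR = " & SSR
End If
```

Numbering the checkpoints ("Checkpoint 2") makes it immediately obvious how far the macro got before it failed.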
While writing a new macro, and while debugging it, you will need to exercise the macro often, in order to check the proper functioning of what you have written. Repeatedly going through the sequence Tools => Macro => Macros etc. will soon lose its charm, in which case it may be helpful to make a button to activate a macro. Here is how you can do that. Start with the spreadsheet. Click on View => Toolbars => Forms, which will display a set of tool icons. Click on the button icon, which has the shape of a raised rectangle; when you then move the pointer away, the button icon shows that it is activated. (Note: In earlier incarnations of Excel the button icon was part of the Drawing toolbar.) Move the pointer (which now shows as a thin +-sign) to where you want the button, depress the left-hand mouse key, and draw a rectangle in the shape of the desired button. Release the mouse key, and the Assign Macro dialog box will appear. Click on the macro name, and from then on the button will call it. If you have not yet saved the macro, you can do the assignment later, by right-clicking on the button (to get its properties) and selecting Assign Macro. The same properties will allow you to change its label (with Edit Text) and its appearance (with Format Control).

An alternative to using a button is to assign a shortcut keycode to a macro. To do this, select Tools => Macro => Macros and, in the Macro dialog box, single-click on the macro name. Click on Options, and assign a lowercase letter or capital as Shortcut key. Exit with OK and Run. Thereafter, the macro is callable with Ctrl+n or Ctrl+N, where n or N is the shortcut key. This is one of a few cases where Excel is case-sensitive. Make sure not to redefine common keystroke combinations you may want to keep, such as Ctrl+c, to some other purpose. (But you could assign Ctrl+Shift+c, i.e., Ctrl+C.) Use the same procedure to reassign a shortcut key; by erasing the shortcut key, Excel will revert to its default assignment.

8.13 Summary

This chapter has focused on the actual writing of macros. Most owners of Excel do not realize that they have a tiger in their tank: a powerful programming language that can readily be learned simply by following, borrowing, modifying, and trying out examples such as those provided here and in the MacroBundle. It is one of the best-kept secrets of Excel that you are not restricted to the functionalities provided, but can make it do your own bidding.

This is not to suggest that the MacroBundle is only useful as a set of examples. It can serve that purpose, good for cutting and pasting as you assemble your own functions and macros, and you can use its macros as you might the Macro Recorder: they will get you going. But it also provides a set of convenient tools that make scientific data analysis much more accessible, to both aspiring and professional practitioners of science. In terms of availability, ease of learning, and ease of use, spreadsheets are without peer, and the tools in the MacroBundle (plus the tools you will make yourself) can make the spreadsheet into a quite powerful computational aid.

The functions and macros in the present book were mostly written by the author, and therefore have a particular style and format. They are somewhat old-fashioned, in that they use subroutines only where that has a clear advantage, e.g., when two or more macros can exploit the same subroutine. You may want to write your own macros in a more modern, modular fashion. At any rate, don't worry: after you have written several macros yourself you will find your own style and format, which is how it should be.

With today's computer speed and abundant working memory, writing numerically efficient code is no longer as important as it was in 1974, when yours truly bought his first laboratory computer, a PDP-11 that occupied the space of a clothes cabinet, and boasted all of 32 KB of memory (of which the top 4 KB were reserved for accessories such as the paper tape punch and reader). Then, efficiently written code could save substantial amounts of execution time. Recent personal computers are likely to have gigabytes rather than kilobytes of memory, and the time needed for elementary computer steps is now measured in nanoseconds rather than microseconds. Consequently, almost all computer tasks described in this book can be performed within a fraction of a second. We can therefore leave it to the computer professionals to fret over code efficiency and, yes, elegance. The most critical aspect of macros is that they work correctly, i.e., give reliable answers. With Excel and VBA, that is eminently feasible even when, as scientists, our main focus is elsewhere.
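The manual shortcut-key assignment described in section 8.13 can also be done from code. A sketch, using Excel's Application.MacroOptions method (the macro name is hypothetical; an uppercase ShortcutKey letter corresponds to Ctrl+Shift+letter):

```vb
' Sketch: assign Ctrl+Shift+L to a macro from code, instead of
' going through Tools => Macro => Macros => Options by hand.
Sub AssignShortcut()
    Application.MacroOptions Macro:="MyMacro", _
        Description:="Example macro (hypothetical)", _
        HasShortcutKey:=True, ShortcutKey:="L"
End Sub
```

This could, for instance, be called from an Auto_Open macro, so that the shortcut is re-established every time the workbook containing the macro is opened.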
8.14 For further reading

There are many books that can help you find your way in VBA for Excel. If you have Excel 2000, a good guide is John Walkenbach, Microsoft Excel 2000 Power Programming with VBA, IDG Books, 1999. Another excellent source of ideas and help is Mastering VBA6 by Guy Hart-Davis, Sybex 1999. For earlier versions of Excel you may need an older book, such as John Webb, Using Excel Visual Basic, 2nd ed., Que, 1996.

For the scientific part of macro writing an excellent source is Numerical Recipes, the Art of Scientific Computing by W. H. Press, B. P. Flannery, S. A. Teukolsky & W. T. Vetterling, Cambridge University Press, now in its second edition (1992), which comes in several flavors: Fortran, Pascal, C and C++, as well as an update for parallel computing with Fortran 90. The original version, for Fortran 77, comes closest to Basic as used in VBA, and should be read for its very clear explanations. A complete set of the software routines (accompanying the first, 1986 edition) was machine-translated into Basic, and is directly usable in Excel. It can be found on the diskette that accompanies J. C. Sprott, Numerical Recipes, Routines and Examples in Basic, Cambridge University Press 1991.
Chapter 9

Macros for least squares & for the propagation of imprecision

This is the first of three chapters that list the custom functions and macros used in this book. The present chapter contains the macros that deal with least squares analysis, and with the propagation of uncertainty, as well as the toolbar and menu installation tools to make them conveniently available on the spreadsheet. The second group, in chapter 10, covers Fourier transformation and related techniques, such as convolution, deconvolution, and Gabor transformation, while chapter 11 contains a miscellany of macros and functions that do not fit the above two categories. The macros are listed here roughly in the order in which they appear in this book.

These functions and macros can be downloaded freely from the web site www.oup-usa.org/advancedexcel, including the terms and conditions for using and distributing these macros. Downloading is the easiest way to incorporate them into your spreadsheet; moreover, this is the easiest way to browse for parts that you may want to incorporate into your own creations. Corrections and additions may also appear in the above web site, something the printed page does not allow. Still, web sites as well as magnetic and optical media formats tend to go out of fashion, and the only way to ensure that these programs are available to the readers of this book is to incorporate them in the printed text.

Since these functions and macros were all written or modified by the author, they have a particular style and format. Specifically, they are somewhat old-fashioned, in that they only use subroutines where that has a clear advantage, e.g., when two or more macros can exploit the same, substantial stretch of code. You may want to write your own macros in a more modern, modular fashion. At any rate, don't worry: after you have written several macros yourself you will find your own style and format, which is how it should be. The most critical aspect of macros is that they work properly for you, and that they have sufficient clarity and documentation to be usable by others.
Each macro contains extensive explanatory comments in its heading, and more terse (often single-line) comment lines, in the style of section headings, throughout its text to indicate the function of the next block of code. The first time you use a macro, or whenever you want to refresh your memory about its function or operation, read the comments in its heading.

The macros are presented here for two applications: (1) as directly useful tools to solve common scientific data analysis problems, and (2) as examples of how to write scientific macros. They are meant to be freely used in science and engineering, by its professional practitioners as well as by students and teachers. They have been copyrighted in order to protect them against commercial exploitation. The copyright gives the individual user explicit freedom to use and copy these macros, or any parts thereof, as long as their use is not linked to any commercial activity, including sale, advertisement, or promotion. Likewise, you may share these macros with others, subject to the same restriction to noncommercial use, as long as the entire warranty and copyright notice is included. When these macros are used in studies that are published, acknowledgment of their use, preferably with reference to this book and/or its web site, will be appreciated.

In order to squeeze the macros on these pages without using an illegibly small or narrow font, more lines of code have been broken up than one would normally do in writing them. But that gives them the benefit that, when shown in their Excel modules, such a relatively narrow strip of code can be placed conveniently alongside the spreadsheet, using a split screen. In these final three chapters (printed in monochrome) we have printed all comments in italics, making them readily identifiable as such, and for the sake of better readability (within the short lines of this book) we have boldfaced the actual code. The Visual Basic Editor will ignore these subtleties: when these codes are entered into an Excel module, whether typed in or read by optical character recognizers, comments will typically be displayed in green (or whatever color you have assigned to comment lines).

9.1 General comments

(1) Many macros (such as LS0 and LS1, ELSfixed and ELSauto, ForwardFT and InverseFT, and Mapper0 through Mapper3) have used small driver macros to set one parameter in order to get multiple uses from a common subroutine.
(2) All macros shown here are dimensioned, so that they can be used in conjunction with Option Explicit. Some parameters (specifically strings, such as Answer) are dimensioned by name only, because different versions of Excel treat them differently, and consequently may give trouble in some versions when fully dimensioned.

(3) Where literature routines are used, as with matrix inversion and fast Fourier transformation, make sure to convert them to double precision by proper dimensioning.

(4) Where convenient we have used the simple input method of highlighting a block of data and subsequently calling the macro. This is indicated in the little yellow notes that appear when the pointer dwells on the corresponding MacroBundle Toolbar menu items. However, even when the macro is called first, the user can still use an input box to specify the input data block.

(5) In order to use this method (of highlighting the input data), it is essential that the macro restore the input array to its original form, otherwise formulas might be replaced by numerical results. This is why those macros start by copying both the data and the formulas of the highlighted region, and end by returning the formulas to the spreadsheet.

(6) Because of an early decision to keep macro writing simple by avoiding dialog box construction, these macros do not contain professional-strength input checks, and are in no way foolproof. You can readily trip them up by, e.g., entering letters where they expect numbers, entering input arrays of incorrect size or dimension, or overwriting existing data. Prevention is built in only for some of the most likely user errors. If you want more safeguards, put them in, or learn to create your own dialog boxes.

(7) Macros that return data directly to the spreadsheet suppress screen updating during the computations. After executing its instructions but before ending, the macro will automatically reset screen updating and display the results.

(8) Subroutines and functions that are used mostly with one or a few macros are typically shown in combination, while some more general subroutines (e.g., those for matrix operations) are collected at the end of this chapter.

(9) Before first use of one of these macros, read the information in the comment lines immediately following its header. This briefly lists its purpose, its input and output formats, and any required subroutines. If you have loaded the entire MacroBundle, as recommended, all necessary subroutines will be available.
(10) It is also recommended that a first-time user of a macro tests its performance first on a data set for which the answer is known, either because the data were computed according to some formula, or because it has already been analyzed by a similar analysis elsewhere. Such a test provides familiarity with the macro, and may also indicate its capabilities and limitations.

9.2 LS

The macros LS0 and LS1 serve as the workhorses of least squares analysis in chapters 2 and 3 of this book, fitting data to an arbitrary mono- or multivariate function with linear coefficients. LS has two distinct forms: LS0 for when the fitted curve must pass through the origin, and LS1 for when there is no such constraint. They provide the fitted parameters with their standard deviations, the standard deviation of the fit, the covariance matrix, and (optionally) the linear correlation coefficient matrix. The covariance matrix can be used with the Propagation macro (see section 9.10) to evaluate the propagation of uncertainty in derived results. Many lines of code are devoted to prevent overwriting spreadsheet data and other standard precautions.

The algorithm used is based on the standard statistical matrix formalism as discussed, e.g., in chapter 2 of N. R. Draper & H. Smith, Applied Regression Analysis, Wiley, New York 1966; 2nd ed. 1981. In short, the dependent input variables yi are collected in the vector Y, and the independent variable(s) xi in the matrix X, which also contains as its first column a set of zeros (for LS0) or ones (for LS1). The coefficients are then found in the vector b, where

    b = (XᵀX)⁻¹ XᵀY                                (9.2.1)

where the superscript T denotes a transposition, and the superscript −1 a matrix inversion. The covariance matrix V is calculated from

    V = (XᵀX)⁻¹ SSR / (N − P)                      (9.2.2)

where SSR is the sum of squares of the residuals, N is the number of data points yi, and P the number of coefficients used in the fitting. The specific nomenclature used is explained in blocks of macro comments.
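As a small worked illustration of equation (9.2.1), made up for this purpose rather than taken from the book's data sets, consider fitting y = b1 + b2 x to the three points (0,1), (1,3), (2,5):

```latex
X = \begin{pmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{pmatrix}, \quad
Y = \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}, \quad
X^T X = \begin{pmatrix} 3 & 3 \\ 3 & 5 \end{pmatrix}, \quad
X^T Y = \begin{pmatrix} 9 \\ 13 \end{pmatrix}
```

Since det(XᵀX) = 15 − 9 = 6, we have (XᵀX)⁻¹ = (1/6) [5 −3; −3 3], so that b = (1/6) (5·9 − 3·13, −3·9 + 3·13)ᵀ = (1, 2)ᵀ, i.e., y = 1 + 2x. This line passes exactly through all three points, so here SSR = 0 and, by (9.2.2), the covariance matrix V vanishes, as it should for noise-free data.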
Ch. 9: Macros for least squares & for the propagation of uncertainty 427

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''^^^^^^^^^^^^^^^^^^^^^^^^^^'''''''''''''''''
'''''''''''''''''''''^   LINEAR LEAST SQUARES   ^''''''''''''''''
''''''''''''''''''''''^^^^^^^^^^^^^^^^^^^^^^^^^^'''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'                                           (c) R. de Levie
'                                           v 2, Oct. 1 2002

' PURPOSE:

' The macros LS1 and LS0 compute the parameters and their
' standard deviations for an unweighted least squares fit to
' data in 2 or more columns. LS0 forces the fit to pass
' through the origin, i.e., it assumes that y = 0 for x = 0;
' LS1 imposes no such constraint.

' SUBROUTINES:

' This macro requires the subroutines Multiply, Invert, and
' Transpose.

' INPUT:

' The input data must be organized in columns, arranged as
' follows. The first column must contain the dependent
' variable y. The second (and any subsequent) column(s) must
' contain the independent variable(s) x.

' OUTPUT:

' The macro provides the fitting coefficients, their standard
' deviations, the standard deviation of the fit, and the
' covariance matrix (except for fitting to a proportionality,
' which involves only one coefficient) plus (optionally) the
' matrix of linear correlation coefficients. Labels are
' provided when space for them is available.

' PROCEDURE:

' Before calling the macro, make sure that the output area
' (two lines below the input data block) is free of valuable
' data. In order to start the process, highlight the entire
' input data block, and call LS1 or LS0.

' The macros set the input parameter p for the subroutine
' LeastSquares: p = 1 causes a general unweighted least
' squares fit to the data, while p = 0 forces the fit to
' pass through the origin.

' The function of the following two drivers is merely to set
' the value of one parameter, p, equal to either one or zero,
' in order to choose between a general least squares fitting
' (p = 1) or one that forces the curve through the origin
' (p = 0).

Sub LS1()      ' for a general unweighted least squares fit

Dim p As Double
p = 1
Call LeastSquares(p)

End Sub

Sub LS0()      ' for an unweighted least squares fit through the origin

Dim p As Double
p = 0
Call LeastSquares(p)

End Sub

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Sub LeastSquares(p)

Dim cMax As Integer, i As Integer, j As Integer
Dim jj As Integer, M As Integer, N As Integer
Dim rMax As Integer

Dim Root As Double, SSR As Double
Dim StDevY As Double, u As Double, varY As Double

Dim DataArray As Variant, outputArray As Variant
Dim lccArray As Variant, vArray As Variant
Dim vOArray As Variant

Dim myRange As Range

Dim AA, AC, Answer, hAnswer, jAnswer
Dim bArray, btArray, btqArray, pArray, piArray, qArray
Dim xtArray, XArray, YArray, ytArray, ytyArray
Dim M1, M2

' Determination of the array size:

Begin:
rMax = Selection.Rows.Count
cMax = Selection.Columns.Count
u = 1

' If area was not highlighted

If rMax = 1 And cMax = 1 Then
hAnswer = MsgBox("You forgot to highlight" & Chr(13) & _
"the block of input data." & Chr(13) & _
"Do you want to do so now?", vbYesNo, "Least Squares Fit")
points is sufficient to define the problem: If rMax < cMax Then MsgBox "There must be at least" & cMax & " input" Chr (13) & " data to define the problem. "Least Squares Fit" End End I f 429 Check that rmax > cmax. Type:=8)myRange. Dimension the arrays: ReDim YArray(l To rMax. 1 To cMax . Selection." & Chr(13) & "one for Y. 1 To cMax) As Double ReDim piArray(l To cMax. "Least Squares Fit" End End I f & . 1 To cMax) As Double ReDim qArray(l To cMax. 1 To rMax) As Double ReDim pArray(l To cMax.". 1 To rMax) As Double ReDimvArray(l To cMax.1 + p) As Double ReDim lccArray (1 To cMax. 1 To 1) As Double ReDimbArray(l To CMax." . 1 To 1) As Double ReDimXArray(l To rMax. 1 To cMax) As Double . 1 To cMax) As Double ReDim ytArray(l To 1. 1 To rMax) As Double ReDim ytyArray(l To 1. InputBox(Prompt:= "The input data are located in:". Read the dataArray. 1 To 1) As Double ReDim xtArray(l To CMax. 1 To cMax) As Double ReDim btqArray(l To 1.Select End I f GoTo Begin End I f Check that the number of columns is at least 2: If CMax < 2 Then MsgBox "There must be at least two columns.Ch. 1) . 1 To cMax) As Double ReDim vOArray(l To cMax . 1 To 1) As Double ReDim M1 (1 To rMax. 1 To 1) As Double ReDim btArray(l To 1. 1) = DataArray(i. DataArray = then fill yArray and xArray. so that the number of data .1 + p. 1 To rMax) As Double ReDimM2(1 To rMax. Value For i = 1 To rMax YArray(i. and one or more for X. 9: Macrosfor least squares &forthe propagation ofuncertainty If hAnswer = vbNo Then End If hAnswer = vbYes Then Set myRange = Application.
j» Then MsgBox "Xvalue (s) missing". qArray) qArray. and" or i indicate inversion The various arrays and their dimensions (rows. rMax. Multiply(piArray. = pArray cmax) (X' X) " = piArray cmax. 1) qArray X' Y cmax. Advanced Excel for scientific data analysis 1 To rMax If IsEmpty(DataArray(i. piArray) Multiply(xtArray. cMax. 1» Then MsgBox "Yvalue(s) missing". where I or t denote transposi tion. cMax. j) Next i Next j I I Compute b = (X' X) " X' Y. 1) b = bArray cmax. rMax. cMax. Transpose(XArray. j) = DataArray(i. Multiply(xtArray. 1. rMax. cmax) X' X cmax. columns) are: 1) rmax r Y =yArray cmax) xArray rmax r X rmax) X' = xtArray cmax. 1) = CDbl(p) Next i For j = 2 To cMax For i = 1 To rMax XArray(i. xtArray) XArray. 1. "Least Squares Fit" End End If Next i For j = 2 To cMax For i = 1 To rMax If IsEmpty(DataArray(i. "Least Squares Fit" End End I f Next i Next j I I I For i Fill the first column of xArray with zeroes (for p = or ones (for p = 1). the rest with the data in the xcolumn (s) 0) For i = 1 To rMax XArray(i. cMax. cMax. bArray) I Call Call Call Call Call I Check against overwriting spreadsheet data 0 M = If (p For i o And cMax 1 To 3 2) Then . pArray) YArray. cMax.430 Next R. Invert (pArray. cMax. de Levie.
Ch. " & "Can they be overwritten?".b I X' Y. cMax.Offset(l. Select outputArray = Selection. 1) cMax . ytArray) Transpose (bArray. Select outputArray = Selection. 1. YArray. 0) . the covariance matrix. "Overwrite?") If Answer = vbNo Then End Else For i = 1 To 2 + P + cMax Selection. "Overwrite?") If Answer = vbNo Then End End If I The additional arrays and their dimensions (rows. btqArray) I Cal cula te SSR = Y I Y . I SSR = ytyArray(l. qArray. 1.Offset(2 . rMax.btqArray(l. 1. 1. Value For j = 1 To cMax If IsEmpty(outputArray(rMax.Offset(l. 1) varY = SSR / (rMax  . 0) . rmax) Y' ( 1. Value For j = 1 To cMax If IsEmpty(outputArray(rMax. 1. as V = (X X)" times varY. cMax. " & "Can they be overwritten?". 1) = ytyArray Y' Y ( b' = btArray 1. Select If M > 0 Then Answer = MsgBox ("There are data in the " & "three lines below the" & Chr(13) & "input data array. 0) .Offset(3. j») Then M = M Else M = M + 1 End If Next j Next i Selection. vbYesNo. 0) .p . 9: Macros/or least squares & j!o1rpropagation a/uncertainty 431 Selection. Select If M > 0 Then Answer = MsgBox ("There are data in the " & 2 + P + cMax &" lines below the" & Chr (13) & "input data array. b ' X' Y = btqArray 1) Call Call Call Call Transpose (YArray.p + 1) . rMax. 1. cmax) ( 1.cMax. and vArray. vbYesNo. then the variance of y as varY = SSR/ (rMaxcMaxp+1). columns) are: ( ytArray 1. ytyArray) Multiply (btArray. j» Then M = M Else M = M + 1 End I f Next j Next i Selection. btArray) Multiply(ytArray.
0) .Select Next j ActiveCel1.Font.1 To cMax ActiveCell. Select Next j ActiveCell. Prepare the output format For j . Select If P = 1 Then If (IsEmpty(ActiveCell) Or ActiveCell.lta1ic = True ActiveCell.Bold = True ActiveCell.Offset(rMax.Value Then "Coeff:") . 0) . p) .Font.ltalic = True ActiveCell.Offset(1. 2.0ffset(1. 1) .Offset(O.Font.432 R.Offset(O.Offset(1.Font. Select Next i ActiveCell.ScreenUpdating = False ActiveCell. 1) (AC = "A" And P = 1) Then GoTo NoLabel ActiveCell. cMax) .cMax. Advanced Excel for scientific data analysis StDevY Sqr(Abs(varY» For i = 1 To cMax For j = 1 To cMax vArray(i. 1) . Select (p = 0 And cMax = 2) Then For i 1 To 2 For j = 1 To cMax ActiveCell.Font.Offset(3.p .Offset(O. Prepare the output labels.Bold = False ActiveCell. 0) .Bold = False ActiveCell.Address Mid(AA. 1) . cMax) .Offset(O. Select End If . suppressing them when space for them is unavailable or data will be overwritten = = AA AC If ActiveCell. = de Levie. Select Next j ActiveCell. j) Application. cMax) .Font. Select Next i ActiveCell. j) = varY Next j Next i * piArray(i.Offset(2 . Select .ltalic = True ActiveCell. Select Else If For i = 1 To 1 + P + cMax For j = 1 To cMax ActiveCell.
Italic = True · Font. Select If P = 0 And cMax = 2 Then GoTo NoLabel ActiveCell.Offset(3.Value Then GoTo Step2 Else "StDev:" ) ActiveCell.Offset(l.Ch.Offset(l. p) .Offset(l.Value GoTo Step3 Else "CM:") Then ActiveCell. 0) . Select If p = 1 Then If (IsEmpty(ActiveCell) Or ActiveCell. Bold = True · Font. p). p) . p) . Select . p) .Offset(O. 9: Macrosfor least squares &forthe propagation ofuncertainty GoTo Stepl Else 433 ActiveCell.Select GoTo NoLabel End I f End I f Step3 : With ActiveCell · Value = "CM:" · Font.Offset(3. Italic = True · HorizontalAlignment End With = xlRight ActiveCell. Bold = True · Font. Select GoTo NoLabel End I f End I f Step2: With ActiveCell · Value = "StDev:" · Font. Select If p = 1 Then If (IsEmpty(ActiveCell) Or ActiveCell. p) . Italic = True · HorizontalAlignment End With = xlRight ActiveCell. Bold = False · Font. Colorlndex = 11 · HorizontalAlignment = xlRight End With ActiveCell.Offset(3. Select GoTo NoLabel End I f End If Stepl: With ActiveCell .Value = "Coeff:" · Font.
Select Next i Application. The user specifies .Value "<lE20" Else ActiveCell. vbYesNo.Va1ue = vArray(i. Select For j = 2 .cMax) . de Levie. Provide as optional output the array of linear .Offset(O. 1 .Value Sqr(vArray(j. Select Selection. Select Next j ActiveCell.Value = StDevY If P = 0 And cMax = 2 Then GoTo Las tLine ActiveCell.cMax) .p) .1 & " cells.Select Next j ActiveCell. 1) .434 R. correlation coefficients.P . j» End If ActiveCell.Offset(1.ScreenUpdating = True . j) < 1E40 Then ActiveCell. the cell block in which to write this array If P = 0 And cMax = 2 Then GoTo LastLine jAnswer = MsgBox ("Do you want to see the " & "matrix of linear correlation" & Chr (13) & "coefficients? It will need a " & "block of " & cMax + p . Select Next j ActiveCell.Value = bArray(j.Offset(l.ColorIndex = 11 ActiveCe!I.Offset(l. Advanced Excel for scientific data analysis NoLabel: ActiveCell.P To cMax ActiveCell.Font.InputBox(Prompt:= "The array should be located in:". 0) . Make sure that the selected block has the correct size .Offset(O. 1 . 1) ActiveCell. Select For i = 2 .Offset(1.Offset(O.P To cMax If vArray(j. 1 . Select For j = 2 .".1 & " by" & cMax + p .P .P To cMax For j = 2 .P . "Least Squares Fit") OutlineMatrix: If jAnswer = vbYes Then Set myRange = Application.Offset(O.P To cMax ActiveCell.Select . j) ActiveCell. Type:=S) myRange.cMax) . 1) . 1) . 1 .
j) / Root Next j Next i Selection. j + 1) Next j Next i For i = 1 To cMax . j» lccArray(i. Borders (xlEdgeRight) . j) = vOArray(i.Borders(xlEdgeTop) . "Least Squares Fit" GoTo OutlineMatrix End If If Selection. Value lccArray End If If P = 1 Then For i = 1 To cMax For j = 1 To cMax vOArray(i. j) / Root Next j Next i 435 . Borders (xlEdgeBottom) . j) Next j Next i For i = 1 To cMax For j = 1 To cMax Root = Sqr(vArray(i. LineStyle = xlDashDot Selection.LineStyle = xlDashDot Selection. Borders (xlEdgeLeft) . LineStyle = xlDashDot Selection.Borders(xlEdgeTop) .Weight = xlThin Selection. Rows. " .Weight = xlThin Selection.LineStyle = xlDashDot Selection. Borders (xlEdgeLeft) .Weight = xlThin Selection. j) = vArray(i.Count <> cMax + p .1 Root = Sqr(vOArray(i. Please correct. j» lccArray(i.1 vOArray(i.Ch.1 For j = 1 To cMax .1 & " rows. i) * vOArray(j.1 Then MsgBox "The selected range does not have " _ & cMax + p . j) = vArray(i. Count < > cMax + p . i) * vArray(j.1 Then MsgBox "The selected range does not have " & cMax + p .Columns. 9: Macrosjar least squares &jorthepropagatian ajuncertainty I f Selection.Weight = xlThin Write the array of linear correlation coefficients If P = 0 Then For i = 1 To cMax . j) = vArray(i + 1. Please correct". Borders (xlEdgeBottom) .1 For j = 1 To cMax .1 & " columns.Borders(xlEdgeRight) . "Least Squares Fit" GoTo OutlineMatrix End If Draw a box around the reserved area Selection.
Selection.Value = lccArray
End If
End If

LastLine:

End Sub

9.3 LSPoly

This routine is not intended for routine fitting of data to a known polynomial such as y = a0 + a1x + a2x^2 or y = a2x^2 + a4x^4, for which LS0 or LS1 should be used instead. LSPoly is meant for use in cases where the length of such a polynomial is not known a priori, and the user wants to screen a number of different polynomial lengths and compare the resulting statistics. LSPoly first fits the data to the lowest-order polynomial in integer powers of x, then automatically adds the next-higher term in the polynomial and repeats the process, until it reaches a user-selected maximum polynomial order. Use the custom macros Ortho0 or Ortho1 in order to decide what polynomial order might be optimal.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''^^^^^^^^^^^^^^^^^^^^^^^^^^'''''''''''''''''
'''''''''''''''''''''^   LEAST SQUARES POLY    ^''''''''''''''''
''''''''''''''''''''''^^^^^^^^^^^^^^^^^^^^^^^^^^'''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'                                           (c) R. de Levie
'                                           v 2, Oct. 1 2002

' PURPOSE:

' The macros LSPoly0 and LSPoly1 compute the parameters and
' their standard deviations for an unweighted least squares
' fit of a dependent variable y to a power series in x of
' increasing (positive integer) order m. The user specifies
' the highest order to be displayed.

' SUBROUTINES:

' The macros LSPoly0 and LSPoly1 call the subroutine
' LeastSquaresPoly which, in turn, calls the subroutines
' Invert, Multiply, and Transpose.

' INPUT:

' The input data must be arranged in a block of two adjacent
' columns, with the dependent variable y in the leftmost
' column, and the independent variable x to its right.

' OUTPUT:

' The macro provides:
' (1) the coefficients of the least squares fit of y to the
'     power series a(1)x + a(2)x^2 + ... = Sum[a(i)x^i] for
'     i = 1(1)m for LSPoly0, or to a(0) + a(1)x + a(2)x^2 +
'     ... = Sum[a(i)x^i] for i = 0(1)m for LSPoly1;
' (2) the corresponding standard deviations s for these
'     coefficients;
' (3) the standard deviation of the fit for each value of m;
' (4) the F-ratios for alpha = 0.05 and 0.01 for each value
'     of m.

' For each order the output consists of three rows. The top
' row lists the order of the polynomial, and the standard
' deviation of the fit. The second row displays the coeffi-
' cients of the fitted polynomial, and the third row the
' corresponding standard deviations. The second and third
' rows also list the values of the F-ratios for 5% and 1%
' respectively. For easy reading the rows are color-coded by
' order.

' The output will be displayed under the input data, and will
' gradually expand to the right as results for higher orders
' occupy increasingly more space. The size of the macro
' output will be determined by your choice of the maximum
' polynomial order: the first-order output will occupy a
' block of 3 rows of 5 columns for LSPoly0 (6 for LSPoly1),
' and each additional order will use a block that is one
' column wider than its predecessor.

' If you find cells filled with ####### the macro needs a
' wider column to display its result. Click on the letter(s)
' at the top of the column, right-click, and insert a larger
' number in Column width. If you want the output to display
' more digits, change the instruction
' ActiveCell.NumberFormat = "0.00E+00" to
' ActiveCell.NumberFormat = "0.000E+00", "0.0000E+00", etc.

' WARNINGS:

' The output is displayed below the highlighted input array,
' and therefore might escape your notice. If the macro
' doesn't seem to respond, look below it. Please move
' valuable data from the output region before running this
' macro, lest those data be overwritten.
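The order-screening arithmetic described under OUTPUT can be mimicked outside Excel. The Python sketch below is illustrative only (poly_ssr and solve are names invented for this example, not macro code): it fits successive polynomial orders and forms the statistic F = (SSR(m-1) - SSR(m)) / (SSR(m)/(N - m - 1)), which LSPoly1 then divides by Excel's FINV(0.05, 1, N - m - 1) or FINV(0.01, 1, N - m - 1) to obtain FR5 and FR1:

```python
def solve(A, b):
    # Gauss-Jordan solution of the normal equations A a = b
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        M[i] = [v / M[i][i] for v in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [v - f * w for v, w in zip(M[r], M[i])]
    return [row[n] for row in M]

def poly_ssr(x, y, m):
    # unweighted least squares fit of y to a0 + a1 x + ... + am x^m
    X = [[xi ** k for k in range(m + 1)] for xi in x]
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(m + 1)]
           for i in range(m + 1)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(m + 1)]
    a = solve(XtX, Xty)
    ssr = sum((yi - sum(ak * xi ** k for k, ak in enumerate(a))) ** 2
              for xi, yi in zip(x, y))
    return a, ssr

x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi + 0.5 * xi ** 2 + 0.01 * (-1) ** i
     for i, xi in enumerate(x)]          # near-quadratic test data

N = len(x)
ssr, Fstat = {}, {}
for m in (1, 2, 3):
    a, ssr[m] = poly_ssr(x, y, m)
    if m > 1:
        Fstat[m] = (ssr[m - 1] - ssr[m]) / (ssr[m] / (N - m - 1))
        print("order", m, "SSR", ssr[m], "F", Fstat[m])
```

For these near-quadratic data the step from order 1 to 2 gives a very large F (the quadratic term is clearly needed), while the step to order 3 gives a small one, which is the pattern the macro's FR5 and FR1 columns make visible.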
F As Variant. btArray. pArray. I Dim cMax As Integer.438 R. Sub LeastSquaresPoly(p) II. I"""'" r" r. equal to either one or zero. J"""" t. btqArray. r"""""" . . vOArray As Variant Dim myRange As Range Dim bArray. lccArrayAs Variant Dim outputArray As Variant. t " " " " " " " " " " " " " " . p. ytyArray . rMax As Integer Dim Color(l To 7) As Integer. XArray. highlight the (two columns . PROCEDURE: . for general unweighted . M2. Sub LSPolyl () r. Advanced Excel for scientific data analysis . t"". vArray As Variant. piArray Dim qArray.. z As Double Dim DataArray As Variant. i As Integer. least squares fit Dim p As Double P = 1 Call LeastSquaresPoly(p) End Sub "t". In order to start the process. f " " " " .. xtArray. de Levie. for unweighted least squares . make sure that the output area . The function of the following two drivers is merely to set the value of one parameter. call LSPolyO or LSPolyl . Before calling the macro. in order to choose between a general least squares fitting (p = 1) or one that forces the curve through the origin (p = 0) .. j As Integer Dim M As Integer. SSR As Variant Dim varY As Variant. . . " " " . StDevY As Double Dim u As Double. wide. MaxOrder As Integer Dim FRatiol As Double. . fit through the origin Sub LSPolyO () Dim p As Double p 0 Call LeastSquaresPoly(p) End Sub = """ r. YArray. ytArray. is clear of valuable data (see above under WARNING) . N As Integer. . and at least 3 rows high) input data block. J. FRatioS As Double Dim Root As Double. ". . then . Ml.. r " " " " " " " " " " " " " " " " " " " " " " " " " " " .
1 RedimensionArrays: . kAnswer 439 . 1) .Columns. Item(1.". but smaller than the number of data " "points." & Chr (13) & "one for Y. Dimension the arrays: & & ReDim ReDim ReDim ReDim ReDim YArray(1 To rMax. 1 To 1) As Double u 1 = z 0 Check that the number of columns is 2: If cMax <> 2 Then MsgBox "There must be two columns. 1 To rMax) As Double ytyArray(1 To 1. 1 To cMax) As Double ytArray(1 To 1. 9: Macrosjor least squares &forthe propagation ojuncertainty Dim Answer.Count ReDim F(1 To rMax.Rows. 1 To 1) As Double ReDim SSR(1 To rMax. so that the number of data points is sufficient to define the problem: I If rMax < 4 Then MsgBox "There must be at least 4 input" & Chr (13) & "data pairs to define the problem." Chr (13) & "14. "Least Squares Polynomial Fit" End End I f Check that rmax > 3. 1 To 1) As Double xtArray(1 To cMax. and one for X. "MaxOrder") If MaxOrder < 3 Then MaxOrder = 3 If MaxOrder > 14 Then MaxOrder = 14 If MaxOrder >= rMax Then MaxOrder = rMax . "Least Squares Polynomial Fit" End End I f DataArray = Selection. Select the maximum order: MaxOrder = InputBox ("Select as maximum order an integer between 3 and. Select . jAnswer.Count cMax = Selection." & Chr (13) & Chr (13) & "Enter the maximum order: ". Determination of the array size: rMax = Selection. 1 To rMax) As Double ." . iAnswer. 1 To 1) As Double XArray(1 To rMax. Value Selection.Ch.
2) Next i If cMax > 2 Then For j 3 To cMax For i 1 To rMax XArray (i. 1 To CMax) As Double ReDim piArray(1 To CMax. then fill yArray and xArray. 1 To cMax) As Double . 1) = CDbl(p) Next i For i = 1 To rMax XArray(i. 2) = DataArray(i. 1 To rMax) As Double ReDim vArray(1 To CMax. 2) Next i Next j End If * XArray (i. Read the da taArray. "Least Squares Polynomial Fit" End End I f Next i For i = 1 To rMax If IsEmpty (DataArray (i. with the data in the xcolumn (s) For i = 1 To rMax XArray(i. 1) Next i If cMax = 2 Then For i = 1 To rMax If IsEmpty(DataArray(i. 1 To 1) As Double ReDim btArray(1 To 1. 1 To cMax) As Double ReDim qArray(1 To CMax. 1 To cMax . de Levie.440 R. 1 To 1) As Double ReDim bArray(1 To cMax. 1 To cMax) As Double ReDimvOArray(1 To cMax . Advanced Excel for scientific data analysis ReDimpArray(1 To CMax.1 + p) As Double ReDim lccArray(1 To CMax. 1» Then MsgBox "Yvalue (s) missing". j) = XArray (i. For i = 1 To rMax YArray(i. 1 To rMax) As Double ReDimM2(l To rMax. j .1 + p. Then MsgBox "Xvalue(s) missing".1) . "Least Squares Polynomial Fit" End End I f Next i 2» End I f Fill the = first 0) (for p column of xArray with zeroes or ones (for p = 1) r the rest . 1) = DataArray(i. 1 To cMax) As Double ReDim btqArray(1 To I. 1 To 1) As Double ReDimMl(l To rMax.
cmax. 1.ScreenUpdating = False ActiveCell. Calculate SSR = Y'Y .p) FRatioS = (F(cMax. 9: Macrosfor least squares &forthe propagation ofuncertainty . rmax. I.p) StDevY = Sqr(Abs(varY» If cMax > 2 Then F(cMax. 1) varY = SSR(cMax.cMax + 1 . 1) . YArray. rMax.cMax + 1 . 1) / (rMax . 1..Ol.1) * (rMax .FInv(O. 1) = «SSR(cMax . xtArray) Multiply (xtArray. YArray. rMax. or t denote . 1) = ytyArray(1. I. cMax. rMax. 1. 1» . cmax) rmax) cmax) cmax) 1) 1) rmax) 1) cmax) 1) Call Call Call Call Call Call Call Call Call Transpose (XArray. piArray) Multiply (xtArray. qArray.b'X'Y. where . and vArray. cmax. The various arrays and their dimensions . j) = varY * piArray(i. cMax.Offset(rMax + I. 1. 1) / SSR(cMax. cMax. columns) are: Y =yArray X = xArray X' xtArray X' X pArray (X' X) " piArray qArray X' Y b bArray y' = ytArray y' Y = ytyArray b' btArray b' X' Y = btqArray ( ( ( ( ( ( ( ( ( ( ( 441 (rows.FInv(O. qArray. cmax. as varY =SSR/ (rmaxcmaxp+1).OS.p» FRatiol = (F(cMax. cMax. 1. SSR(cMax. 1. 1» / (Application. the variance . bArray) Transpose (YArray. ytArray) Transpose (bArray. rMax . pArray) Invert (pAr ray . rMax . here only use the diagonal elements. btArray) Multiply (ytArray. 1) rmax. and then varY. 1. cmax. as V = (X' X)" times varY.I.cMax + 1 . 0) . and" or i indicate inversion . 1. 1. 1. cMax. cMax. qArray) Multiply (piArray. cMax. cmax. Select and set the .p» End If For i = 1 To cMax For j = 1 To cMax vArray(i. rMax. 1. Paint color bands for up to 14 orders. transposition.btqArray(1. cMax. ytyArray) Multiply (btArray. variance matrix. compute b = (X'X)"X' Y .cMax + 1 . rMax.Ch. . the co. of y. of which we . i. cMax. the variances. 1» / (Application. j) Next j Next i Application.e. btqArray) . XArray. 1.
1) Color (cMax .ColorIndex End I f Color (cMax .1 Selection. 2) .8) ActiveCell.Interior. de Levie. Italic = True · HorizontalAlignment = xlRight .442 R.OOE+OO" Next j ActiveCell.ColorIndex Else ActiveCell.Interior. 1) .Offset(l.ColorIndex Else ActiveCell.8) Color (cMax . Select With ActiveCell · Font.Bold = False ActiveCell.Font. Italic = False · HorizontalAlignment = xlLeft · Value = "Order " & cMax .Offset(3. Select If cMax < 9 Then ActiveCell.Italic = False ActiveCell.Value = "term " & j .Font.2) . Select With ActiveCell · Font. Italic = True · HorizontalAlignment = xlCenter .ClearContents If cMax < 9 Then ActiveCell. Display the top line of the output With ActiveCell · Font. Bold = False · Font.Offset(O. Select Next i ActiveCell. Bold = False · Font.NumberFormat = "O.ColorIndex End If ActiveCell. Select . numerical Color(1) Color (2) Color (3) Color(4) Color (5) Color (6) Color (7) 38 40 36 35 34 37 39 to scientific with 3 decimal places For i = 1 To 3 For j = 1 To cMax + p + 2 ActiveCell. format Advanced Excel for scientific data analysis . Offset (0 .p End With Next j Selection.Offset(O. 1) . cMax .Interior.1 End With For j = 1 To cMax + p . 0) .Interior. Bold = True · Font.1) Color (cMax .p .
Select Selection.p + 1. 9: Macrosfor least squares &forthepropagation ofuncertainty · Value = "Sy:" End With Selection.1 Selection.p.Dev. 2) .HorizontalAlignment = xlLeft If cMax = 2 Then Selection. 1) . HorizontalAlignment xlLeft Selection.Ch. Display the center line of the output With ActiveCell · Font. HorizontalAlignment = xlRight · Value = "Coeff. 1) .Offset(O.P + 1) < 1E40 Then ActiveCell. Italic = False . Display the bottom line of the output With ActiveCell · Font. j . Value = StDevY Se1ection. 1) . Select With ActiveCell · Font.Select 443 . Bold = False · Font. Select .Value = "<1E20" .1 Selection. Bold = False · Font. Italic = True .:" End With For j = 1 To cMax + p . Italic = False · HorizontalA1ignment = xlCenter End With If vArray(j . Select With ActiveCell · Font.p . Bold = False · Font.Offset(O.Value = "St.Offset(1.Offset(O.2) . Select With ActiveCell · Font.Offset(O. cMax .Value = bArray(j + 1 . 1) End With Next j Selection. Value = "N/A" If cMax > 2 Then Selection. Bold = False · Font. Bold = False · Font.:" End With For j = 1 To cMax + p . 1) .Offset(O. Italic = True · HorizontalAlignment = xlRight · Value = "FRS:" End With Selection.Select Selection. Horizonta1Alignment = xlCenter .Offset(1. Italic = True · HorizontalAlignment = xlRight . cMax . Value = FRatio5 Selection.2) .p .
Else
ActiveCell.Value = Sqr(Abs(vArray(j - p + 1, j - p + 1)))
End If
Next j
Selection.Offset(0, 2 - cMax - p).Select
With ActiveCell
.Font.Bold = False
.Font.Italic = True
.HorizontalAlignment = xlRight
.Value = "FR1:"
End With
Selection.Offset(0, 2).Select
Selection.HorizontalAlignment = xlLeft
If cMax = 2 Then Selection.Value = "N/A"
If cMax > 2 Then Selection.Value = FRatio1
Selection.Offset(rMax - 1, 0).Select
If cMax < MaxOrder + 2 Then
cMax = cMax + 1
ActiveCell.Select
End If
If cMax = MaxOrder + 2 Then End
GoTo RedimensionArrays

End Sub

9.4 LSMulti

For least squares data fitting to a multivariate expression such as y = a1x1 + a2x2 + a3x3 + ..., use LS0 or LS1. The macro LSMulti answers a more specialized question, viz. what are the fitting coefficients and their statistical uncertainties when successive terms are added, i.e., when one starts with y = a1x1, then tries y = a1x1 + a2x2, then y = a1x1 + a2x2 + a3x3, etc., while maintaining the order of the terms in the multivariate analysis.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''^^^^^^^^^^^^^^^^^^^^^^^^^^'''''''''''''''''
'''''''''''''''''''''^   LEAST SQUARES MULTI   ^''''''''''''''''
''''''''''''''''''''''^^^^^^^^^^^^^^^^^^^^^^^^^^'''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'                                           (c) R. de Levie
'                                           v 2, Oct. 1 2002

' PURPOSE:

' The macros LSMulti0 and LSMulti1 compute the parameters and
' their standard deviations for an unweighted least squares
' fit of a dependent variable y to a user-specified multi-
' variate expression of the form y = a1x1 + a2x2 + ... (for
' LSMulti0) or y = a0 + a1x1 + a2x2 + ... (for LSMulti1), by
' gradually (one term at a time) increasing the number of
' terms included in the least squares analysis from left to
' right.

' Note that the macro does NOT rearrange the columns, and
' therefore does not provide all possible permutations. If
' those possible permutations are needed, instead use the
' macros LSPermute0 or LSPermute1.

' SUBROUTINES:

' The macros LSMulti0 and LSMulti1 call the subroutine
' LSMulti which, in turn, calls the subroutines Invert,
' Multiply, and Transpose.

' INPUT:

' The input data must be arranged in a block of at least 3
' (for LSMulti0) or 4 (for LSMulti1) adjacent columns, with
' the dependent variable y in the leftmost column, and the
' independent variables x to its right.

' OUTPUT:

' The macro provides:
' (1) the coefficients of the least squares fit of y to the
'     series a(1)x(1) + a(2)x(2) + ... = Sum[a(i)x(i)] for
'     i = 1(1)m for LSMulti0, or a(0) + a(1)x(1) + a(2)x(2)
'     + ... = Sum[a(i)x(i)] for i = 0(1)m for LSMulti1;
' (2) the corresponding standard deviations s(i) for these
'     coefficients;
' (3) the corresponding standard deviation of the fit, s(y);
' (4) the corresponding F-ratios, FR5 and FR1, for alpha =
'     0.05 and 0.01, i.e., for 5% and 1% probability
'     respectively.

' For each set the output consists of three rows. The top row
' lists the standard deviation of the fit. The second row
' displays the coefficients of the fitted expression, and the
' third row the corresponding standard deviations. The second
' and third rows also list the values of the F-ratios for 5%
' and 1% respectively. For easy reading the rows are color-
' coded.

' If you want the output to display more digits, change the
' instruction ActiveCell.NumberFormat = "0.00E+00" to
' ActiveCell.NumberFormat = "0.000E+00", "0.0000E+00", etc.
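The successive fits that LSMulti performs can likewise be sketched in a few lines of ordinary Python. This is purely an illustration of the idea, not the macro's own code; fit and solve are invented names. Columns are added one at a time, in their original order, each fit including a leading column of ones as in LSMulti1:

```python
def solve(A, b):
    # Gauss-Jordan solution of the normal equations A a = b
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        M[i] = [v / M[i][i] for v in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [v - f * w for v, w in zip(M[r], M[i])]
    return [row[n] for row in M]

def fit(cols, y):
    # least squares of y on a column of ones plus the given x columns
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    n = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(n)]
           for i in range(n)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    a = solve(XtX, Xty)
    ssr = sum((yi - sum(ai * ri for ai, ri in zip(a, r))) ** 2
              for r, yi in zip(X, y))
    return a, ssr

x1 = [1, 2, 3, 4, 5, 6]
x2 = [1, 4, 9, 16, 25, 36]
y = [3.0 + 2.0 * a + 0.5 * b for a, b in zip(x1, x2)]

results = {}
for k in (1, 2):                 # add the x columns one at a time
    a, ssr = fit([x1, x2][:k], y)
    results[k] = (a, ssr)
    print("first", k, "x column(s): coefficients", a, "SSR", ssr)
```

As in the macro, the drop in SSR on going from one set to the next is what signals whether the newly added term earns its keep.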
' WARNING:

' The macro output will take up much space below and to the
' right of the input data. The macro does NOT check whether
' that area is free of data. Please move valuable data from
' the output region before running this macro, lest those
' data be overwritten.

' PROCEDURE:

' Before calling the macro, make sure that the output area is
' clear of valuable data (see above under WARNING). In order
' to start the process, highlight the input data, which
' should occupy at least three adjacent columns, then call
' LSMulti0 or LSMulti1.

' The function of the following two drivers is merely to set
' the value of one parameter, p, equal to either one or zero,
' in order to choose between a general least squares fitting
' (p = 1) or one that forces the curve through the origin
' (p = 0).

Sub LSMulti0()   ' for unweighted least squares fit through the origin

Dim p As Double
p = 0
Call LSMulti(p)

End Sub

Sub LSMulti1()   ' for general unweighted least squares fit

Dim p As Double
p = 1
Call LSMulti(p)

End Sub

'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Sub LSMulti(p)

Dim cMax As Integer, Color(1 To 7) As Integer, i As Integer
Dim j As Integer, M As Integer, MM As Integer
Dim MaxOrder As Integer, N As Integer, rMax As Integer

Dim FRatio1 As Double, FRatio5 As Double
Dim Root As Double, StDevY As Double
Dim u As Double, z As Double

Dim DataArray As Variant, F As Variant
1 To 1) As Double XArray(l To rMax. 1 To 1) As Double . outputArray As Variant Dim SSR As Variant.". xtArray. Check that rmax > cmax+l. pArray. Value Selection. 9: Macrosfor least squares &forthe propagation ofuncertainty Dim lccArray As Variant. 1 To rMax) As Double pArray(l To MM. 1 To 1) As Double ReDim SSR(l To rMax. YArray. kAnswer Determination of the array size: rMax = Selection. ytArray.Count cMax = Selection. 1) .Columns. piArray Dim qArray.Count ReDim F(l To rMax. 1 To MM) As Double piArray(l To MM. 1 To MM) As Double ytArray(l To 1. I 447 iAnswer.Rows. jAnswer. . btqArray. varY As Variant Dim vArray As Variant. VOArray As Variant Dim myRange As Range Dim bArray. btArray. XArray. Select MM = 2 RedimensionArrays: .Ch. so that the number of data I points is sufficient to define the problem: If rMax < cMax + 1 Then MsgBox " There must be at least " & cMax + 1 & Chr(13) & "input data pairs to define the problem. 1 To 1) As Double u 1 = 0 z . 1 To rMax) As Double ytyArray(l To 1. 1 To cMax) As Double qArray(l To MM. ytyArray Dim Answer.". 1 To 1) As Double xtArray(l To MM. Check that the number of columns is at least (3+p): If cMax < 3 + P Then MsgBox "There must be at least " & 3 + P "Least Squares Multivariate Fit" End End I f & " columns. "Least Squares Multivariate Fit" End End I f DataArray = Selection. Item(l. Ml. M2. Dimension the arrays: ReDim ReDim ReDim ReDim ReDim ReDim ReDim ReDim YArray(l To rMax.
j» Then MsgBox "Xvalue(s) missing". For i = 1 To rMax YArray(i. If MM = 2 Then For i = 1 To rMax If IsEmpty(DataArray(i. the rest with the data in the xcolumn (s) For i = 1 To rMax XArray(i. "Least Squares Multivariate Fit" End End I f Next j Next i End I f Fill the first column of xArray with either . then fill yArray and xArray. 2) Next i If MM > 2 Then For i = 1 To rMax For j = 3 To MM XArray(i. 1 To rMax) As Double ReDim vArray(l To MM.1 + p. "Least Squares Multivariate Fit" End End I f Next i For i = 1 To rMax For j = 2 To CMax If IsEmpty(DataArray(i. 1) = DataArray(i. 1» Then MsgBox "Yvalue (s) missing". j) = DataArray(i. 1 To 1) As Double ReDimM1(1 To rMax. j) Next j Next i End I f . 1 To MM) As Double . Advanced Excel for scientific data analysis ReDim bArray (1 To MM.448 R. 1 To rMax) As Double ReDimM2(1 To rMax. 1 To 1) As Double ReDim btArray(1 To 1. 1) = CDbl(p) Next i For i = 1 To rMax XArray(i. 1 To MM) As Double ReDim vOArray(l To MM . de Levie.1 + p) As Double ReDim lccArray(1 To MM. 1 To MM . zeros (for p = 0) or ones (for p = 1). Read the dataArray. Next i 1) Check the input data for contiguity. 2) = DataArray(i. . 1 To MM) As Double ReDim btqArray(1 To 1.
1. Call Call Call Call Call Call Call Call Call Transpose(XArray. Paint color bands for up to 14 orders.e.b' X' Y. rMax. columns) are: Y = yArray X = xArray X' = xtArray X' X = pArray (X' X) " =piArray X' Y qArray = bArray b y' = ytArray y' Y = ytyArray b' =btArray b' X' Y = btqArray ( ( ( ( ( ( ( ( ( ( ( (rows. of y. mm. transposi tion. MM. of which we . btqArray) . 1. rMax. and vArray. 9: Macros for least squares &jorthepropagation of uncertainty . mm. piArray) Multiply(xtArray. MM. bArray) Transpose(YArray. MM. and then varY. The various arrays and their dimensions . MM. Calculate SSR = Y' Y . rMax. ytArray) Transpose(bArray. MM. MM. 1. pArray) Invert (pArray. 1. 1. mm. here only use the diagonal elements. rmax. XArray. xtArray) Multiply(xtArray. rMax. where ' or t denote . mm. the variance . the variances For i = 1 To MM For j = 1 To MM VArray(i. qArray. i. btArray) Multiply(ytArray. compute b = (X' X) " X' Y. I. = False 0) . and set the . YArray. the . covariance matrix. qArray) Multiply(piArray. MM.ScreenUpdatinq ActiveCell. j) Application. as varY = SSR/ (rmaxcmaxp+1). MM..Offset(rMax + I. 1) mm) rmax) mm) mm) 1) 1) rmax) 1) mm) 1) rmax. YArray. 1. MM. numerical format to scientific with 3 decimal places Color(1) Color (2) Color(3) Color (4) Color (5) Color (6) Color (7) For i 38 40 36 35 34 37 39 = 1 To 3 . as V = (X' X)" times varY. j) Next j Next i varY * piArray(i. ytyArray) Multiply(btArray. 1. I. I. and" or i indicate inversion 449 . I. Select .Ch. rMax. qArray. 1. mm.
Offset(0.Interior.Ita1ic = False ActiveCe11. 1) .Co1orIndex Co1or(MM Else ActiveCe11.Horizonta1A1iqnment = xlLeft Selection.Offset(0.Co1orIndex Co1or(MM Else ActiveCe11. 0) .1 Se1ection.450 R.0ffset(1. Select Next i ActiveCel1.Font.0ffset(3. Italic = True · Horizonta1A1iqnment = xlCenter · Value = "term " & j .p End With Next j Se1ection.Ita1ic = False · Horizonta1A1iqnment.0ffset(O.NumberFormat = "O. JVIlVI . 1) . 2) .Font.2) .C1earContents If MM < 9 Then ActiveCe11.= xlLeft · Value = "Set # " & MM . Select If JVIlVI < 9 Then ActiveCe11.Offset(l. Italic = True · Horizonta1A1iqnment = xlRight · Value = "Sy:" End With Se1ection. Font.Interior.Font. Bold = False · Font.P . Value = StDevY Selection.Bo1d = True .Offset(0. Advanced Excel for scientific data analysis For j = 1 To MM + P + 2 ActiveCe11.2) .Font. Display the center line of the output With ActiveCell · Font.Co1orIndex Co1or(MM End I f ActiveCe11.Co1orIndex Co1or(MM End If ActiveCe11. de Levie. Display the top line of the output With ActiveCell . Select With ActiveCell · Font.Interior. Select Se1ection. Bold = False . 1) .Se1ect With ActiveCell · Font.Interior.1 End With For j = 1 To MM + p .Bo1d = False ActiveCe11. Select . Bold = False 1) 8) 1) 8) .P . JVIlVI .OOE+OO" Next j ActiveCe11. Select .
Bold = False · Font. 2) . 2) .:" End With For j = 1 To MM + p . Italic = True · HorizontalAlignment = xlRight · Value = "FRS:" End With Selection.HorizontalAlignment = xlLeft If MM = 2 Then Selection.Offset(O.Bold = False .1 Selection.2) . Select With ActiveCell . Font. Select 451 .HorizontalAlignment = xlRight · Value = "Coeff.HorizontalAlignment = xlRight . Italic = False · HorizontalAlignment = xlCenter End With If vArray(j . 1) .Dev.Value Sqr (Abs (vArray (j .p + 1.Offset{O. j . MM . Bold = False · Font. Select With ActiveCell · Font.p + 1.Offset(1. Value = FRatioS Selection.p. Italic = True · HorizontalA1ignment = xlRight · Value = "FR1:" = .Select Selection. Font. Font. Display the bottom line of the output With ActiveCell · Font. Bold = False · Font. Font. Italic = True .Offset(O. Font.Ch. Italic = False .P .p + 1) < 1E40 Then ActiveCell.P + 1») End If Next j Selection. j . 1) . 1) . Bold = False . Select With ActiveCell · Font.Offset(O.HorizontalAlignment = xlCenter . 9: Macrosfor least squares &forthe propagation ofuncertainty .Value = bArray(j + 1 .Value "<IE20" Else ActiveCell. Value = "N/A" If MM > 2 Then Selection. Italic = True .Offset(O. Select With ActiveCell .1 Selection.:" End With For j 1 To MM + p . 1) End With Next j Selection.Value = "St.
End With
Selection.Offset(0, 1).Select
Selection.HorizontalAlignment = xlLeft
If MM = 2 Then Selection.Value = "N/A"
If MM > 2 Then Selection.Value = FRatio1
Selection.Offset(1, -MM - p + 2).Select
If MM < cMax + 1 Then MM = MM + 1
ActiveCell.Offset(1, 0).Select
If MM = cMax + 1 Then End
GoTo RedimensionArrays

End Sub

9.5 LSPermute

The macro LSPermute computes the standard deviations of the fit, sy, for all possible combinations and permutations of unweighted least squares fits of a dependent variable y to a user-specified multivariate expression of the form y = a1x1 + a2x2 + ... (for a0 = 0) or y = a0 + a1x1 + a2x2 + ... (when a0 is not constrained), for up to six linear terms xi. When more terms need to be included, the macro is readily extended to accommodate this by extending the logic used. Again, for a standard multivariate least squares fit, use LS0 or LS1 instead.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''^^^^^  LEAST SQUARES PERMUTE  ^^^^^'''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''' (c) R. de Levie
''''''''''''''''''''''''''''''''''''''''''''' v 2, Oct. 1, 2002

' PURPOSE:
'
' The macro LSPermute computes the standard deviations of
' the fit, sy, for all possible combinations and
' permutations of unweighted least squares fits of a
' dependent variable y to a user-specified multivariate
' expression of the form y = a1x1 + a2x2 + ... (for
' "Thru 0", i.e., for a0 = 0) or y = a0 + a1x1 + a2x2 + ...
' (for the "General" case, where the value of a0 is not
' constrained), for up to six linear terms xi.
'
' SUBROUTINES:
'
' The macro LSPermute calls the subroutine LLSS which, in
' turn, calls the subroutines Invert, Multiply, and
' Transpose.
'
' INPUT:
'
' The input data must be arranged in a block of at least 3
' contiguous columns, with the dependent variable y in the
' leftmost column, and the independent variables xi to its
' right.
'
' OUTPUT:
'
' In order to keep the output compact, the macro only
' provides three columns of output information: (1) in its
' first column the indices i of the parameters xi
' considered; (2) in its second column the values of the
' standard deviation of the fit, sy, assuming a0 = 0; and
' (3) in its third column the values of the standard
' deviation of the fit sy for the general case.
'
' The output requires a space as wide as the input array,
' and extending by rMax lines below the bottom line of the
' output.
'
' WARNING:
'
' Make sure that there are no valuable data below the input
' array, because these will be overwritten. This macro does
' NOT check whether that space is empty, and the output
' will sweep clear an area as wide as the input array, even
' though that may not be obvious from the results displayed.
'
' PROCEDURE:
'
' Before calling the macro, make sure that the space below
' the input data array is clear of valuable data (see above
' under WARNING).
'
' In order to start the process, highlight the input data
' array, which should occupy at least two adjacent columns,
' then call LSPermute.

Sub LSPermute()

Dim C As Integer, c2 As Integer, c3 As Integer
Dim c4 As Integer, c5 As Integer, c6 As Integer
Dim cc As Integer, ccMax As Integer
Dim cMax As Integer, r As Integer, rMax As Integer
Dim StDevY As Double
Dim inputArray As Variant, outputArray As Variant
Dim XArray, YArray
Dim Down As Integer

Down = 0

' Read the input array inputArray

inputArray = Selection.Value
cMax = Selection.Columns.Count
rMax = Selection.Rows.Count

' Check the size of the input array

If cMax < 3 Then
  MsgBox "There should be at least three input columns.", _
    , "LSPermute"
  End
End If
If cMax > 7 Then
  MsgBox "This macro can only handle" & Chr(13) & _
    "six independent parameters.", , "LSPermute"
  End
End If
If rMax < cMax + 1 Then
  MsgBox "There are too few rows for the num" & Chr(13) & _
    "ber of independent parameters used.", , "LSPermute"
  End
End If

' Check for missing input data

For i = 1 To rMax
  If IsEmpty(inputArray(i, 1)) Then
    MsgBox "Y-value(s) missing", , "LSPermute"
    End
  End If
Next i
For j = 2 To cMax
  For i = 1 To rMax
    If IsEmpty(inputArray(i, j)) Then
      MsgBox "X-value(s) missing", , "LSPermute"
      End
    End If
  Next i
Next j

' Dimension the data array and the output array

ReDim outputArray(1 To rMax, 1 To cMax)
ReDim YArray(1 To rMax, 1 To 1) As Double
ReDim XArray(1 To rMax, 1 To 1) As Double

' Fill yArray

For r = 1 To rMax
  YArray(r, 1) = inputArray(r, 1)
Next r
Select Down = Down + 1 Next C Selection. 2) = inputArray(r. XArray. Select Down = Down + 3 the output for 1 variable: . 1 To ccMax) As Double For C = 2 To cMax For r = 1 To rMax XArray(r. 3) = StDevY Selection. Value = outputArray Selection. 3) "Indices:" "Thru 0" "General" Selection. 0) . 1 To ccMax) As Double For C 2 To cMax For c2 = C + 1 To cMax = . 1) C . 1. 0) .Offset(1. 1) = 1 Next r Call LLSS(ccMax.1 outputArray(l. Value = outputArray Selection.Offset(l. StDevY) outputArray(l. Offset (1 . 1) = "Standard deviation of fit" Selection. 0. 2) outputArray(1.Ch. YArray. Select outputArray(1. 1) a XArray(r. 1) outputArray(1.Offset(rMax + 1. rMax. Value = outputArray Selection. Select Down = Down + 1 compute the output for 2 variables: ccMax = 5 ReDim XArray (1 To rMax. YArray. Down = Down + rMax + 1 Write column headings 0) . C) Next r Call LLSS(ccMax. 0) . XArray. 2) = StDevY For r = 1 To rMax XArray(r. StDevY) outputArray(l. 0) . Compute ccMax = 4 ReDim XArray(1 To rMax. Offset (2 . C) Next C Next r nil Selection. 9: Macrosjar least squares &jorthepropagatian ajuncertainty Initialize xArray 455 For r = 1 To rMax For C = 1 To cMax outputArray(r. rMax. Select outputArray(1.
1. 1) = 1 Next r Call LLSS(ccMax. n & c3 . XArray. C) XArray(r. 4) inputArray(r. 0) . Value = outputArray Selection. Offset (1 . YArray. 1) = C . YArray. 3) = StDevY Selection. 1) C . XArray. StDevY) outputArray(l. XArray. Select Down = Down + 1 Next c2 Next C ° Selection. 2) = StDevY For r = 1 To rMax XArray(r. StDevY) outputArray(1. 0) . C) XArray(r. 1) XArray(r.456 R.1 & " . rMax. c3) Next r Call LLSS(ccMax. 1) = 1 Next r Call LLSS(ccMax.1 outputArray(l. rMax. rMax.Offset(l.1 outputArray(l.1 & n . Select Down = Down + 1 Next c3 Next c2 Next C ° Selection. YArray. 1) XArray(r. YArray. 2) inputArray(r. 3) inputArray(r. 0) . 3) inputArray(r. Value = outputArray Selection. 1 To ccMax) As Double For C = 2 To cMax For c2 = C + 1 To cMax For c3 = c2 + 1 To cMax For r = 1 To rMax XArray(r. 1. de Levie. 0. 3) = StDevY Selection. rMax. 0) . StDevY) outputArray(l. c2) XArray(r. Advanced Excelfor scientific data analysis For r = 1 To rMax XArray(r. 2) inputArray(r.1 & " . 2) = StDevY For r = 1 To rMax XArray(r. " S c2 .Offset(l. c2) Next r Call LLSS(ccMax. StDevY) outputArray(l.Offset(1.Select Down = Down + 1 . 0.Select Down = Down + 1 compute the output for 3 variables: ccMax = 6 ReDim XArray (1 To rMax. XArray. Compute the output for 4 variables: . " & c2 .
0. 2) inputArray (r. YArray.1 & ". 2) = StDevY For r = 1 To rMax XArray(r. c2) XArray(r. 3) inputArray(r. Select Down = Down + 1 compute the output for 5 variables: ccMax = 8 ReDim XArray(1 To rMax. 1 To ccMax) As Double For C = 2 To cMax For c2 = C + 1 To cMax For c3 = c2 + 1 To cMax For c4 = c3 + 1 To cMax For r = 1 To rMax XArray(r. Value = outputArray Selection.1 & "." & c3 . c4) Next r Call LLSS(ccMax. XArray.1 & ". 1) = 1 Next r Call LLSS(ccMax. 3) = StDevY Selection. 1) = C . 1 To ccMax) As Double For C = 2 To cMax For c2 = C + 1 To cMax For c3 = c2 + 1 To cMax For c4 = c3 + 1 To cMax For c5 = c4 + 1 To cMax For r = 1 To rMax XArray(r." & c2 . XArray. 1) = C . C) XArray(r. c2) XArray(r. n & c5 . 1. 3) inputArray(r. StDevY) outputArray(1. YArray. c4) XArray(r.Offset(l. 4) inputArray(r." & c2 . 5) inputArray(r. 1) XArray (r. rMax. c3) XArray(r. 4) inputArray(r.1 & ". StDevY) outputArray (1. 0) ." & c4 . rMax.1 & " . c3) XArray(r. 6) inputArray(r.1 ° ." & c4 . 0) .1 outputArray(1. YArray." & c3 .Offset(l. StDevY) outputArray(1. 9: Macrosfor least squares &forthe propagation ofuncertainty ccMax = 7 ReDim XArray (1 To rMax. 5) inputArray(r. 2) inputArray(r.1 & " . 0.1 & ". 1) XArray(r. c5) Next r Call LLSS(ccMax. Select Down = Down + 1 Next c4 Next c3 Next c2 Next C 457 ° Selection. rMax. XArray. C) XArray(r.Ch.
Advanced Excel for scientific data analysis outputArray(l.Offset(l. c5) inputArray(r.1 outputArray(l.Offset(l. 1. StDevY) outputArray(l. YArray. C) = inputArray(r.1 & c5 . XArray. 4) XArray(r. de Levie. 2) XArray(r. 1) =C . c2) inputArray(r.Offset(l. 0) . Select Down = Down + 1 compute the output for 6 variables: cCMax = 9 ReDitn XArray (1 To rMax. StDevY) outputArray(l. rMax.1 & fT f " & c3 . Value = outputArray Selection." & c4 . 1) = XArray(r. 1 To ccMax) As Double For C = 2 To cMax For c2 = C + 1 To cMax For c3 = c2 + 1 To cMax For c4 = c3 + 1 To cMax For c5 = c4 + 1 To cMax For c6 = c5 + 1 To cMax For r = 1 To rMax XArray(r. StDevY) outputArray(l. 7) Next r Call LLSS(cCMax. 0) . 0. 1) = 1 Next r Call LLSS(cCMax. XArray. 2) = StDevY For r = 1 To rMax XArray(r.1 & ". 0) . rMax. 1) = 1 Next r Call LLSS(cCMax. c6) inputArray(r." &_ ° ". 3) XArray(r. 6) XArray(r." & c2 . c4) inputArray(r. 5) XArray(r.458 R. YArray. Value = outputArray Selection. YArray. XArray. Select Down = Down + 1 Next c6 Next c5 Next c4 Next c3 = . rMax. 1. 3) StDevY Selection.1 & ". Select Down = Down + 1 Next c5 Next c4 Next c3 Next c2 Next C Selection." & c6 . c3) inputArray(r. 2) = StDevY For r = 1 To rMax XArray(r.1 & ". 3) = StDevY Selection.
Next c2
Next C

Selection.Offset(-Down, 0).Select

End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''''''^^^  LLSS  ^^^'''''''''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

Sub LLSS(ccMax, rMax, p, XArray, YArray, StDevY)

' This subroutine provides the standard deviation StDevY
' of the least squares fit. It calls the subroutines
' Invert, Multiply, and Transpose.

Dim btArray, btqArray, qArray
Dim xtArray, ytArray, ytyArray
Dim pArray, piArray, bArray
Dim SSR As Double, varY As Double

ReDim ytArray(1 To 1, 1 To rMax) As Double
ReDim ytyArray(1 To 1, 1 To 1) As Double
ReDim xtArray(1 To ccMax, 1 To rMax) As Double
ReDim pArray(1 To ccMax, 1 To ccMax) As Double
ReDim piArray(1 To ccMax, 1 To ccMax) As Double
ReDim qArray(1 To ccMax, 1 To 1) As Double
ReDim bArray(1 To ccMax, 1 To 1) As Double
ReDim btArray(1 To 1, 1 To ccMax) As Double
ReDim btqArray(1 To 1, 1 To 1) As Double

' Compute the least squares fit

Call Transpose(XArray, rMax, ccMax, xtArray)
Call Multiply(xtArray, ccMax, rMax, XArray, ccMax, pArray)
Call Invert(pArray, ccMax, piArray)
Call Multiply(xtArray, ccMax, rMax, YArray, 1, qArray)
Call Multiply(piArray, ccMax, ccMax, qArray, 1, bArray)
Call Transpose(YArray, rMax, 1, ytArray)
Call Transpose(bArray, ccMax, 1, btArray)
Call Multiply(ytArray, 1, rMax, YArray, 1, ytyArray)
Call Multiply(btArray, 1, ccMax, qArray, 1, btqArray)

' Compute the standard deviation of the fit,
' with SSR as ytyArray(1, 1) - btqArray(1, 1)

SSR = ytyArray(1, 1) - btqArray(1, 1)
varY = SSR / (rMax - ccMax - p + 3)
StDevY = Sqr(Abs(varY))

End Sub
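The matrix algebra that LLSS performs is compact enough to sketch outside VBA. The following Python fragment is an illustrative translation of the same computation, not part of the spreadsheet macros, and the function names are mine: it forms the normal equations X'X b = X'Y, solves them by Gaussian elimination, and obtains the standard deviation of the fit from SSR = Y'Y - b'X'Y.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]                # pivot row to top
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]               # eliminate below pivot
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back-substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

def std_dev_of_fit(X, y):
    """Return (b, s_y) for the unweighted least squares fit y ~ X b,
    with s_y = sqrt(SSR / (n - k)) where SSR = Y'Y - b'X'Y."""
    n, k = len(X), len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    b = solve(XtX, Xty)
    ssr = sum(yr * yr for yr in y) - sum(bi * qi for bi, qi in zip(b, Xty))
    return b, (abs(ssr) / (n - k)) ** 0.5
```

For a noise-free straight line y = 1 + 2x the routine recovers b = (1, 2) with a standard deviation of the fit of essentially zero; the `abs()` guard mirrors the macro's `Sqr(Abs(varY))`, which protects against tiny negative round-off in SSR.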
9.6 Ortho

Ortho uses a simple Gram-Schmidt orthogonalization algorithm, as described in, e.g., N. R. Draper & H. Smith, Applied Regression Analysis, 2nd ed., Wiley, New York 1981, pp. 266-267 and 275-278. Ortho serves as a complement to LS, as illustrated in sections 3.11 through 3.14 of chapter 3. The macro should not be used as a standard alternative to LS, because it expresses the function in terms of orthogonal polynomials in x instead of as the usual power series in x. Rather, it also provides the fitting parameters, their standard deviations, and their ratios. The latter are convenient for determining which terms are statistically significant and should be retained.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''^^^^^  ORTHOGONAL POLYNOMIAL FIT  ^^^^^'''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''' (c) R. de Levie
''''''''''''''''''''''''''''''''''''''''''''' v 2, Oct. 1, 2002

' PURPOSE:
'
' This subroutine uses Gram-Schmidt orthogonalization to
' compute and display the orthogonal polynomials
' corresponding to one or more independent variables x,
' which need not be equidistant.
'
' The subroutine requires an input parameter p: p = 1
' causes a general unweighted least squares fit to the
' data, while p = 0 forces the fit to pass through the
' origin, i.e., it assumes that y = 0 for x = 0. The
' subroutine Ortho is therefore called by a macro, Ortho1
' or Ortho0, that sets the value of p.
'
' SUBROUTINES:
'
' This macro requires the subroutines Multiply, Invert,
' and Transpose.
'
' INPUT:
'
' The input data must be arranged as follows. The first
' column must contain the dependent variable y. The second
' (and subsequent) column(s) must contain the independent
' variable(s) x.
'
' OUTPUT:
'
' The macro produces its output to the right of the input,
' separated by one empty column. It therefore repeats the
' y column. It then computes the fitting parameters and
' their standard deviations, which apply to the
' orthogonalized independent parameters. Since these are
' uncorrelated, a covariance matrix is neither needed nor
' provided. A third row lists the ratio of the absolute
' values of the coefficient and the corresponding standard
' deviation. The standard deviation of the fit is shown
' directly below the label "Coeff:".
'
' In general you will NOT want to use the listed
' coefficients and standard deviations, which apply to the
' orthogonalized independent parameters. The most useful
' part of the output is the row listing their ratios, which
' indicates which orders should be considered (green if
' ratio > 5) and which should not be (red if ratio < 1).
'
' PROCEDURE:
'
' Before calling the macro, make sure that the output area
' (a block to the right of the input data block, i.e., the
' rectangular block containing the column for y and the
' adjacent column(s) for the corresponding values of x, but
' extending one more column and three more rows) contains
' no valuable data, since these will be overwritten.
'
' In order to start the process, highlight the entire input
' data array, then call either Ortho1 or Ortho0, in order
' to choose between a general least squares fitting (p = 1)
' or one that forces the curve through the origin (p = 0).
' The function of the following two drivers is merely to
' set the value of one parameter, p, equal to either one or
' zero.

Sub Ortho1()            ' for general unweighted least squares fit

Dim p As Double
p = 1
Call Ortho(p)

End Sub

Sub Ortho0()            ' for unweighted least squares fit
                        ' through the origin
Dim p As Double
p = 0
Call Ortho(p)

End Sub
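The Gram-Schmidt idea behind Ortho can be illustrated with a short Python sketch (mine, for illustration only, and assuming linearly independent columns): each new column is made orthogonal to all earlier ones, after which each least squares coefficient is obtained independently as (z'y)/(z'z), because the orthogonalized columns no longer interact. This is why Ortho needs no covariance matrix.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(columns):
    """Return the columns made mutually orthogonal, in order.
    Assumes the input columns are linearly independent."""
    ortho = []
    for col in columns:
        w = col[:]
        for z in ortho:
            f = dot(col, z) / dot(z, z)          # projection coefficient
            w = [wi - f * zi for wi, zi in zip(w, z)]
        ortho.append(w)
    return ortho

def fit_orthogonal(y, columns):
    """Fit y on orthogonalized columns; each coefficient is
    b_j = (z_j . y) / (z_j . z_j), computed independently."""
    z = gram_schmidt(columns)
    return z, [dot(zj, y) / dot(zj, zj) for zj in z]
```

With a constant column and x = 0, 1, 2, 3, the second orthogonalized column is simply x minus its mean, and fitting y = 1 + 2x returns the mean of y on the constant column and the slope 2 on the centered column.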
Columns. vbYesNo. II & Chr(13) & "one for Y. SZZ As Double u As Double. eoeffArray As Variant DataArray As Variant. Denom As Variant lecArrayAs Variant. S (0 To cMax) As Double u = 1 .". Type:=8) myRange. varY As Double bArray As Variant. iAnswer. SSR As Double. orthoArray As Variant outputArrayAs Variant. r. kAnswer .Count CMax = Selection. de Levie. Sub Ortho(p) Dim cMaxAs Integer. vOArray As Variant xArray As Variant. and one or more for X. M As Integer. en As Integer.462 R. MM As Integer Dim N As Integer. r". StDevY As Double Sy As Double. r " " " . r.Count ReDim A (0 To cMax) As Double. If area was not highlighted If rMax = 1 And cMax = 1 Then hAnswer = MsgBox("You forgot to highlight" & Chr(13) & "the block of input data.Rows. InputBox(Prompt:= "The input data are located in: ". i As Integer Dim j As Integer. stdevArrayAs Variant vArray As Variant.Select End I f GoTo Begin End I f r Check that the number of columns is at least 2: If cMax < 2 Then MsgBox "There must be at least two columns. rMax As Integer Dim Dim Dim Dim Dim Dim Dim Dim Dim Dim Dim NumAs Double. SYZ As Double. Advanced r""""""" Excel for scientific data analysis I""""" r"""""" r. . Determination of the array size: Begin: rMax = Selection. hAnswer. "Ortho") If hAnswer = vbNo Then End If hAnswer = vbYes Then Set myRange = Application. Resid As Double Root As Double. r. II & Chr(13) & "Do you want to do so now?" . yArray As Variant z As Variant Dim myRange As Range Dim Answer.
1) Next i For i = 1 To rMax If IsEmpty(DataArray(i. the first column ofxArray with zeroes (for p = 0) or ones (for p = 1). Dimension thearrays: ReDimbArray(l To cMax) ReDim Denom(l To cMax) ReDimvArray(l To cMax. Value For i = 1 To rMax yArray(i) = DataArray(i. points is sufficient to define the problem: If rMax < cMax Then MsgBox "There must be at least " & cMax & " input" Chr (13) &" data to define the problem. Check that rmax > cmax. ReDim xArray(l To rMax. Read the dataArray. "Orthogonalization" End End If 463 . so that the number of data .Ch. the rest with . 1) = CDbl(p) Next i For j = 2 To cMax . 1 To As Double 1 To cMax) 1 To cMax) As Double As Double As Double cMax) As Double . _ "Orthogonalization" End End If & . 9: Macrosfor least squares &forthe propagation ofuncertainty . j» Then MsgBox "Xvalue(s) missing". DataArray = Selection. "Orthogonalization" End End I f Next i For j = 2 To cMax For i = 1 To rMax If IsEmpty (DataArray (i. ReDim yArray(l To rMax) ReDim z(l To rMax.". "Orthogonalization" End End If Next i Next j 1» . Then MsgBox "Yvalue (s) missing". the data in thexcolumn(s) Fill For i = 1 To rMax xArray(i. then fill yArray and xArray.
"Overwrite?") If Answer = vbNo Then End End I f End If H Check against overwriting valuable spreadsheet data . 1 To en) As Double . j» MM=MM Then Else MM = MM + 1 End If . Value MM = 0 For i = rMax . Value outputArray = Selection. Advanced Excel j) for scientific data analysis For i = 1 To rMax xArray(i.Offset(O. de Levie. the 2nd if p = a ReDim z (1 To rMax. Dimension and fill Z. vbYesNo. j) Next i Next j cn = 1 = DataArray(i. Select orthoArray = Selection. . Value M = 0 For i = 1 To rMax For j = 1 To cMax If IsEmpty(outputArray(i. xArray if p = 1. j» M = M Then Else M = M + 1 End If Next j Next i If M > 0 Then Answer = MsgBox ("There are data in the highlighted & "output block to the" & Chr (13) & "right of the input data array. already transformed. initially Z is the 1st column of .464 R. the matrix of column vectors . by the output of coefficients and standard deviations Selection. data by the output of orthogonal polynomials Selection. 0) . Can they be H & "overwritten? ". cMax + 1) .Offset(4. Check against overwriting valuable spreadsheet . Select outputArray = Selection.1 To rMax For j = 1 To cMax If ISEmpty(outputArray(i.
en. rMax. 1 To rMax) As Double ZtpZ (1 To en. ZtpZi) Multiply(Z1. 1. 1 To 1) As Double I = Z1 ZtpZi 1 To 1) As Double . en. Z2) . Zi. 1 To en) As Double Zl (1 To en. = Z Z2 = = = = Z' Z' Z (Z' Z)" Z' Zi (Z' Z)" Z' ReDim Z3 (1 To rMax. en. 1) Applieation. 1) Next i DataArray(i. 1) xArray (i. rMax. Z1) Multiply(Ztp. 1. rMax. 0) . "Overwrite?") If Answer = vbNo Then End End I f End If 465 Seleetion.SereenUpdating Do False For i = 1 To rMax For j = 1 To en z(i. 1 To en) As Double ZtpZi(1 To en. Compute Zl Call Call Call Call Call = (Z' Z) " Transpose(z. " & "Can they be overwritten? ". Z (Z' Z)" Z' Zi . ZtpZ) Invert(ZtpZ. en.Ch. en. 1 To 1) As Double Z2(1 To en. en + 1) Next i Dimension the other vectors and matrices ReDim ReDim ReDim ReDim ReDim Zi Ztp(l To en. Select For i = 1 To rMax orthoArray(i. Ztp) Multiply(Ztp. z. en. en) = xArray(i. j) Next j Next i . en. the next column vector to be 1 To 1) As Double For i 1 To rMax Zi (i. transformed ReDim Zi (1 To rMax. 9: Macrosjor least squares &forthe propagation ojuncertainty Next j Next i If MM > 0 Then Answer = MsgBox ("There are data in the two bottom " _ & "lines of the" &Chr(13) & "highlighted array. ZtpZi.Offset(4. vbYesNo. Dimension and fill Zi.
j) * xArray(i. j) Next i bArray(j) = Num / Denom(j) Next j . Display the orthogonal array Selection. Z3) . 1. Compute the coefficients For j Num = = 2 . Dimension and compute Ztf = Zi . Advanced Excel for scientific data analysis Call Multiply(z. j) = orthoArray(i. j) Next i Next j . j) * yArray(i) Denom(j) = Denom(j) + xArray(i. Value = orthoArray Update yArray and xArray. 1) = Zi(i. 1) = CObl(p) Next i For j = 2 To cMax For i = 1 To rMax xArray(i. 1) Next i SYZ 0 SZZ 0 For i 1 To rMax SYZ SYZ + yArray(i) SZZ SZZ + ztf(i. en.Z (Z' Z)" For i = 1 To rMax Ztf(i. 1 To en) As Double Loop Until en = cMax . the transformed .Z3(i. Z2. = 1 To rMax yArray(i) = orthoArray(i. 1) . vector. en + 1) Next i (Z' Zi) * Ztf(i.466 R. 1) orthoArray(i. Ztf(i. rMax. de Levie. orthogonal to the vector (s) in Z ReOim Ztf (1 To rMax. * Ztf (i. 1) 1) 1) I f en < cMax Then en = en + 1 ReDim Preserve z(1 To rMax. 1) Next i For i For i = 1 To rMax xArray(i.Z3. 1 To 1) As Double Zi .P To cMax 0 Denom(j) = 0 For i = 1 To rMax Num = Num + xArray(i.
1) . j) = varY / Denom(j) Next j ActiveCell. 0) . Arrange for the data output For j = 1 To cMax With ActiveCell. Select = .Font. Select Next j ActiveCell.Offset(O.Font.p) .Font. Select With ActiveCell · Font.bArrayU) * xArray(i.Ch.Offset(rMax.Offset(O. 9: Macrosfor least squares &forthe propagation ofuncertainty .Offset(O.Font . cMax) . cMax . j) Next j SSR = SSR + Resid * Resid Next i varY = SSR / (rMax .p + 1) StDevY = Sqr(Abs(varY)) Compute the variances of the coefficients For j = 2 .Offset(2. Select Next j ActiveCell.cMax . Select .Bold = True .Font. 1) . Select For j = 1 To cMax ActiveCell. Colorlndex = 1 End With ActiveCell. Compute the variance of y 467 SSR = 0 For i = 1 To rMax Resid = yArray (i) For j = 2 .Offset(1. 1) .Offset(1. Italic = True . Select Next j ActiveCell. 1) . Bold = True · Font. Select For j = 1 To cMax ActiveCell.ltalic = True ActiveCell.ColorIndex = 1 ActiveCell.Colorlndex = 1 ActiveCell. Italic = True · Font.P To cMax Resid = Resid .ltalic = True ActiveCell.P To cMax vArray(j. Colorlndex 1 · HorizontalAlignment = xlRight · Value = "Coeff:" End With ActiveCell.Offset(O. cMax) .
1) .Italic = True ActiveCell.Font. Select With ActiveCell · Font. Advanced Excel for scientific data analysis For j = 2 .Font.Font.Font.Bold = False ActiveCell.Value = bArray(j) ActiveCell.Offset(O.Offset(O. Select Next j ActiveCell. j) <= 0 Then ActiveCell. Select ActiveCell. j») ActiveCell. Select For j = 2 .Font. HorizontalAli9nment = xlRight · Value = "Ratio:" End With ActiveCell.Value = "N/A" ActiveCell.ColorIndex 1 End I f If vArray(j.Offset(O.Offset(1. Italic = True · Font.ColorIndex = 1 For j = 2 . Color Index = 1 . j) < 1E40 Then ActiveCell.Value < 1 Then ActiveCell.Value End If ActiveCell.Bold = True .ColorIndex = 16 ActiveCell. 1) . cMax . j) > 0 Then ActiveCell.ColorIndex 3 End I f . de Levie.Value > 5 Then ActiveCell.Font. j») If ActiveCell.P To cMax If vArray(j.Font · Bold = False · Italic = True · ColorIndex = 1 End With ActiveCell. cMax .p) . Italic = True .Value = "StDev:" ActiveCell.Select Next j ActiveCell. 1) .468 R. Bold = False · Font.Font.Offset(1.Font .p) . Select With ActiveCell.P To cMax With ActiveCe1l.Value = Abs(bArray(j) / Sqr(vArray(j.ColorIndex = 10 If ActiveCell.Offset(O.P To cMax If vArray(j.HorizontalAli9nment xlRight ActiveCell. Color Index = 1 End With ActiveCell.Value "<1E20" Else Sqr(Abs(vArray(j. 1) .
' Provide the standard deviation of the fit

ActiveCell.Offset(1, cMax - p).Select
With ActiveCell.Font
  .Bold = False
  .Italic = True
  .ColorIndex = 1
End With
ActiveCell.Value = StDevY

End Sub

9.7 ELS

The equidistant least squares macros ELS are based on Gram polynomials, and use a sliding polynomial approach for a piecewise fit to arbitrary functions. The user specifies the length of the moving polynomial and, in ELSfixed, its order. In ELSauto the polynomial order is self-optimized by the macro, using F-tests to mediate between the conflicting requirements of maximal noise rejection and minimal signal distortion. Both macros are based on the work of P. Barak, Anal. Chem. 67 (1995) 2758. They will work best when the length of the moving polynomial is smaller than the width of the smallest significant features in the signal.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''^^^^^  EQUIDISTANT LEAST SQUARES  ^^^^^'''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'''''''''''''''''''''' (c) P. Barak, pwbarak@facstaff.wisc.edu
'''''''''''''''''''''' Oct. 1, 2002

' PURPOSE:
'
' This program for a least squares fit to EQUIDISTANT data
' y, x with a moving polynomial uses the approach pioneered
' by Gram, further developed by Sheppard [Proc. London
' Math. Soc. (2) 13 (1914) 81] and Sherriff [Proc. Royal
' Soc. Edinburgh 40 (1920) 112], and subsequently advocated
' by Whittaker & Robinson [The Calculus of Observations,
' Blackie & Son, 1924] and by Savitzky & Golay [Anal. Chem.
' 36 (1964) 1627]. It computes smoothed values of the data
' set, or (if so desired) its first or second derivative.
'
' There are two options: in ELSfixed() the user also
' selects the order of the polynomial, whereas in ELSauto()
' the program optimizes the order of the polynomial
' (between 1 and an upper limit set by the user) each time
' the moving polynomial slides one data point along the
' data set, using an algorithm described by Barak in Anal.
' Chem. 67 (1995) 2758.
'
' The program compares the ratio of the variances for a
' given order and that for the next-lower order with the
' corresponding F-test as its first criterion. Since
' symmetrical functions often contain mostly even powers, a
' single, final comparison is made between the variances of
' the next-higher and the next-lower order. If the latter
' is not desired, simply comment out the section following
' the comment line "Second test for optimum Order".
'
' FUNCTIONS & SUBROUTINES:
'
' There are two drivers, ELSfixed and ELSauto, that call
' the main subroutine, ELS(iOrder). It also calls the
' subroutine ConvolutionFactors, which calculates the Gram
' polynomials and the corresponding convolution weights.
' The latter uses the functions GenFact() and Smooth().
'
' INPUT:
'
' The user must select PL, the length of the moving
' polynomial; the corresponding value is restricted to odd
' integers between 3 and 31. This choice involves a
' compromise: the longer the polynomial, the more noise is
' removed, and the more the underlying signal is distorted.
' A useful rule of thumb is to make the length such that it
' encompasses no more than the half-width of the most
' narrow feature in the signal that should be preserved.
'
' The selected (maximum) order of the polynomial, MaxOrder,
' must be a positive integer, MaxOrder > 0, and cannot
' exceed the length of the data set, NPts. Moreover, we
' must have MaxOrder < PL for ELSfixed, while for ELSauto
' we have MaxOrder < (PL-1). Note that the high-order
' polynomials may be undesirable for use with, e.g.,
' differentiation or interpolation.
'
' OUTPUT:
'
' The output is written in one or two columns to the right
' of the input data. Make sure that the output space is
' free, or can be overwritten. The first output column
' contains the smoothed or differentiated data. The second
' column, which appears only with ELSauto(), displays the
' order selected by the program.
'
' Some of the abbreviations and indices used:
'
' DerOrder:  the derivative order, selected by the answer
'            to ELS InputBox 4: Derivative Order.
'            DerOrder = 0 for smoothing, DerOrder = 1 for
'            the 1st derivative, DerOrder = 2 for the 2nd
'            derivative.
' FValueTable: the Fisher criterion, obtained from the
'            Excel function FInv.
' GP:        the Gram polynomial.
' j:         index for the order of the polynomial.
' k:         index for the position of the center of the
'            moving polynomial in the data set; k ranges
'            from m+1 to NPts-m.
' m:         number of points on each side of the central
'            point in the moving polynomial, calculated as
'            m = (PL-1)/2.
' MaxOrder:  the maximum order of the polynomial (from ELS
'            InputBox 3: Polynomial Order) or its maximum
'            value (from ELS InputBox 3: Maximum Polynomial
'            Order). MaxOrder must be an integer, > 0, and
'            < PL for ELSfixed, or < (PL-1) for ELSauto.
' NPts:      number of points in the data set, computed by
'            the macro from the data range provided in ELS
'            InputBox 1: Input data.
' OptOrder:  array of optimized Order values.
' Order:     working value of the polynomial order. In
'            ELSfixed, Order = MaxOrder; in ELSauto, Order
'            starts at 1, and has a maximum value of
'            MaxOrder.
' OutputData: the final (smoothed or derivative) result.
' OutputOrder: the values of OptOrder used in ELSauto.
' PL:        length (in number of points) of the moving
'            polynomial, selected in ELS InputBox 2:
'            Length of Moving Polynomial.
' s:         index for the order of the derivative; s
'            ranges from 0 to DerOrder.
' t:         index for the individual points in the moving
'            polynomial; t ranges from -m to +m.
' tries:     number of attempts to enter data in an input
'            box.
' Y() and YData(): the input data containing the entire
'            data set.

Sub ELSfixed()

' Selects a fixed polynomial order,
' by setting iOrder equal to 1

Dim iOrder As Integer
iOrder = 1
Call ELS(iOrder)

End Sub

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
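The moving-polynomial smoothing that ELS automates can be illustrated with the simplest fixed-order case: for a 5-point moving quadratic, the convolution weights of Savitzky & Golay are (-3, 12, 17, 12, -3)/35. The Python sketch below is illustrative only (ELS itself derives such weights from Gram polynomials for arbitrary length and order, and also handles the end points); it applies the fixed weights to the interior points of a data set.

```python
def smooth5(y):
    """Smooth the interior points of y with the classic 5-point
    quadratic Savitzky-Golay weights (-3, 12, 17, 12, -3)/35.
    The two points at each end are returned unchanged."""
    w = [-3.0, 12.0, 17.0, 12.0, -3.0]
    out = y[:]                                   # end points left untouched
    for k in range(2, len(y) - 2):
        out[k] = sum(wi * y[k + t]
                     for wi, t in zip(w, range(-2, 3))) / 35.0
    return out
```

Because these weights reproduce any quadratic exactly, smoothing y = t^2 leaves the interior points unchanged; only components that cannot be represented by a quadratic within the 5-point window are attenuated, which is precisely the trade-off between noise rejection and signal distortion discussed above.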
472 R. . . " " " " " " " " " ' " r " " " " " " " " " " . BB As Double. . Tries2 As Integer Tries3 As Integer.g. SumSquares () As Double Dim SumX2 () As Double. Length As Double Perc As Double. M As Integer MaxOrder As Integer. Dim iOrder As Integer iOrder = 1 Call ELS (iOrder) End Sub .. WO As Double. Dim Dim Dim Dim AAAs Double. q As Integer S As Integer. Note that the highorder polynomials may be undesirable for use with. SumY As Double. Deltax As Double FTestl As Double. sumAs Double. de Levie.MaxOrder) by setting iOrder equal to /." r .Select If Selection.Count <> 1 Then . Type:=8) myRange. N As Integer. TestContents As Variant Dim YData As Variant Dim myRange As Range Dim Ans As String Dim z Preliminaries and data input Set myRange = Application. FTest2 As Double. NPts As Integer Order As Integer. DerOrder As Integer i As Integer. ii As Integer. SumSqAs Double SumXY As Double.Columns. Y() As Double Dim OptOrder As Variant. differentiationor interpolation. . SumY2 As Double Dim GP () As Double. t " " " " " Sub ELS(iOrder) Dim Dim Dim Dim Dim Dim Dim AAs Integer. j As Integer j j As Integer. T As Integer. OutputData As Variant Dim OutputOrder As Variant. Advanced Excel for scientific data analysis Sub ELSauto () . B As Integer. Tries4 As Integer Percentage As Long Dim GenFact As Double. PL As Integer.InputBox(Prompt:="The input data are" & "located in column:". Title:="ELS InputBox 1: Input data". k As Integer. Selects a variable polynomial order (between 1 and a userselectable maximum value. e.
Offset(O. Value For i = 1 To NPts z = TestContents (i.Ch. then try again. 1) If IsEmpty(z) Then N = N Else N = N + 1 End I f Next i .Offset(O.Rows. Proceed anyway and overwrite " & "those data?". vbYesNo.Offset(O. 1) . Select End I f If iOrder = 1 Then Selection. Select TestContents = Selection. ") . 9: Macros/or least squares &/orthepropagatian a/uncertainty 473 MsgBox "Only enter a single column of input data. Value For i = 1 To NPts z = TestContents(i. 1) If IsEmpty(z) Then N = N Else N = N + 1 End I f Next i If iOrder = 1 Then If N > 0 Then Ans = MsgBox (" There are data in the " & "column where the output" & Chr (13) & "will be written. Select TestContents = Selection.Value 'defines the array size OutputData = myRange.Value Test and prepare the defaul t output range N = 0 Selection." End End If NPts = Selection. 1) . 1) .Count If NPts = 0 Then End YOata = myRange. "Equidistant Least Squares Fit" End End I f End I f Selection.Value 'defines the array size OptOrder = myRange.Value 'defines the array size OUtputOrder = myRange. "Equidistant Least Squares Fit") If Ans = vbNo Then MsgBox (" Safeguard the data in the highlighted " & "area by" & Chr (13) & "moving them to another" & "place.
Length <> 0 Or Clnt( (PL . Make sure that PL is an odd integer 1 larger than 0 and smaller than NPts If (Length <= 0 Or Length >= NPts Or PL . MaxOrder Tries3 = 0 Line3: If iOrder = 1 Then MaxOrder = InputBox(Prompt:"'''The order of the moving" & "polynomial is:". "Equidistant Least Squares Fit" End End I f End I f Selection. "Equidistant Least Squares Fit" Tries2 = Tries2 + 1 If Tries2 = 2 Then End GoTo Line2 End I f Select the order of the moving polynomial.1) / 2) «Length . "Equidistant" & "Least Squares Fit") If Ans = vbNo Then MsgBox ("Safeguard the data in the highlighted " & "area by" & Chr(13) & "moving them to another" & "place. 2) . then try again.Offset(O. Select End I f . Ti tIe: ="ELS InputBox 3: Polynomial Order") End I f .474 R. Select the length of the moving polynomial. Advanced Excel for scientific data analysis If N > 0 Then Ans = MsgBox("There are data in the TWO columns" & "where the output" & Chr (13) & "will be " & "written. PL Tries2 0 Line2: Length InputBox(Prompt:="The length of the moving" & "polynomial is:". vbYesNo. Title:="ELS InputBox 3: Polynomial Order") Else MaxOrder = InputBox(Prompt:". ") .1) / 2) <> 0) Then MsgBox "The length of the moving polynomial must" & Chr (13) & " be an odd integer larger than zero " & "and" & Chr (13) & "smaller than the length of" & "the input column. Title:="ELS InputBox 2: Polynomial Length") PL = Clnt(Length) .". de Levie. Proceed anyway and overwrite" & Chr (13) & "those data?"."The maximum order of the " & "moving polynomial is:".
Make sure that DerOrder has the value 0. Tries4 = first derivative. "Equidistant Least Squares Fit" Tries3 = Tries3 + 1 If Tries3 = 2 Then End GoTo Line3 End I f End I f Select smoothing." . Make sure that MaxOrder > 0 and that either MaxOrder < PL (for ELSfixed) or MaxOrder < PL . select 0. or second derivative 0 Line4: DerOrder InputBox(Prompt:="Select the order of the " & "derivative" & Chr (13) &" (either 1 or 2).1) Then MsgBox "The maximum order of the moving polynomial" & Chr (13) & "must be larger than zero." & Chr(13) & Chr(13) & "The order of the derivative is:". If iOrder = 1 Then If (MaxOrder <= 0 Or MaxOrder >= PL) Then MsgBox "The order of the moving polynomial" & Chr(13) & "must be larger than zero.". and smaller" & "than" & Chr(13) & "the length of the moving" & "polynomial minus 1.1 (for ELSauto) .". 1 (for the first" & Chr (13) & "derivative). "Equidistant Least Squares Fit" Tries4 = Tries4 + 1 If Tries4 = 2 Then End GoTo Line4 End I f . Title:="ELS InputBox4:" & "Derivative Order") . 1.Ch. for" & "smoothing. or 2 (for the second" & "derivative) . 9: Macrosfar least squares &forthepropagatian afuncertainty 475 . or 2 If DerOrder = 0 Then GoTo Line6 E1self DerOrder 1 Then GoTo Line5 Elself DerOrder 2 Then GoTo Line5 Else MsgBox " The order of the moving polynomial must be" & Chr (13) & "either 0 (for smoothing). "Equidistant Least Squares Fit" Tries3 = Tries3 + 1 If Tries3 = 2 Then End GoTo Line3 End I f Else If (MaxOrder <= 0 Or MaxOrder >= PL . and smaller" & Chr (13) & "than the length of the moving polynomial.
PL) ReDim FValueTable(l To MaxOrder. jj) = M + 1 To NPts . j. jj) = Application. Next ii Loop Until j j = PL For k ii.SumY "2 / (2 * T + 1) . Calculate SumSquares for Order = 0 Order = 0 SumY = 0 Sumy2 = 0 For T = M To M SumY = SumY + Y(k + T) SumY2 = SumY2 + Y (k + T) "2 Next T SumSquares (0) = SumY2 . W. DerOrder. 'l'i tIe: ="ELS InputBox 5: X Increment") Line6: (PL . MaxOrder. 1) = MaxOrder Next i 1) Call ConvolutionFactors(PL.1) M = / 2 ReDim Y(l To NPts). 1) OptOrder(i.05. For i = 1 To NPts Y(i) = YData(i. "2 Calculate FVal ueTable (MaxOrder. 0) Next i SumX2(j) = sum Next j .M ReDim SumSquares(O To MaxOrder) . GP. j j = 1 To PL) 0 Do jj jj + 1 For ii = 1 To MaxOrder FValueTable(ii.476 R. de Levie.Flnv(0. OptOrder(l To NPts. Y) THE FOLLOWING SECTION IS USED ONLY BY ELSauto If iOrder = 1 Then ReDim SumX2(1 To MaxOrder) For j = 1 To MaxOrder sum = 0 For i = M To M sum = sum + GP (i. Advanced Excel for scientific data analysis LineS: Deltax = InputBox(Prompt:="The data spacing in x is:".
1) .Order) / (SumSquares (Order» If (FTest2 / FValueTable(2. W. Order.1 End I f SumXY = SumXY + linelO: OptOrder(k.Order .SumSquares(Order» (PL . Order. PL .Ch. Order. Y) _ .Order . T. 0) Next T SumSquares(Order) = SumSquares(Order ..2) . 2 .1 .1» < 1 Then Order = Order . k.1) .1) . PL . 9: Macrosjar least squares &jorthepropagatian ajuncertainty .SumXY A 2 / SumX2 (Order) FTest2 = (SumSquares(Order. First test for optimum Order FTestl = (SumSquares(Order.Order) / (SumSquares (Order» Loop Until (FTestl / FValueTable(1.1» < 1 * Second test for optimum Order If Order Order SumXY = < MaxOrder Then Order + 1 0 For T = M To M Y(k + T) * GP(T.SumXY A 2 / SumX2 (Order) . 0.Y(k + T» Next T SumSquares(1) = SumSq Test whether onehigher order satisfies the criterion Do Order = Order + 1 If Order > MaxOrder Then GoTo linelO Calculate SumSquares for Order > 1 SurnXY = 0 For T = M To M SumXY = SumXY + Y(k + T) * GP(T. Calculate SumSquares for Order Order = 1 SumSq = 0 477 1 For T = M To M SumSq = SumSq + (Smooth (PL. 1) Order .SumSquares(Order» * (PL . 0) Next T SumSquares (Order) = sumSquares (Order ..
M Then For T = 1 To M OutputData(k+T. Prepare the output files & Perc = 100 * Percentage = Percentage & H% done. DerOrder. Advanced (k / NPts) Excel for scientific data analysis Int(Perc) Application. k. k. Offset (0. Offset (0. 1) = Smooth (PL. 1) OptOrder(k. Next T End If Next k Write the output files DerOrder. " For k = M + 1 To NPts . Y) = Smooth(PL. Select Selection. Select End I f Application. OptOrder(k. 1) = OptOrder(k. Next T End If DerOrder. 1) If k = NPts . W. W. OptOrder(k + T. 1) Selection. Offset (0 . OptOrder(k + T. 0. 1) = Smooth (PL.478 R. 1) T. Value = OutputData If iOrder = 1 Then Selection. k. 1) OptOrder(k. Value = OutputOrder Selection. T. 1) . Y) OutputOrder(k. 1). 1) OutputData(k.StatusBar End Sub False .M If k = M + 1 Then For T = M To 1 OutputData (k + T. / (Deltax" DerOrder) 1). 1). Select Selection. de Levie. / = (Del tax " DerOrder) OutputOrder(k + T. 1) .StatusBar = "Calculation " Next k End If THIS ENDS THE PART USED ONLY BY ELSauto . 1) . Y) / = (Del tax " DerOrder) OutputOrder(k + T. W.
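The ELSauto-only logic above keeps raising the polynomial order as long as doing so reduces the residual sum of squares by more than Fisher's F criterion allows. A minimal, illustrative Python rendering of that decision rule follows; it is not the macro itself. The macro gets its critical values from Excel's FInv and also makes a final comparison between the next-higher and next-lower order, while this sketch takes a caller-supplied f_crit function and applies only the simple one-step test. All names are hypothetical.

```python
def pick_order(ssr, window_len, f_crit):
    # ssr[j]: sum of squared residuals of an order-j fit to the window.
    # Raise the order while the drop in the residual sum of squares is
    # significant by a one-sided F test; f_crit(df1, df2) plays the
    # role of Excel's FInv(0.05, df1, df2) in the macro.
    order = 1
    while order < len(ssr) - 1:
        df2 = window_len - order - 2      # degrees of freedom left
        f = (ssr[order] - ssr[order + 1]) * df2 / ssr[order + 1]
        if f < f_crit(1, df2):
            break                         # no significant improvement
        order += 1
    return order
```

With residuals that drop sharply up to order 2 and barely thereafter, the rule settles on order 2, which is the behavior OutputOrder reports point by point in ELSauto.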
BB As Double M = (PL . 9: Macrosfor least squares &forthepropagation ofuncertainty """ rill. ." r r" r I I". 1 To MaxOrder. PL: Polynomial Length W: Weight Dim i As Integer.. B) computes the generalized factorial Dim gf As Double. f"" I r.1) / 2 ReDim GP(M To M. I' r"". M As Integer. M As Integer Dim S As Integer.". Next i Smooth = sum End Function j . T. DerOrder. T.Ch. s = 1 to DerOrder) . s 1 toDerOrder). W.' I I r r" Function GenFact(A. j As Integer gf = 1 For j = (A . t = m to m. j. Y) computes the appropriately weighted sum of the Yvalues Dim i As Integer. and ofW(i = m to m. 1 To DerOrder) . T As Integer Dim AA As Double. GP.. (PL ."" 479 r. k As Integer. r " " . Calculates tables ofGP(i =m tom. Abbreviations used: DerOrder: Derivative Order GP: Gram Polynomial MaxOrder: Maximum Order. . k. S) * Y(k + i) Sub ConvolutionFactors(PL. W. Y) .B + 1) gf = gf * j Next j GenFact = gf End Function To A """"""""""""""""""""""" Public Function Smooth(PL..1) / 2 sum As Double M = sum 0 For i M To M sum sum + W(i. k = 1 to MaxOrder. . k = 1 to MaxOrder. S." t". MaxOrder.
=0 0) 0) Then GP (i. 0) 0) Else GP(i. S) I.k + 1» / (k * (2 * M . GP(i. k. 0) Next i For i = M To 1 * * If k Mod 2 GP (i.2. 0) 0 Next i For i = M To 1 GP(i. k. 0) i / M Next i For i = 1 To M GP(i.1.1) / (k (2 M . I. S .1» . End If Next i Next k 'Evaluate the Gram polynomials for DerOrder>O If DerOrder > 0 Then For S = 1 To DerOrder For i = M To M GP(i. 0) = i / M Next i For k = 2 To MaxOrder AA = 2* (2 * k .480 R. GP(i. k. k .1) * (2 * M + k» / (k * (2 * M . For k AA M To M. I. k) / . S) S * GP (i.BB * GP (i. I. k.2. de Levie. k . 1 To De rOrder) = = 0 To MaxOrder (2 * k + 1) * GenFact(2 * M. k .1) For i = M To GP (i . Next i 0. 0 ) AA * i * GP (i . 0 ) BB * GP (i.k + 1» BB = «k . 0.k + 1) ) For i = 0 To M GP (i. 0) 1 GP(i. S) Next k Next S End If 'Calculate the convolution weights ReDim W(M To M. k.1) / (k * (2 * M + k» A A = 2 * (2 * BB = «k .k + 1» + M  AA * (i * GP (i. k. S) = 0 S) = 0 For k = 1 To MaxOrder k .I. I. k . 1 To MaxOrder. Advanced Excel for scientific data analysis 'Evaluate the Gram polynomials for DerOrder=Q For i = M To M GP(i. k Next i * (2 * M .
        / GenFact(2 * M + k + 1, k + 1)
    For S = 0 To DerOrder
        For i = -M To M
            For T = -M To M
                W(i, T, S) = W(i, T, S) _
                    + AA * GP(i, k, 0) * GP(T, k, S)
            Next T
        Next i
    Next S
Next k
End Sub

9.8 WLS

The macro for weighted least squares is essentially a generalization of the regular linear least squares algorithm LS. In short, the sought coefficients are found in the vector b, where

    b = (X' W X)^-1 X' W y                               (9.8.1)

(the prime denoting transposition) and the matrix W contains the weights, while the covariance matrix V is calculated as

    V = (X' W X)^-1 sy^2                                 (9.8.2)

Beyond the involvement of the weight matrix W (which reduces to the unit matrix for an unweighted least squares), the macro is similar to LS. In fact, WLS can be used instead of LS by leaving the weights unspecified (while preserving their space), in which case they are all assumed to be unity. The macro implements the mathematical description given in N. R. Draper & H. Smith, Applied Regression Analysis, 2nd ed., Wiley, New York 1981, pp. 108-111.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''   WEIGHTED LEAST SQUARES   ''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''  R. de Levie
''''''''''''''''''''''''''''''''''''''''''''''''''''''  Oct 1 2002

' The function of the following two drivers is merely to set the
' value of one parameter, p, equal to either one or zero, in order
' to choose between a general least squares fitting (p = 1) or one
' that forces the curve through the origin (p = 0).

Sub WLS0()            ' for least squares fit through the origin
r'. the corresponding array of linear correlation coefficients.. while p = 0 forces the fit to pass through the origin. as in that case the . It . except when space for them is Invert.482 p R. SUBROUTINES: This macro requires the subroutines Multiply. 1 ' " " " " " " " " " " " " " . for all individual weights. weights blank. or enter the same number (say. Some rows are labeled. and the standard deviation of the fit. The first column must contain the dependent va. . . or 13) . If an unweighted least squares is desired. it assumes that y = 0 for x = o. . and . PURPOSE: The macros WLSI and WLSO compute the parameters and their standard deviations for a weighted least squares fit to data in 2 or more columns. . macro will misinterpret the input. deviations. " I' t"" I tIll I'" I r. Advanced Excel for scientific data analysis Dim p As Double = 0 Call WeightedLeastSquares(p) End Sub . rill' t"". de Levie. " " " . follows. i. Sub WLSI () for general least squares fit Dim p As Double p = 1 Call WeightedLeastSquares(p) End Sub It"" t " " " " " " " . The second column should contain the weights w . Transpose . leave the .. optionally. Do NOT delete the second column. their standard . The third (and any subsequent) . . . . . of the individual points."'" Sub WeightedLeastSquares(p) . . riable y. II!. INPUT: The input data must be organized in columns. . arranged as . 1. The macros set the input parameter p for the subroutine WeightedLeastSquares: p = 1 causes a general weighted least squares fit to the data. . " " " .e. . ? 1'" r. OUTPUT: The macro provides the coefficients. . column (s) must contain the independent variable (s) x. also shows the covariance matrix and.
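Equation (9.8.1), b = (X'WX)^-1 X'Wy, is compact enough to be tried out independently of the spreadsheet. The following illustrative Python sketch (not part of the macro; all names are invented here) builds the weighted normal equations directly and solves them by Gauss-Jordan elimination, which is the same job WLS delegates to its Transpose, Multiply, and Invert subroutines.

```python
def matmul(A, B):
    # Plain matrix product of two lists-of-rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def solve(A, b):
    # Gauss-Jordan elimination on the augmented matrix [A | b],
    # with partial pivoting; b is a column vector (list of 1-lists).
    n = len(A)
    M = [row[:] + [b[i][0]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                M[r] = [vr - M[r][col] * vc
                        for vr, vc in zip(M[r], M[col])]
    return [[M[i][n]] for i in range(n)]

def wls(X, y, w):
    # b = (X'WX)^-1 X'Wy with W = diag(w); the columns of X are the
    # basis functions (a first column of ones gives an intercept).
    Xt = [list(col) for col in zip(*X)]
    XtW = [[xji * wj for xji, wj in zip(row, w)] for row in Xt]
    XtWX = matmul(XtW, X)
    XtWy = matmul(XtW, [[yi] for yi in y])
    return [row[0] for row in solve(XtWX, XtWy)]
```

Because only relative weights matter for the coefficients (the macro normalizes them to an average of one for the same reason), data that fit the model exactly return the exact coefficients for any choice of positive weights.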
"Weighted Least Squares") If hAnswer = vbNo Then End If hAnswer = vbYes Then Set myRange = Application. 1 To 1) As Double wArray(l To rMax. vbYesNo. 1 To rMax) As Double ytwArray(l To 1. so that the number of data . If area was not highlighted If rMax = 1 And cMax = 1 Then hAnswer = MsgBox("You forgot to highlight" & Chr (13) & "the block of input data. 1 To rMax) As Double XArray(1 To rMax.Columns. one for W." & Chr(13) & "Do you want to do so now?" . 1 To ccMax) As Double qArray(1 To ccMax. Advanced Excel for scientific data analysis Begin: rMax = Selection. 1 To 1) As Double xtArray(l To ccMax.484 R.". "Weighted Least Squares Fit" End End I f . 1 To ccMax) As Double piArray(1 To ccMax. de Levie. 1 To 1) As Double & .Rows. and one or more for X. 1 To ccMax) As Double ytArray(1 To 1. "Weighted Least Squares Fit" End End I f Check that rmax > cmax.Count ccMax = cMax . one" & Chr(13) & "for Y. 1 To rMax) As Double xtwArray(l To ccMax. points is sufficient to define the problem: If rMax < cMax Then MsgBox "There must be at least " & cMax & " input" Chr (13) & " data to define the problem. Dimension the arrays: ReDim ReDim ReDim ReDim ReDim ReDim ReDim ReDim ReDim ReDim ReDim YArray(1 To rMax.". 1 To rMax) As Double pArray(1 To ccMax. 1 To rMax) As Double ytwyArray(l To 1.Count cMax = Selection.1 u = 1 .Select End I f =_ GoTo Begin End I f Check that the number of columns is at least 3: If cMax < 3 Then MsgBox "There must be at least three columns. InputBox (Prompt: "The input data are located in:". Type:=8) myRange.
main diagonal with the listed normalized weights. 1 To 1) As Double ReDim btArray(1 To 1. normalized weights as its diagonal elements. 1) = DataArray(i. 2) Next i For i = 1 To rMax For j = 1 To rMax wArray(i. j) = 0 Next j . 1) Next i For i = 1 To rMax If IsEmpty(DataArray(i. 1 To ccMax) As Double ReDimvOArray(l To ccMax + p . wAr ray . with the normalization factorsumW / rMax sumW = 0 u For i = 1 To rMax sumW = sumW + DataArray(i. 485 Read the dataArray. 2» Then DataArray(i.Ch. then fill the various input arrays: yArray. 1 To rMax) As Double ReDim vArray(l To ccMax.1) As Double ReDim lccArray(1 To ccMax + p . . . 1 To rMax) As Double ReDim M2 (1 To rMax. 2) Next i For j = 3 To cMax For i = 1 To rMax If IsEmpty (DataArray (i. 1 To 1) As Double ReDimMl (1 To rMax. The wArray contains zeros except that it has the individual. _ "Weighted Least Squares" End End If Next i Next j . 1» Then MsgBox "Yvalue(s) missing". . andxArray. j» Then MsgBox "Xvalue(s) missing".1. 1 To ccMax) As Double ReDimbtqArray(l To 1. DataArray = Selection.1) As Double . then fill its . First zero the entire wArray. 9: Macrosfor least squares &forthe propagation ofuncertainty ReDimbArray(1 To caMax.1. _ "Weighted Least Squares" End End If Next i For i = 1 To rMax If IsEmpty(DataArray(i. 1 To ccMax + p . Value For i = 1 To rMax YArray(i. 1 To ccMax + p . .
qArray) 1. 1) X' W Y qArray ( ccmax. ccmax) X xArray ( ccmax. XArray. and rr or i indicate inversion The various arrays and their dimensions (rows. Multiply(piArray. Value For j = 1 To cMax If IsEmpty(outputArray(rMax. ccMax. Advanced Excel for 2) scientific data analysis wArray(i. Multiply(xtwArray. YArray. ccmax) =pArray ( ccmax. transposi tion. are: ( rmax. rMax.Offset(2. 1) = P Next i For j = 3 To cMax For i = 1 To rMax XArray (i. Check against overwriting spreadsheet data Selection. (j . Invert (pArray. ccMax. Value Selection. rMax. WX ( ccmax. Select M = 0 If (p = 0 And cMax = 3) Then For i = 1 To 3 Selection. rmax) X' = xtArray ( ccmax. ccMax. the rest with the data in the . i) Next i r = DataArray(i. qArray. rmax) X' W = xtwArray X. ccMax. piArray) Multiply(xtwArray. ccMax. ccMax. rMax. j» M = M Then Else M = M + 1 . j) compute b = (X' W X) rr X' W Y. . rMax. 0) . 0) . xtArray) Multiply(xtArray. 1) b bArray columns) Call Call Call Call Call Call Transpose(XArray. bArray) . 1) Y = yArray ( rmax. Select outputArray = Selection. wArray.1» Next i Next j DataArray (i. ccmax) (X' WX)" piArray ( ccmax. where ' or t denote . * rMax / sumW 0) Fill the first column of xArray with zeroes (for p = or ones (for p = 1).486 R. de Levie.Offset(2. xtwArray) ccMax. 0) . Select outputArray = Selection. rmax) W = wArray ( rmax. pArray) 1. xcolumn (s) For i = 1 To rMax XArray(i.Offset(1. rMax. ccMax.
" & "Can they be overwritten?". Select If M > 0 Then Answer MsgBox ("There are data in the " & "three lines below the" & Chr (13) & "input data array.Offset(l. "Overwrite?") If Answer = vbNo Then End End I f . btqArray) . ytArray) Transpose(bArray.Offset(3 . as varY = SSR/ (rmaxccmax).. The additional arrays and their dimensions (rows. 1) . "Overwrite?") If Answer Else = vbNo Then End For i = 1 To 3 + P + cMax Selection.cMax. ytwyArray) Multiply(btArray.p . ccmax) b' b' X' W Y = btqArray 1. Value For j = 1 To cMax Then 1f lsEmpty(outputArray(rMax.Ch. 9: Macrosfor least squares &forthe propagation ofuncertainty End I f Next j Next i 487 Selection. rMax. . the covariance matrix. wArray. 1. SSR = ytwyArray(l. of which we here only use the diagonal elements. 1. and vArray. the variances. vbYesNo. YArray. Y' y' W rmax) = ytwArray 1. 1) btArray 1. Select If M > 0 Then Answer = MsgBox ("There are data in the " & 1 + P + cMax & " lines below the" & Chr (13) & "input data array. . 0) . ccMax. ytwArray) Multiply(ytwArray. 1) Call Call Call Call Call Transpose(YArray.p + 1) StDevY = Sqr(varY) For i = 1 To ccMax . Y' W Y =ytwyArray 1. 1. 1. columns) are: rmax) ytArray 1. vbYesNo.e. " & "Can they be overwritten?". j» M = M Else M = M + 1 End I f Next j Next i Selection.Offset(3. rMax. 1. . 0) . 1. rMax. 1. i. the variance of y. Select outputArray = Selection. Calculate SSR= Y'WY b'X'WY. and then varY. ccMax. btArray) Multiply(ytArray. qArray. as V = (X' WX)" times varY. 0) . rMax. 1) varY = SSR / (rMax .ccMax .btqArray(l.
cMax) . 1) .P To ccMax ActiveCell.cMax) .Offset(O.ltalic = True ActiveCell. Select For j = 1 To cMax ActiveCell. Value = "StDev:" ActiveCe1l. 1) . 1) . cMax) . j) = varY Next j Next i * piArray(i.Font. Select Acti veCell. j» ActiveCell. 1) .cMax) . Select ActiveCell. de Levie.Value = StDevY ActiveCell.ltalic = True ActiveCell.Bold False ActiveCell. Select For j = 2 .Font. Select Next j ActiveCell.P To ccMax If vArray(j.488 R.Value End I f ActiveCell.Select Next j ActiveCell.Offset(l. 1) . 1) ActiveCell. j) Application. Select Next j ActiveCell.P . Select For j = 2 .ScreenUpdating = False ActiveCell.Offset(O. Select For j = 1 To dMax ActiveCell.Offset(l.Value "<lE20" Else sqr(vArray(j. 1) . cMax) .Font. Select For j = 1 To cMax ActiveCell. Value = '"' ActiveCell.Offset(O.0ffset(O.Offset(O.P . Advanced Excel for scientific data analysis For j = 1 To ccMax vArray(i. 1 .Font. 1) .Value = "Coeff:" ActiveCell.p) .Bold True ActiveCell.Offset(O. Select If (p = 0 And cMax = 3) Then GoTo LastLine If P = 0 Then For i = 1 To cMax .Offset(rMax.2 .Font.Offset(O. Select ActiveCell.ltalic = True ActiveCell. Select Next j ActiveCell. 1 .Bold False ActiveCell. 1) . Select Next j ActiveCell.Value = "" ActiveCell.Value = "" ActiveCell. j) < 1E40 Then ActiveCell.Offset(1.Offset(O.Offset(O.Value = bArray(j. 1 .Font.Offset(O. 1 .
The user specifies the cell block in which to write this array.Font.Value = "CM:" ActiveCell. cMax) . 1) . Select Next j ActiveCell.1 ActiveCell. 1) .lta1ic = True ActiveCel1.cMax.Offset(O.Font.Bold = False ActiveCell. 1 . j) ActiveCell. 0) .1 For j = 2 To cMax . Select Next i ActiveCell.Offset(2 .Offset(O.Value = vArray(i.Font. Select For i = 1 To cMax .Offset(O.Bold = True ActiveCell. making sure that it does not .ScreenUpdating = True 1 1 Provide as optional output the array of linear correlation coefficients.cMax) .Offset(O. 1) .Font.Font. Select Next j 489 ActiveCell. 1) . 2 .Value = vArray(i.Colorlndex = 11 ActiveCell.Offset(O.Bold = True ActiveCell. 9: Macrosjor least squares &jorthepropagation ojuncertainty For j = 1 To CMax ActiveCell. Select ActiveCell.Colorlndex = 1 ActiveCell.Font.Colorlndex = 1 ActiveCe1l.Font. Select ActiveCe1l.Value = "" ActiveCell.Font.Value = "CM:" ActiveCell.ltalic = True ActiveCell.Ch.Offset(1.Offset(l.Offset(l. Select Next j ActiveCell.Font. Select Next i End If Application. CMax) .Offset(l . 1) . j) ActiveCell.Font. Select Next i ActiveCell.Colorlndex = 11 ActiveCell.Colorlndex = 11 ActiveCe1l.Colorlndex = 11 ActiveCell.Font.CMax) .Offset(1. Select For i = 2 To cMax .Select Next j ActiveCell.Value = "" ActiveCell.1 For j = 1 To cMax .1 ActiveCell. 0) .Bold = False ActiveCell.1 For j = 1 To cMax ActiveCell.Font.Offset(O. 1) .cMax. Select Next i Elself p = 1 Then For i = 1 To cMax .
Borders(xIEdgeRight) .Weight = xlThin Selection. Type:=8) myRange. "Weighted Least Squares") OutlineMatrix: If jAnswer = vbYes Then Set myRange = Application.Borders(xlEdgeTop) .490 1 R. i) * vArray(j. vbYesNo.Rows.Columns.1 Then MsgBox "The selected range does not have " & ccMax + p . i) . Borders (xIEdgeLeft) . "Weighted Least Squares Fit" GoTo OutlineMatrix End I f If Selection.1 & " rows. Borders (xlEdgeRight) .1 & " cells. I npu tBox (Prompt: = "The matrix should be located in:".LineStyle = xlDashDot Selection.Count <> ccMax + p . LineStyle = xlDashDot Selection.Borders(xIEdgeTop) .Select . j) = vArray(i.Borders(xlEdgeBottom) . If P = 0 And cMax = 2 Then GoTo Las tLine jAnswer = MsgBox("Do you want to see the matrix of" & "linear correlation coefficients?" & Chr(l3) & "It will need a block of" & ccMax + p . LineStyle = xlDashDot Selection.Count <> ccMax + p . Value = lccArray Elself p = 0 Then For i = 2 To ccMax For j = 2 To ccMax Root = Sqr(vArray(i. j) / Root Next j Next i Selection. Draw a box around the reserved area Selection. "Weighted Least Squares" GoTo OutlineMatrix End I f . Please correct".Borders(xIEdgeBottom) . LineStyle = xlDashDot Selection.1 Then MsgBox "The selected range does not have " & ccMax + p .1 & " by" & ccMax + p . Advanced Excel for scientific data analysis overwrite essential data. j» lccArray(i.Weight = xlThin Selection. de Levie.1 & " columns." .".Weight = xlThin If P = 1 Then For i = 1 To ccMax For j = 1 To ccMax Root = Sqr(vArray(i. Make sure that the selected block has the correct size If Selection.Weight = xlThin Selection. Borders (xlEdgeLeft) . Please correct.
                    Root = Sqr(vArray(i, i) * vArray(j, j))
                    lccArray(i - 1, j - 1) = vArray(i, j) / Root
                Next j
            Next i
            Selection.Value = lccArray
        End If
    End If

LastLine:
End Sub

9.9 SolverAid

SolverAid provides uncertainty estimates for the parameters computed by Solver which, typically, minimizes the sum of squares X2 = SUM (Yn,exp - Yn,calc)^2 between the experimental and calculated functions y. SolverAid calculates the standard deviations si = sqrt[mii X2 / (N-P)], where mii is the i-th diagonal element of the inverse of the matrix of the products of the partial derivatives (dYn,calc/dai)(dYn,calc/daj), N is the number of data points, and P that of the number of adjustable parameters used. It can also display the covariance matrix CM or the corresponding array of linear correlation coefficients LCC. (For subsequent use of the Propagation macro you will need to use the covariance matrix.) You can display both CM and LCC by running the macro twice.

The partial derivatives are obtained by using VBA to multiply one adjustable parameter at a time by 1.000001 (i.e., by temporarily incrementing it by 10^-4 %) and to measure the resulting change in Yn,calc before resetting the parameter to its original value.

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
''''''''''''''''''''''''''   SOLVERAID   '''''''''''''''''''''''''''''
''''''''''''''''''''''''''''''''''''''''''''''''''''''  R. de Levie

' SolverAid takes the results of Solver, then computes and displays
' the corresponding standard deviations si of the coefficients ai,
' and the standard deviation sy of the overall fit. SolverAid can
' also provide the covariance matrix.

' REQUIREMENTS: This macro requires the subroutine Invert.
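The procedure just described can be mimicked in a few lines outside Excel. The sketch below is illustrative Python, not the macro's literal code: invert, solver_aid, and the relative step of 10^-6 are this sketch's choices. It perturbs each parameter by the factor 1.000001, assembles the matrix of partial derivatives, and converts X2/(N-P) times the inverted product matrix into standard deviations.

```python
import math

def invert(A):
    # Gauss-Jordan inversion, the job of the Invert subroutine.
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                M[r] = [vr - M[r][col] * vc
                        for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def solver_aid(f, params, x, y):
    # f(params, xi) is the model; params holds Solver's best-fit values.
    n, p = len(x), len(params)
    y0 = [f(params, xi) for xi in x]
    ssr = sum((yi - yci) ** 2 for yi, yci in zip(y, y0))
    D = []                   # D[i][k] ~ d y_calc(x_k) / d a_i
    for i in range(p):
        a = list(params)
        da = a[i] * 1e-6     # multiplicative step: fails for a[i] == 0,
        a[i] += da           # which is why the macro substitutes 1E-20
        D.append([(f(a, xi) - yci) / da for xi, yci in zip(x, y0)])
    DtD = [[sum(di * dj for di, dj in zip(D[i], D[j]))
            for j in range(p)] for i in range(p)]
    m = invert(DtD)
    var_y = ssr / (n - p)    # chi-squared over (N - P)
    return [math.sqrt(var_y * m[i][i]) for i in range(p)]
```

For a straight-line model the results agree with the textbook closed-form standard deviations of slope and intercept, which is a convenient check on both the finite-difference step and the matrix algebra.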
' INPUT: The input information must be provided in response to
' three input boxes, which ask for (1) the location of the COLUMN
' containing the parameters determined by Solver; (2) the location
' of the CELL containing the parameter to be minimized, such as
' SSR; and (3) the COLUMN containing the y-values calculated with
' the Solver-determined parameters. Note that these data must
' contain the FORMULAS referring to the Solver-determined
' parameters; a column containing merely their numerical values
' will NOT do.

' OUTPUT: The standard deviations of the fit, and the standard
' deviation(s) of the parameter(s), will be placed directly, in
' italics, to the right of the corresponding parameters, provided
' that those cells are unoccupied. Otherwise, the result(s) will
' be displayed in message boxes. The covariance matrix shows the
' variances on its main (top-left to bottom-right) diagonal, and
' the corresponding covariances in the off-diagonal positions.
' For the matrix of the correlation coefficients, all elements on
' the main diagonal are 1, and all off-diagonal elements have been
' calculated as r(i,j) = v(i,j)/sqr[v(i)*v(j)].

' PROCEDURE: In order to start the process, call the macro, and
' provide the requested addresses (either by typing or by using
' the highlight & click method) to the input boxes. It is,
' therefore, most convenient to leave blank the spreadsheet cells
' to the right of the parameters and of SSR.

Sub SolverAid()

' NOTATION USED:
' P1:  single parameter determined by Solver
' PP:  multiple parameters determined by Solver
'      NOTE: THESE MUST BE IN A SINGLE, CONTIGUOUS COLUMN
' SP:  standard deviations on those parameters
' SSR: the sum of the residuals squared, used to optimize Solver
' SY:  the standard deviation on the function
' YC:  the Y-values computed with the parameters P
'      NOTE: THESE MUST BE IN A SINGLE, CONTIGUOUS COLUMN
' c:   prefix denoting the number of columns: cP = columns of PP
' r:   prefix denoting the number of rows: rP = rows of PP,
'      rY = rows of YC. Note: the numbers of columns should be 1.
1) 0) PPValue(i.Ch. n2 As Integer. VarCovarValue As Variant PPValue As Variant. rrY As Integer rSSR As Integer PIValue As Double. 1) = 1E20 Next i End I f 1E20 Then . 1» Or PPValue(i. Ti tIe: =" Sol verAid InputBox 1 : Solver parameters". "SolverAid" End End I f . cSSR As Integer. Root As Double SDD1 As Double. SPIAddress.Rows. z As Double CorrelCoeff As Variant. SYAddress Select the computed Solver parameter Pl or parameters Set myRangel = Application. iiAnswer. j As Integer nl As Integer.Columns.Count rP = Selection. Value For i = 1 To rP If (IsEmpty(PPValue(i. ii As Integer. contiguous COLUMN. SYValue As Double. ". . cy As Integer i As Integer. iAnswer. SPValue As Variant YCValue As Variant YYValue As Variant. n3 As Integer rP As Integer.lnputBox(Prompt:= "The parameters determined by Solver are located in:". 9: Macrosjor least squares &forthe propagation ojimcertainty Dimension all parameters 493 Dim Dim Dim Dim Dim Dim Dim Dim Dim Dim Dim Dim Dim Dim Dim Dim cP As Integer.Select cP = Selection.Count I f cP 0 Then End I f cP > 1 And rP > 1 Then End Selection. myRange4 As Range Answer. Replace any zeros in the input parameters 1 by small nonzero numbers If rP = 1 Then PIValue = Selection. Value I f cP 1 And rP > 1 Then PPValue If cP > 1 And rP 1 Then MsgBox "The Solver parameters must be listed" & Chr (13) & " in a single. Type::::8) myRangel. SP1 As Double SPIValue As Double. Value I f cP 1 And rP 1 Then PIValue Selection. myRange2 As Range myRange3 As Range. YYValuel As Variant myRangel As Range. rY As Integer. Value If (IsEmpty(PIValue) Or PIValue 0) Then PIValue Else PPValue = Selection. SSRValue As Double Sy As Double.
Select SYValue = Selection.Columns. Value For i = 1 To rP If (IsEmpty(SPValue(i.Rows.Address SYAddress Else n2 = n2 + 1 End I f . Title:=="SolverAid InputBox 3: Ycalc". Advanced Excel for scientific data analysis Test and prepare the default output range for the standard deviations of the parameters. 1) . Select the computed chisquared value. Type:=8) myRange2. 1» Or SPValue(i.494 1 1 R. Type:=8) .Select cSSR = Selection.Count If cSSR <> 1 Then End If rSSR 0 1 Then End SSRValue = Selection. 1) . for the standard deviation of the fit. SY n2 = 0 Selection. Value . Value If (IsEmpty(SP1Value) Or SPlValue 0) Then nl = nl Else nl nl + 1 End I f Else SPValue = Selection. InputBox(Prompt:= _ "The sum of squares of the residuals is located in:". SSR o Then Set myRange2 = Application. Title:="SolverAid InputBox 2: SSR". Select If rP = 1 Then SPlAddress = Selection.Address SPlValue = Selection. Select the computed Yvalues.Offset(O. 1» nl nl Else nl nl + 1 End If Next i End I f . SF nl = 0 Selection. Test and prepare the default output range . InputBox(Prompt:= _ "The column containing Ycalc is:". 0) Then YC Set myRange3 = Application. de Levie. Value If (IsEmpty(SYValue) Or SYValue n2 = n2 Selection.Offset(O.Count rSSR = Selection.
myRange3.Select
rY = Selection.Rows.Count
cY = Selection.Columns.Count
If cY <> 1 Then
  MsgBox "The Ycalc values should be in" & Chr(13) & _
    "a single, contiguous COLUMN.", , "SolverAid"
  End
End If
If rY <= rP + 1 Then
  MsgBox "The number N of data pairs must be at least" & _
    Chr(13) & "larger by one than the number of " & _
    "parameters P.", , "SolverAid"
  End
End If
YCValue = Selection.Value
Application.ScreenUpdating = False

' Compute the partial differentials and the
' standard deviations for the one-parameter case

If rP = 1 Then
  myRange1.Select
  P1Value = P1Value * 1.000001
  Selection.Value = P1Value
  myRange3.Select
  YYValue1 = Selection.Value

  ' The following loop avoids counting empty rows in the
  ' computation of the variances and standard deviations

  n3 = 0
  For j = 1 To rY
    If IsEmpty(YYValue1(j, 1)) Or YYValue1(j, 1) = 0 _
      Then n3 = n3 + 1
  Next j
  rrY = rY - n3
  Sy = Sqr(SSRValue / (rrY - rP))

  ' Resume to compute the partial differentials and the
  ' standard deviations for the one-parameter case

  ReDim D1(1 To rY) As Double, DD1(1 To rY) As Double
  SDD1 = 0
  For j = 1 To rY
    D1(j) = (YYValue1(j, 1) - YCValue(j, 1)) / _
      (0.000001 * P1Value)
    DD1(j) = D1(j) * D1(j)
    SDD1 = SDD1 + DD1(j)
  Next j
  myRange1.Select
  P1Value = P1Value / 1.000001
  Selection.Value = P1Value
  SP1 = Sy / Sqr(SDD1)

  If n2 > 0 Then
    Answer = MsgBox("There is information in the cell " & _
      "to the" & Chr(13) & "right of SSR. " & _
      "Can it be overwritten?", vbYesNo, "Overwrite?")
    If Answer = vbYes Then n2 = 0
    If Answer = vbNo Then MsgBox _
      "The standard deviation of the fit is " & Sy, , "SolverAid"
  End If
  If n2 = 0 Then
    myRange2.Offset(0, 1).Select
    ActiveCell.Font.Italic = True
    Selection.Value = Sy
  End If

  If n1 > 0 Then
    Answer = MsgBox("There is information in the cell " & _
      "to the right of" & Chr(13) & "the Solver parameter. " & _
      "Can it be overwritten?", vbYesNo, "Overwrite?")
    If Answer = vbYes Then n1 = 0
    If Answer = vbNo Then MsgBox _
      "The standard deviation of the parameter is " & SP1, _
      , "SolverAid"
  End If
  If n1 = 0 Then
    myRange1.Offset(0, 1).Select
    ActiveCell.Font.Italic = True
    Selection.Value = SP1
  End If

' Start to compute the partial differentials
' for the multi-parameter case

Else
  ReDim D(1 To rP, 1 To rY) As Double
  For i = 1 To rP
    myRange1.Select
    PPValue(i, 1) = PPValue(i, 1) * 1.000001
    Selection.Value = PPValue
    myRange3.Select
    YYValue = Selection.Value

    ' The following loop avoids counting empty rows in the
    ' computation of the variances and standard deviations

    n3 = 0
    For j = 1 To rY
      If IsEmpty(YYValue(j, 1)) Or YYValue(j, 1) = 0 _
        Then n3 = n3 + 1
    Next j
    rrY = rY - n3
    Sy = Sqr(SSRValue / (rrY - rP))

    ' Resume the computation of the partial differentials and
    ' the standard deviations for the multi-parameter case

    For j = 1 To rY
      D(i, j) = (YYValue(j, 1) - YCValue(j, 1)) / _
        (0.000001 * PPValue(i, 1))
    Next j
    PPValue(i, 1) = PPValue(i, 1) / 1.000001
  Next i
  myRange1.Select
  Selection.Value = PPValue

  ReDim DD(1 To rP, 1 To rP, 1 To rY) As Double
  For i = 1 To rP
    For ii = 1 To rP
      For j = 1 To rY
        DD(i, ii, j) = D(i, j) * D(ii, j)
      Next j
    Next ii
  Next i
  ReDim SDD(1 To rP, 1 To rP) As Double
  ReDim SDDInv(1 To rP, 1 To rP) As Double
  For i = 1 To rP
    For ii = 1 To rP
      SDD(i, ii) = 0
    Next ii
  Next i
  For i = 1 To rP
    For ii = 1 To rP
      For j = 1 To rY
        SDD(i, ii) = SDD(i, ii) + DD(i, ii, j)
      Next j
    Next ii
  Next i
  Call Invert(SDD, rP, SDDInv)

  ' Select the option, and the block
  ' in which to write the matrix

  iAnswer = MsgBox("Do you want to see either the" _
    & Chr(13) & "covariance matrix, or" & Chr(13) & _
    "the matrix of correlation coef" & Chr(13) & _
    "ficients? Either one needs a" & Chr(13) & "block of " _
    & rP & " by " & rP & " cells.", vbYesNo, "SolverAid")
  If iAnswer = vbYes Then
    Application.ScreenUpdating = True
"Sol verAid" ) Write the covariance matrix in the selected block For i = 1 To rP For j = 1 To rP VarCovarValue(i. . Borders (xlEdgeLeft) . Value = VarCovarValue If iiAnswer = vbNo Then Selection.Borders(xlEdgeTop) . LineStyle = xlDashDotDot Selection. de Levie.Weight = xlThin Selection. LineStyle = xlDashDotDot Selection.Weight = xlThin Selection. Borders (xlEdgeRight) . j) / Root Next j Next i If iiAnswer = vbYes Then Selection.Value and draw a box around it Selection. Borders (xlEdgeLeft) . LineStyle = _ xlDashDotDot Selection. Value CorrelCoeff .498 R. j) Sy * Sy * SDDlnv(i.Count <> rP Then MsgBox "The selected range does not have " & rP & " columns.". j) Next j Next i For i = 1 To rP For j = 1 To rP Root = Sqr(VarCovarValue(i. vbYesNo. InputBox (Prompt: = _ "The matrix should be located in:". Type:=B) myRange4.Select VarCovarValue Selection. Advanced Excel for scientific data analysis Set myRange4 = Application.Borders(xlEdgeTop) . j» CorrelCoeff(i.Rows. i) * VarCovarValue(j." End End I f If Selection.Columns.Weight = xlThin . Borders (xlEdgeRight) . LineStyle = xlDashDotDot Selection. j) =VarCovarValue(i.Weight = xlThin Selection. "SolverAid" End End I f iiAnswer = MsgBox ("Do you want to see the " _ & Chr (13) & "covariance matrix rather than the" & Chr(13) & "linear correlation coefficients?". Value CorrelCoeff = Selection. Borders (xlEdgeBottom) .Count <> rP Then MsgBox "The selected range does not have " & rP & " rows. Make sure that the selected block has the correct size If Selection.Borders(xlEdgeBottom) .
  End If
  Application.ScreenUpdating = False

  For i = 1 To rP
    SPValue(i, 1) = Sy * Sqr(SDDInv(i, i))
  Next i

  If n1 > 0 Then
    Answer = MsgBox("There are data in the cells to the" _
      & " right of the" & Chr(13) & "Solver parameters. " & _
      "Can they be overwritten?", vbYesNo, "Overwrite?")
    If Answer = vbYes Then n1 = 0
    If Answer = vbNo Then
      For i = 1 To rP
        MsgBox "The standard deviation of parameter " & i & _
          " is " & SPValue(i, 1), , "SolverAid"
      Next i
    End If
  End If
  If n1 = 0 Then
    myRange1.Offset(0, 1).Select
    Selection.Font.Italic = True
    Selection.Value = SPValue
  End If

  If n2 > 0 Then
    Answer = MsgBox("There are data in the cells to the" _
      & Chr(13) & "right of the sum of squares of the" _
      & Chr(13) & "residuals. Can they be overwritten?", _
      vbYesNo, "Overwrite?")
    If Answer = vbYes Then n2 = 0
    If Answer = vbNo Then MsgBox _
      "The standard deviation of the fit is " & Sy, , "SolverAid"
  End If
  If n2 = 0 Then
    myRange2.Offset(0, 1).Select
    Selection.Font.Italic = True
    Selection.Value = Sy
  End If
End If

myRange1.Select

End Sub
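For readers who want to verify SolverAid's arithmetic outside Excel, the numerical recipe the macro follows can be sketched compactly: perturb each best-fit parameter by the same factor 1.000001 the macro uses, form finite-difference partial derivatives of Ycalc, and convert the sum of squares of the residuals into a standard deviation of the fit, parameter standard deviations, and a correlation matrix. The Python sketch below is illustrative only; the function and variable names are not de Levie's, and it assumes a user-supplied `model(params, x)` in place of the spreadsheet formulas.

```python
import numpy as np

def solver_aid(model, params, x, y, rel=1e-6):
    """Sketch of SolverAid's numerics (illustrative, not the macro):
    estimate parameter uncertainties at the best fit found by an
    optimizer, using forward differences with relative step `rel`."""
    params = np.asarray(params, dtype=float)
    P, Npts = len(params), len(y)
    ycalc = model(params, x)
    ssr = float(np.sum((y - ycalc) ** 2))
    sy = np.sqrt(ssr / (Npts - P))        # standard deviation of the fit
    # One row of partial derivatives dYcalc/dp_i per parameter,
    # each parameter perturbed in turn by a factor (1 + rel)
    D = np.empty((P, Npts))
    for i in range(P):
        p = params.copy()
        p[i] *= 1 + rel
        D[i] = (model(p, x) - ycalc) / (rel * params[i])
    cov = sy**2 * np.linalg.inv(D @ D.T)  # covariance matrix
    sp = np.sqrt(np.diag(cov))            # parameter standard deviations
    corr = cov / np.outer(sp, sp)         # linear correlation coefficients
    return sy, sp, corr
```

For a model that is linear in its parameters the finite differences are exact, so the results coincide with the textbook least-squares covariance matrix Sy²·(XᵀX)⁻¹.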
" .5. consistent wi th the format of the input da ta. they cannot be formulas. PURPOSE. .l. . 1 1 1 1 This macro computes the propagated standard deviation in single function F based on N input parameters. For a single input parameter. the macro recognizes which of the two computations to perform because the standard deviations will be provided as a vector. Advanced Excelfor scientific data analysis 9..500 R. . a SUBROUTINES: This macro does not require 1 any subroutines INPUT: The N independent input parameter values must be placed either in a contiguous row or in a contiguous column.. . . . ... No such assumption will be made when the covariance matrix is provided. e. i. .10 Propagation This macro is listed here without explanation. The same applies for the covariance matrix if this is used instead. For more than one input parameter. ( I 1"111 (c) " . 'I"? " " " . I I' I 'I . 1 I " " I' "" I 1 I 1 I I I I . equations. 1 I " " " . . . i. the macro will assume that the input parameters are mutually independen t... THEY MUST BE NUMBERS. . based on their standard deviations or on the corresponding covariance matrix. . """ '1""'" . When only the standard deviations are given. 2002 Sub Propagation() . . . R.. .. . again either in a contiguous row or column.. I . . e. The N standard deviations must follow the same format. 1 1 I 'I""""""'" I"""? 11"'''' PROPAGATION AI"" A I I I I I I . The components of the N by N covariance matrix are assumed to be in the same order as those of the N input parameters.. de Levie. ." """ " " I' '1""" " I. de Levie Oct. .. . 1 Moreover. . the order of the input parameters and of the standard deviations must be the same. .". while the covariance matrix has the form of a square data array. since the algorithm for the propagation of experimental uncertainty was already discussed in section 8. . ". . but they can be ei ther values or formulas. there is no distinction between the two approaches. r""" 1 " " 1 111 I "". . 
' OUTPUT:
'
' The standard deviation of the single function F will be
' placed directly to the right of (or below) that function,
' in italics, provided that this cell is either unoccupied
' or its contents can be overwritten. Otherwise, the result
' will be displayed in a message box. The output will be
' provided either on the spreadsheet, or through message
' box(es).
'
' PROCEDURE:
'
' In order to start this macro, call it. There is no need
' to highlight anything.
'
' You will see an input box in which to place (either by
' typing or by the 'highlight & click' method) the
' address(es) of the input parameter(s). After you have
' entered these, a second input box will request the
' addresses of either the standard deviations or the
' covariance matrix. These should have been arranged in the
' same order as the earlier-entered parameters. Finally, a
' third input box will ask for the address of the function.
'
' NOTATION:
'
' N:   the number of input parameters
' X:   single input parameter (for N=1)
' S:   the corresponding, single standard deviation of X
' Xi:  multiple input parameters (for N>1)  NOTE: THESE MUST
'      BE IN A SINGLE, CONTIGUOUS ROW OR COLUMN
' Si:  standard deviations of the multiple input parameters
'      NOTE: THESE MUST BE IN A SINGLE, CONTIGUOUS ROW OR
'      COLUMN
' CM:  the covariance matrix
' F:   the single function through which the error(s)
'      propagate(s)
' VF:  the propagated variance of the function F
' SF:  the propagated standard deviation of function F
'
' We distinguish five cases:
' C = 1:  one parameter P, one uncertainty U
' C = 2:  parameters P and uncertainties U in column format
' C = 3:  parameters P and uncertainties U in row format
' C = 4:  parameters P in column, uncertainty U in matrix
' C = 5:  parameters P in row, uncertainty U in matrix
' For C = 1 to 3, the uncertainty is the standard deviation;
' for C = 4 or 5, the uncertainty must be in the form of a
' covariance matrix.

Dim LCCTest As Integer
Dim M As Integer
Dim N As Integer        ' Larger dimension of the
                        ' input parameter set
Dim number As Integer
Dim C As Integer        ' Case selector
Dim i As Integer, j As Integer
Dim NCF As Integer      ' Number of Columns of output Function
Dim NCP As Integer      ' Number of Columns of input Parameters
Dim NCU As Integer      ' Number of Columns of the uncertainty
                        ' estimate
Dim NRF As Integer      ' Number of Rows of output Function
Dim NRP As Integer      ' Number of Rows of input Parameters
Dim NRU As Integer      ' Number of Rows in the uncertainty
                        ' estimate

Dim FValue As Double, FFValue As Double
Dim newFiValue As Double, newFValue As Double
Dim newXValue As Double, XValue As Double
Dim SFValue As Double, SValue As Double
Dim VFiValue As Double, VFValue As Double
Dim CMValue As Variant, CMiValue As Variant
Dim DelValue As Variant, newXiValue As Variant
Dim SFiValue As Variant, SiValue As Variant
Dim XiValue As Variant
Dim myRange1 As Range
Dim myRange2 As Range, myRange3 As Range
Dim Answer, Prompt, Title

' Select the input parameters of the function

Prompt = "The input parameters of the" & Chr(13) & _
  "function are located in:"
Title = "Uncertainty Propagation InputBox 1: Input" _
  & " Parameters"
Set myRange1 = Application.InputBox(Prompt, Title, Type:=8)
myRange1.Select
NRP = Selection.Rows.Count
NCP = Selection.Columns.Count

' Check the type of input

If NRP = 0 Then End
If NRP <> 1 And NCP <> 1 Then
  MsgBox "The input parameters should be placed" & Chr(13) _
    & "either in a single contiguous row," & Chr(13) & _
    "or in a single contiguous column.", , _
    "Propagation of uncertainty"
  End
ElseIf NRP = 1 And NCP = 1 Then
  N = 1
  XValue = Selection.Value
ElseIf NRP > 1 And NCP = 1 Then
  N = NRP
  XiValue = Selection.Value     ' dimensioning the array
  DelValue = Selection.Value    ' dimensioning the array
  newXiValue = Selection.Value  ' dimensioning the array
  SFiValue = Selection.Value    ' dimensioning the array
ElseIf NCP > 1 And NRP = 1 Then
  N = NCP
  XiValue = Selection.Value     ' dimensioning the array
  DelValue = Selection.Value    ' dimensioning the array
  newXiValue = Selection.Value  ' dimensioning the array
  SFiValue = Selection.Value    ' dimensioning the array
End If

' Select the uncertainty estimates (standard deviations
' or covariance matrix) of the input parameters

Prompt = "The standard deviations, or" & Chr(13) & _
  "the covariance matrix, of the" & Chr(13) & _
  "input parameters are located in:"
Title = "Uncertainty Propagation InputBox 2: " _
  & "Uncertainty estimates"
Set myRange2 = Application.InputBox(Prompt, Title, Type:=8)
myRange2.Select
NRU = Selection.Rows.Count
NCU = Selection.Columns.Count

' Verify and categorize the inputs,
' then read the uncertainties

If NCU = 0 Then End
If N = 1 And NRU = 1 And NCU = 1 Then
  C = 1                 ' only one input parameter
  SValue = Selection.Value
End If
If N > 1 Then           ' multiple input parameters:
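The listing breaks off here, but the propagation rule described in the header comments is the standard one: with the gradient components g_i = dF/dX_i, the propagated variance is VF = gᵀ·CM·g, which reduces to VF = Σ (dF/dX_i)²·S_i² when the parameters are independent (diagonal CM). The following Python sketch illustrates that arithmetic under the same conventions (numerical differentiation by a small relative perturbation, nonzero numeric inputs); the names `propagate`, `F`, `X`, `S`, and `CM` are illustrative, not de Levie's.

```python
import numpy as np

def propagate(F, X, S=None, CM=None, rel=1e-6):
    """Illustrative sketch of uncertainty propagation: differentiate
    F at X numerically, then combine the partials with either the
    standard deviations S (inputs assumed mutually independent) or
    the full covariance matrix CM."""
    X = np.asarray(X, dtype=float)
    F0 = F(X)
    g = np.empty(X.size)                  # g[i] = dF/dX[i]
    for i in range(X.size):
        Xp = X.copy()
        Xp[i] = X[i] * (1 + rel)          # inputs must be nonzero numbers
        g[i] = (F(Xp) - F0) / (rel * X[i])
    if CM is not None:                    # VF = g' CM g
        VF = g @ np.asarray(CM, dtype=float) @ g
    else:                                 # independent: VF = sum (g_i S_i)^2
        VF = np.sum((g * np.asarray(S, dtype=float)) ** 2)
    return np.sqrt(VF)                    # SF, the propagated std. dev.
```

For F = X1·X2 with X = (3, 4) and S = (0.1, 0.2), the independent-parameter route gives SF = sqrt((4·0.1)² + (3·0.2)²) = sqrt(0.52), and supplying CM = diag(S²) returns the same number, mirroring the macro's claim that the two computations coincide when the covariances vanish.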