
CALIFORNIA INSTITUTE OF TECHNOLOGY
PHYSICS, MATHEMATICS AND ASTRONOMY DIVISION

Freshman Physics Laboratory (PH003)

Vade Mecum for Data Analysis Beginners


(October 3, 2007)
Copyright © Virgínio de Oliveira Sannibale, June 2001

Acknowledgments
I started this work with the aim of improving the Physics Laboratory course for Caltech freshman students, the so-called ph3 course. Thanks to Donald Skelton, ph3 was already a very good course, well designed to satisfy the needs of new students eager to learn the basics of laboratory techniques and data analysis. Because of the need to introduce new experiments, and new topics in the data analysis notes, I decided to rewrite the didactic material trying to keep the spirit of the course intact, i.e. emphasis on techniques and not on the details of the theory. Anyway, I believe and hope that this attempt to reorganize old experiments and introduce new ones constitutes an improvement of the course. I would like to thank, in particular, Eugene W. Cowan for the incommensurable help he gave me with critiques, suggestions, discussions, and corrections to the notes. His experience as a professor at Caltech for several years was really valuable in making the content of these notes suitable for students in the first year of the undergraduate course. I would also like to thank all the teaching assistants that make this course work, for their patience and the valuable comments that I constantly received during the academic terms. Sincerely, Virgínio de Oliveira Sannibale

Contents
1 Physical Observables
  1.1 Random Variables and Measurements
  1.2 Uncertainties on Measurements
    1.2.1 Accuracy and Precision
  1.3 Measurement and Probability Distribution
    1.3.1 Gaussianity
    1.3.2 Gaussian Distribution Parameter Estimation for a Single Variable
    1.3.3 Gaussian Distribution Parameter Estimation for the Average Variable
    1.3.4 Gaussian Distribution Parameter Estimation for the Weighted Average
    1.3.5 Example (Unweighted Average)
    1.3.6 Example (Weighted Average)

2 Propagation of Errors
  2.1 Propagation of Errors Law
  2.2 Statistical Propagation of Errors Law (SPEL)
    2.2.1 Example 1: Area of a Surface
    2.2.2 Example 2: Power Dissipated by a Circuit
    2.2.3 Example 3: Improper Use of the Formula
  2.3 Relative Uncertainties
    2.3.1 Example 1
  2.4 Measurement Comparison
    2.4.1 Example

3 Graphical Representation of Data
  3.1 Introduction
  3.2 Graphical Fit
    3.2.1 Linear Graphic Fit
    3.2.2 Theoretical Points Imposition
  3.3 Linear Plot and Linearization
    3.3.1 Example 1: Square Function
    3.3.2 Example 2: Power Function
    3.3.3 Example 3: Exponential Function
  3.4 Logarithmic Scales
    3.4.1 Linearization with Logarithmic Graph Sheets
  3.5 Difference Plots
    3.5.1 Difference Plot of Logarithmic Scales

4 Probability Distributions
  4.1 Definitions
    4.1.1 Probability and Probability Density Function (PDF)
    4.1.2 Distribution Function (DF)
    4.1.3 Probability and Frequency
    4.1.4 Continuous Random Variable vs. Discrete Random Variable
    4.1.5 Expectation Value
    4.1.6 Intuitive Meaning of the Expectation Value
    4.1.7 Variance
    4.1.8 Intuitive Meaning of the Variance
    4.1.9 Standard Deviation
  4.2 Uniform Distribution
    4.2.1 Random Variable Uniformly Distributed
      4.2.1.1 Example: Ruler Measurements
      4.2.1.2 Example: Analog to Digital Conversion
  4.3 Gaussian Distribution (NPDF)
    4.3.1 Standard Probability Density Function
    4.3.2 Probability Calculation with the Error Function
  4.4 Exponential Distribution
    4.4.1 Random Variable Exponentially Distributed
  4.5 Binomial/Bernoulli Distribution
  4.6 Poisson Distribution
    4.6.1 Example: Silver Activation Experiment

5 Parameter Estimation
  5.1 The Maximum Likelihood Principle (MLP)
    5.1.1 Example: μ and σ of a Normally Distributed Random Variable
    5.1.2 Example: μ of a Set of Normally Distributed Random Variables
  5.2 The Least Square Principle (LSP)
    5.2.1 Geometrical Meaning of the LSP
    5.2.2 Example: Linear Function
    5.2.3 The Reduced χ² (Fit Goodness)
  5.3 The LSP with the Effective Variance
  5.4 Fit Example (Thermistor)
    5.4.1 Linear Fit
    5.4.2 Quadratic Fit
    5.4.3 Cubic Fit
  5.5 Fit Example (Offset Constant)

A Central Limit Theorem
B Statistical Propagation of Errors
C NPDF Random Variable Uncertainties
D The Effective Variance

Chapter 1 Physical Observables


1.1 Random Variables and Measurements

During an experiment, the result of a measurement of a physical quantity¹ x (a number or a set of numbers) is always somewhat indeterminate. In other words, if we repeat the measurement we can get a different result. Apart from any philosophical point of view, the reasons for this indetermination can be explained by considering that we are able to control or measure just a few of the physical quantities involved in the experiment, and we don't completely know how the result depends on each one of them. Moreover, all those variables can change with time, and it becomes impossible to measure their evolution. Fortunately, and quite often, this ignorance does not preclude a measurement with the required precision.

Let's consider, as an example, a physical system (see figure 1.1) made of a thermally isolated liquid, a heater, and a paddle wheel turning at constant velocity. Let's assume that we want to measure the average liquid temperature versus time using a mercury thermometer as the instrument. Let's then try to list some of the potential perturbation mechanisms that can affect the measurement:

- During the measurement process the liquid temperature changes non-uniformly, because the system is not perfectly isolated and loses heat.

¹ Any measurable quantity is a physical quantity.

[Figure 1.1 here: sketch of the apparatus showing the electric current input, motor, mercury thermometer, paddle wheel, liquid, and Dewar.]
Figure 1.1: Example of a physical system under measurement, i.e. a variant of Joule's experiment (1845). The isolated liquid is heated up by the paddle wheel movement, driven by an electric motor. The mercury thermometer measures the temperature changes.

- During the measurement the liquid is irregularly heated up because of the location of the paddle wheel.
- The measurement should be taken when the liquid and the instrument are in thermal equilibrium (same temperature, no heat exchange), and this cannot really happen because the liquid is being heated up and because of the thermometer heat capacity. In other words, the temperature is changing constantly, impeding thermal equilibrium.
- The instrument reading is affected by the parallax error, i.e. the position of the mercury column cannot be accurately read.
- The accuracy of the thermometer scale divisions is not perfect, i.e. the scale calibration is not perfect (the scale origin and/or the average division spacing is not completely right, the divisions don't all have the same spacing, et cetera).
- Liquid currents produced by the paddle wheel movement are turbulent (chaotic), affecting the uniformity of the liquid temperature.


- The heat flow due to the paddle wheel movement is not completely constant and is affected by small unpredictable fluctuations. For example, measuring the current flowing through the electric motor we see small random fluctuations around a constant value.

Some of those perturbations are probably completely negligible (the instrument is unable to see them), some others can be estimated and minimized, and some others cannot².

[Figure 1.2 here: block diagram with an excitation entering an ideal physical system, whose response x + ε is read by an instrument; disturbances ε₁, ε₂, ... and the ambient act on both the system and the instrument.]

Figure 1.2: Model of a physical system under measurement.

Figure 1.2 shows a quite general model of the experimental situation. The physical system, as an ideal system, follows a known theoretical model. The instrument interacting with the system allows us to measure the physical quantity x. External unpredictable disturbances ε₁, ε₂, ..., εₙ, ... perturb the system and the instrument. The instrument itself perturbs the physical system when we perform the measurement. These considerations can be incorporated into a simple but widely used linear model for any physical quantity x. If we call x*(t) the value of a physical quantity with no disturbances at the time t, and ε(t) its random fluctuations, the measured value of x at the time t will be

x(t) = x*(t) + ε(t).

Any physical quantity x is indeed a random variable, or a stochastic variable.

² One can argue that we are trying to measure something with something worse than a kludge. In other words, if we want the average liquid temperature, we need, for example, a more sophisticated apparatus that allows us to map and average the liquid temperature very accurately. Anyway, what we can do is only minimize the perturbations, never get rid of them.

1.2 Uncertainties on Measurements

We can distinguish types of uncertainties based on their nature, i.e. uncertainties that can in principle be eliminated, and uncertainties that cannot be eliminated. Starting from this criterion, we can divide the sources of uncertainties, also called errors, into two categories:

- Random errors: any errors which are not, or do not appear to be, directly connected to any cause (the cause-and-effect principle doesn't work), and are indeed not repeatable but random. Random errors cannot be completely eliminated.
- Systematic errors: any errors in the measurement which are not random. Quite often, this kind of error algebraically adds a constant unknown value to the measurement. This value can also change/drift with time. A typical systematic error comes from a wrong calibration of the instrument used. This kind of error is hard to minimize and quite often difficult to detect. Sometimes it can be found by repeating the measurement with different procedures and/or instruments.

We can also have systematic errors due to the measurement procedure or its definition. Let's consider, as an example, the measurement of the thickness of an elastic material:

- Lack in the procedure definition: measurement with a micrometer without defining the pressure applied by the instrument, the temperature, the humidity, etc.
- Lack in the procedure execution: drift of physical quantities supposed to be stationary, such as pressure, temperature, etc.

Systematic errors can in principle be completely removed.


[Figure 1.3 here: uncertainty bands of two measurements x1 and x2 around the unperturbed value x*.]

Figure 1.3: Accuracy and precision comparison. The gray strips represent the uncertainties associated with the measurements xi; x* represents the value of the measured physical quantity with no perturbations. Measurement x1 is more accurate but less precise than measurement x2.

Figure 1.4: Accuracy and precision comparison. The shooter of the target on the left is clearly more precise than the shooter of the right target; the latter shooter, however, is more accurate than the former. Which shooter would you like to have as your bodyguard?

1.2.1 Accuracy and Precision

Here we explain two definitions that allow us to compare different measurements and establish their most important relative qualities:

- Accuracy: a measurement is said to be accurate if it is not affected by systematic errors. This characteristic does not preclude the presence of small or large random errors.
- Precision: a measurement is said to be precise if it is not affected by random errors. This characteristic does not preclude the presence of any type of systematic error.


Even if those definitions are absolute, real measurements can only approximate the concepts of precision and accuracy, and therefore we can only establish whether one measurement is more accurate or precise than another. Let's analyze the two examples shown in Figures 1.3 and 1.4. In Figure 1.3, for some mysterious reason we know the unperturbed value of the physical quantity to be x*; then we can conclude that measurement x2 is more precise but less accurate than measurement x1, and x1 is more accurate but less precise than x2. In Figure 1.4, the left shooter is quite precise but not as accurate as the right shooter, and the right shooter is more accurate but less precise than the left shooter. In general, accuracy and precision are independent properties. A measurement with very large random errors can still be extremely accurate, because the systematic error may be negligible with respect to the random error. An analogous statement can be formulated for extremely precise (but possibly inaccurate) measurements.

1.3 Measurement and Probability Distribution

It is experimentally well known (or accepted) that physical quantities are random variables whose distribution follows, quite often and with good precision, the so-called Gaussian or Normal distribution³. In other words, if we are very good at keeping the experimental conditions unchanged, we perform an experiment several times, and then histogram the results, we will probably find a bell-like curve which is proportional to the Gaussian probability density function. The Gaussian distribution

p(x) = [1/(√(2π) σ)] e^{−(x−μ)²/(2σ²)}

is a continuous function (see figure 1.5) with one peak, symmetric around a vertical axis crossing the value μ, and with tails exponentially decreasing to 0. It therefore has one absolute maximum at x = μ. The peak width is defined by the parameter σ.

³ In general, statistics is not able to provide a necessary and sufficient test to check whether a random variable follows the Gaussian distribution (Gaussianity) [4].

[Figure 1.5 here: probability density (1/A.U.) versus a normally distributed physical quantity (A.U.), with the ±σ interval around μ hatched.]

Figure 1.5: Probability density of a physical quantity following the Gauss/Normal distribution. The x-axis units are normalized to σ, i.e. the square root of the variance of the distribution. The hatched area represents a probability of 68.3%, which corresponds to the probability of having x in the symmetric interval (μ − σ, μ + σ).

The probability dP of measuring a value in the interval (x, x + dx) is

dP(x) = [1/(√(2π) σ)] e^{−(x−μ)²/(2σ²)} dx,

and in general the probability of measuring a value in the interval (a, b) is

P(a < x < b) = ∫_a^b [1/(√(2π) σ)] e^{−(x−μ)²/(2σ²)} dx.

The most probable values lie indeed inside an interval centered around μ, and the probability of having a value in the interval (μ − σ, μ + σ) is 68.3%; it is represented by the hatched area of figure 1.5. The statistical result of a measurement with a probability/confidence level of 68.3% is written as

x = (x₀ ± σ) Units.    [68.3% confidence]


The half width σ of the interval is the uncertainty, or the experimental error, or simply the error, that we associate with the measurement of x. Quite often the Gaussian distribution parameters are unknown, and the problem arises of experimentally estimating μ and σ. The theory of statistics comes into play to give us the tools to estimate the distribution parameters, which allow us to define the uncertainty of the measurement. The basic idea of statistics is to use a large number of measurements of the physical quantity (samples) to estimate the parameters of the distribution. In the next paragraphs we will just state some basic results with some naive but quite intuitive explanations. They will be studied in more detail in chapter 5.

1.3.1 Gaussianity

As was said before, when we perform measurements, physical quantities arise that behave as random variables following, to a good approximation, the NPDF. This experimental evidence is theoretically corroborated by the so-called central limit theorem (see appendix A). Under a reasonably limited number of hypotheses, this theorem states that the average x̄ of any random variable x is a normally distributed random variable when the number of averaged samples tends to infinity. Often, the measurement of a physical quantity is the result of an intentional or unintentional average of several measurements, and therefore it tends to follow the Gaussian distribution. Deviations from a Gaussian distribution (Gaussianity) are quite often time dependent. In other words, a physical quantity behaves as a Gaussian random variable for a given period of time. This happens mainly because it is always difficult to keep the experimental conditions constant and controlled during the time needed to perform all the measurements. Sudden or slow uncontrolled changes of the system can easily modify the parameters or the PDF of the physical quantity we are measuring. Anyway, it is important to stress that the Gaussianity of a random variable, or more generally the type of PDF, should always be investigated.


1.3.2 Gaussian Distribution Parameter Estimation for a Single Variable

Let's suppose that we measure a normally distributed physical quantity x N times, obtaining the results x₁, x₂, ..., x_N. If no systematic errors are present, it is probable that the average

x̄ ≡ (x₁ + x₂ + ... + x_N)/N

becomes closer to the value μ when the number of measurements N increases⁴. We can indeed assume that the average x̄ is the so-called estimator of μ. To distinguish between a parameter and its estimator we will use the hat symbol, i.e.

μ̂ = x̄.

Averaging the squares of the distances of each single measurement x_i from x̄, we will have

[(x₁ − x̄)² + (x₂ − x̄)² + ... + (x_N − x̄)²] / N.

Because we are averaging the squared distances between the theoretical and the experimental data points, it is reasonable to assume that the square root of this value is an estimator of the uncertainty of each single measurement x_i. A rigorous approach shows that σ̂ is an estimator of the σ of the distribution, and a more rigorous approach shows that an even better estimator is the sum of the squares of the distances divided by N − 1, i.e.

σ̂² = [1/(N − 1)] ∑_{i=1}^{N} (x_i − x̄)².

1.3.3 Gaussian Distribution Parameter Estimation for the Average Variable

The average x̄ of a normally distributed variable x is a random variable itself, and it must follow the normal distribution. What is the σ_x̄ of the normal distribution associated with x̄? It can be proved that

σ_x̄² = σ²/N,   i.e.   σ_x̄ = σ/√N.

⁴ It is also possible, and probable, to have a single measurement much closer to μ than the average, but this does not help to find an estimate of μ.

1.3.4 Gaussian Distribution Parameter Estimation for the Weighted Average

Let's suppose now that the N measurements x₁, x₂, ..., x_N of the same physical quantity are still normally distributed, but each measurement has a known variance σ₁², σ₂², ..., σ_N². Considering that the variance is a statistical parameter of the measurement precision, to calculate the average we can use the σ_i as a weight for each measurement, i.e.

x̄ = [1 / ∑_{i=1}^{N} w_i] ∑_{i=1}^{N} w_i x_i,   (weighted average)

where

w_i = 1/σ_i²,   i = 1, 2, ..., N.

The uncertainty associated with the weighted random variable x̄ is

σ_x̄² ≃ 1 / [∑_{i=1}^{N} 1/σ_i²].

It is left as an exercise to check what happens to the previous equation when σ₁ = σ₂ = ... = σ_N. To try to digest these new fundamental formulas, let's see two examples.

1.3.5 Example (Unweighted Average)

The voltage difference V across a resistor, directly measured 10 times, gives the following values:

 i          1      2      3      4      5      6      7      8      9      10
 V_i [mV]   123.5  125.3  124.1  123.9  123.7  124.2  123.2  123.7  124.0  123.2

The voltage difference average V̄ is, indeed,

V̄ = (1/10) ∑_{i=1}^{10} V_i = 123.880 mV.

Assuming that V is a normally distributed random variable, we will have that the uncertainty s_V on each single measurement V_i is

s_V = √[ (1/(10 − 1)) ∑_{i=1}^{10} (V_i − V̄)² ] = 0.6070 mV,

and the uncertainty on V̄ will be

s_V̄ = s_V/√10 = 0.1919 mV.

Finally, we will have⁵

V₁ = (123.5 ± 0.6) mV,  V₂ = (125.3 ± 0.6) mV,  ...,  V₁₀ = (123.2 ± 0.6) mV,
V̄ = (123.880 ± 0.192) mV.
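The numbers above are easy to reproduce numerically. Below is a minimal sketch, assuming NumPy is available; the variable names are our own and not part of the text.

```python
# Unweighted-average example: mean, single-measurement uncertainty,
# and uncertainty of the mean for the 10 voltage readings (mV).
import numpy as np

V = np.array([123.5, 125.3, 124.1, 123.9, 123.7,
              124.2, 123.2, 123.7, 124.0, 123.2])  # mV

V_mean = V.mean()               # estimator of mu
s_V = V.std(ddof=1)             # estimator of sigma (divides by N - 1)
s_mean = s_V / np.sqrt(len(V))  # uncertainty of the average

print(f"{V_mean:.3f} mV, {s_V:.4f} mV, {s_mean:.4f} mV")
# Expected, as in the text: 123.880 mV, 0.6070 mV, 0.1919 mV
```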

1.3.6 Example (Weighted Average)

The reflectivity R⁶ of a dielectric mirror, measured with 5 sets of measurements, gives the following table:

 i     1       2       3       4       5
 R_i   0.4932  0.4947  0.4901  0.4921  0.4915
 s_i   0.0021  0.0025  0.0032  0.0018  0.0027

⁵ The cumbersome notation for each single measurement is used here to avoid any kind of ambiguity. Whenever possible, results should be tabulated.
⁶ The reflectivity of a dielectric mirror, for polarized monochromatic light of wavelength λ and incident angle θ, can be defined as the ratio of the reflected light power to the impinging power.


Assuming that R is a normally distributed random variable, we will have that the uncertainty s_R̄ on the weighted average R̄ is

s_R̄ = √[ 1 / ∑_{i=1}^{5} (1/s_i²) ] = 0.001037.

The weighted average is

R̄ = s_R̄² ∑_{i=1}^{5} R_i/s_i² = 0.492517.

Finally, we will have

R̄ = 0.49252 ± 0.00103.
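As a hedged numerical check of the weighted-average formulas (assuming NumPy; names are ours):

```python
# Weighted-average example: weights w_i = 1/s_i^2, weighted mean,
# and its uncertainty.
import numpy as np

R = np.array([0.4932, 0.4947, 0.4901, 0.4921, 0.4915])
s = np.array([0.0021, 0.0025, 0.0032, 0.0018, 0.0027])

w = 1.0 / s**2
R_mean = np.sum(w * R) / np.sum(w)   # weighted average
s_R_mean = 1.0 / np.sqrt(np.sum(w))  # uncertainty of the weighted average

print(f"R = {R_mean:.6f} +/- {s_R_mean:.6f}")
# Expected, as in the text: 0.492517 +/- 0.001037
```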

Chapter 2 Propagation of Errors


2.1 Propagation of Errors Law

[Figure 2.1 here: a curve f(x) with the variations f(x0) → f(x0) + df and f(x1) → f(x1) + df produced by the same step dx at x0 and x1.]

Figure 2.1: Variation of f(x) at two points x₀ and x₁. The derivative accounts for the difference in the magnitude of the variation of the function f(x) at different points.

We want to find a method to approximate the uncertainty of a physical quantity f which has been indirectly determined, i.e. f is a function of a set of random variables that are physical quantities. To focus the problem, it is better to consider the case of f as a function of a single variable x with uncertainty Δx. The differential

df = (df/dx)|_{x = x₀} dx

represents the variation of f for the corresponding variation dx around x₀ (see figure 2.1). Imposing dx = Δx, we can interpret the following expression as an estimate of the uncertainty on f

Δf ≃ |df/dx|_{x₀} Δx,

where x₀ is the measured value. The absolute value is taken to account for a possible negative sign of the derivative. This formula can be easily extended, considering the definition of the differential of a function of n variables x₁, x₂, ..., xₙ, i.e.

Δf ≃ |∂f/∂x₁| Δx₁ + |∂f/∂x₂| Δx₂ + ... + |∂f/∂xₙ| Δxₙ.    (2.1)

The previous expression is called the law of propagation of errors, which gives an estimate of the maximum uncertainty on f for a given set of uncertainties Δx₁, ..., Δxₙ. The derivatives are calculated using the measured values of the physical quantities x₁, x₂, ..., xₙ. This law is rigorously exact (but not statistically correct, as shown in the next section) in the case of a linear function, where the Taylor expansion coincides with the function itself. This formula is quite useful during the design stages of an experiment. In fact, because of its linearity it is easy to apply and to estimate the contribution of each measured physical quantity to the error. Moreover, this estimate can be used to minimize the error and optimize the measurement (see example 2.3.1).

2.2 Statistical Propagation of Errors Law (SPEL)

A more orthodox approach, which starts from a Taylor expansion formula of the variance, gives the statistically correct formula for the propagation

2.2. STATISTICAL PROPAGATION OF ERRORS LAW (SPEL)

23

of errors (see appendix B). For the case of a function f of two random variables x and y, and the two data points (x₀ ± σ_x₀), (y₀ ± σ_y₀), we have

σ_f² ≃ (∂f/∂x)² σ_x₀² + (∂f/∂y)² σ_y₀² + 2 (∂f/∂x)(∂f/∂y) E[(x − x₀)(y − y₀)],

where the partial derivatives are calculated at (x₀, y₀). This expression is called the law of statistical propagation of errors (SPEL). In the special case of uncorrelated random variables, i.e. variables having independent PDFs, the previous equation becomes

σ_f² ≃ (∂f/∂x)² σ_x₀² + (∂f/∂y)² σ_y₀².

The most general expression for SPEL and its derivation can be found in appendix B.

2.2.1 Example 1: Area of a Surface

Let's suppose that the area A of a rectangular surface having side lengths a and b is indirectly measured by measuring the sides. We will have

A = ab,   σ_A² ≃ b² σ_a² + a² σ_b².

If the surface is a square, we could write

A = a²,   σ_A² ≃ 4a² σ_a²,

which implies that we are assuming that the square is perfect, and that it is sufficient to measure one side of the square.

2.2.2 Example 2: Power Dissipated by a Circuit

Suppose we want to know the uncertainty on the power P = V I cos φ dissipated by an AC circuit, where V and I are the sinusoidal voltage and current of the circuit and φ is the phase difference between V and I.


Applying the SPEL, and supposing that there is no correlation among the variables, we will have

σ_P² ≃ (I cos φ)² σ_V² + (V cos φ)² σ_I² + (V I sin φ)² σ_φ²    (2.2)
     = P² [ (σ_V/V)² + (σ_I/I)² + (tan φ)² σ_φ² ].    (2.3)

Considering the following experimental values, with the respective estimates of their expectation values and uncertainties,

V = (77.78 ± 0.71) V,   I = (1.21 ± 0.071) A,   φ = (0.283 ± 0.017) rad,

we get

P = (90.37 ± 5.39) W.
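A short numerical sketch of eq. (2.2) for this example (assuming NumPy and uncorrelated variables; variable names are ours):

```python
# SPEL applied to P = V * I * cos(phi) with the quoted experimental values.
import numpy as np

V, sV = 77.78, 0.71        # volts
I, sI = 1.21, 0.071        # amperes
phi, sphi = 0.283, 0.017   # radians

P = V * I * np.cos(phi)
sP = np.sqrt((I * np.cos(phi) * sV)**2 +
             (V * np.cos(phi) * sI)**2 +
             (V * I * np.sin(phi) * sphi)**2)

print(f"P = ({P:.2f} +/- {sP:.2f}) W")   # about (90.37 +/- 5.39) W
```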

2.2.3 Example 3: Improper Use of the Formula


Let's consider a random variable x which follows a NPDF with E[x] = 0 and V[x] = σ², and a function f of x, f(x) = x². Applying the SPEL to f(x), we obtain

σ_f ≃ |2x|_{x=0} σ_x = 0,

which leads to a wrong result, because this approximation (see eq. B.1 in Appendix B) is not legitimate. In fact, there is no need to expand the function up to the second-order term to understand that the second-order term (i.e. the function itself) is not negligible at all. Considering the definition of variance instead, and with the aid of the integration-by-parts formula, we get the correct result

σ_f = √(V[f(x)]) = √(V[x²]) = √2 σ².
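A small Monte Carlo sketch (assuming NumPy) makes the failure visible: the linearized term vanishes at x = 0, but the spread of f = x² does not.

```python
# Compare the SPEL prediction (zero) with the actual spread of f = x^2.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0
x = rng.normal(0.0, sigma, size=1_000_000)
f = x**2

print("SPEL prediction for sigma_f:", abs(2 * 0.0) * sigma)  # 0
print("Actual std of f:", f.std())                           # ~ sqrt(2)*sigma^2 ~ 1.414
```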

2.3 Relative Uncertainties

Let f be a physical quantity and Δf its uncertainty. The ratio

ε_f = Δf / f

is said to be the relative uncertainty or fractional error of the physical quantity f. The importance of this quantity arises mainly when we have to analyze the contribution of each uncertainty to the uncertainty of the physical quantity f. Expression (2.1) becomes quite useful for a quick estimate of the relative uncertainty ε_f.

2.3.1 Example 1

We want to measure the gravity constant g with a precision of 0.1% (ε_g = 0.001), using the equation of the resonant angular frequency of a simple pendulum of length l, i.e.

ω₀ = √(g/l).

Applying the logarithm to the previous expression, and taking the derivative of both sides, we get

Δω₀/ω₀ = (1/2)(Δg/g − Δl/l).

Considering that uncertainties cannot exactly cancel out, we will have

Δg/g = 2 Δω₀/ω₀ + Δl/l,

which means that the contributions of the uncertainties on ω₀ and on l are linear and differ just by a factor of two. Supposing that we are able to measure ω₀ to better than 0.1%, we have to be able to measure l at least with the same precision to guarantee the required precision ε_g. If we have l = 1 m, then

Δl < 0.001 m.

This formula tells us that the accuracy on the knowledge of l must be smaller than 1 mm to measure g within 0.1%.
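This kind of error budget is easy to script. A minimal sketch follows; the two relative uncertainties are illustrative assumptions, not measured values.

```python
# Maximum-error budget for g from the pendulum formula:
# eps_g = 2*eps_omega + eps_l.
eps_omega = 0.00025   # assumed relative uncertainty on omega_0
eps_l     = 0.0005    # assumed relative uncertainty on l (0.5 mm over 1 m)

eps_g = 2 * eps_omega + eps_l
print(f"relative uncertainty on g: {eps_g:.4%}")   # 0.1000%
```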


2.4 Measurement Comparison

We can use the SPEL to determine whether two different measurements of the same physical quantity x are statistically the same. Let's suppose that

x₁ ± σ_x₁,   x₂ ± σ_x₂,

are two independent measurements of the physical quantity x. The difference and the uncertainty of the difference will be, indeed,

Δx = |x₂ − x₁|,   σ_Δx = √(σ_x₁² + σ_x₂²).

We can assume, as a test of confidence, that the two measurements are statistically the same if

Δx < 2 σ_Δx.

2.4.1 Example
Suppose that the following measurements

I₁ = (4.398 ± 1.256) × 10⁻² kg m²,   I₂ = (4.431 ± 1.324) × 10⁻² kg m²,

are two measurements of the moment of inertia of a cylinder. We will have

ΔI = |I₂ − I₁| = (0.033 ± 1.825) × 10⁻² kg m²,

which shows that ΔI is less than 2 σ_ΔI. Statistically, the two measurements must be considered two determinations of the same physical quantity.
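The comparison test is a one-liner in code. A hedged sketch follows (standard library only; the function name is ours, not from the text).

```python
# Two measurements agree if |x2 - x1| < 2 * sqrt(s1^2 + s2^2).
from math import hypot

def compatible(x1, s1, x2, s2):
    """Return True if the two measurements are statistically the same."""
    delta = abs(x2 - x1)
    s_delta = hypot(s1, s2)   # sqrt(s1**2 + s2**2)
    return delta < 2 * s_delta

# Moment-of-inertia example from the text (units of 1e-2 kg m^2):
print(compatible(4.398, 1.256, 4.431, 1.324))   # True
```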

Chapter 3 Graphical Representation of Data


3.1 Introduction

A good way to analyze a set of experimental data, and review the results, is to plot them in a graph. It is important to provide all the information necessary to correctly and easily read the graph. The choice of the proper scale and of the type of scale is also important. For a reasonably good understanding of a graph, the following information should be included:

- a title,
- axis labels to define the plotted physical quantities,
- the physical units of the plotted physical quantities,
- a dot corresponding to each experimental point, and error bars or error rectangles,
- any graphical analysis made on the graph, in particular with the data points used clearly labeled,
- a legend, if more than one data set is plotted.

Figure 3.1 shows an example of a graph containing all the information needed to properly read the plot. Judgment of the goodness of a curve fit is quite often done by inspection of the graph, the data points and the theoretical fitted curve, or better, by analyzing the so-called difference plot.
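The same checklist can be followed when plotting with a computer. Below is a minimal sketch assuming matplotlib and NumPy are available; the data values and uncertainties are invented for illustration only.

```python
# A graph carrying the elements listed above: title, labeled axes with units,
# points with error bars, and a legend.
import numpy as np
import matplotlib.pyplot as plt

I = np.array([1.0, 3.0, 5.0, 7.0, 9.0, 11.0])    # current (mA), assumed data
V = np.array([1.1, 3.0, 5.2, 6.9, 9.1, 10.9])    # voltage (V), assumed data
dI, dV = 0.2, 0.3                                # assumed uncertainties

plt.errorbar(I, V, xerr=dI, yerr=dV, fmt="o", label="experimental points")
plt.title("Current-Voltage Characteristic for a Carbon Resistor")
plt.xlabel("Current through the resistor (mA)")
plt.ylabel("Voltage difference across the resistor (V)")
plt.legend()
plt.show()
```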


[Figure 3.1 here: plot titled "Current-Voltage Characteristic for a Carbon Resistor"; voltage difference across the resistor (V) versus current through the resistor (mA), annotated with the fitted line y(x) = ax + b, a = (996.3 ± 12.9), b = (−0.103 ± 0.094).]
Figure 3.1: Example of Graph.


3.2 Graphical Fit

Nowadays, graphical curve fitting of experimental data can be considered a romantic or nostalgic way to obtain an estimate of function parameters. Anyway, we might still face a situation where the omnipresent computer cannot be accessed to perform a statistical numerical fit. Moreover, drawing a graph with data points is an exercise with a lot of pedagogical value, and therefore deserves to be studied. Statistical curve-fit techniques will be explained in chapter 5.

3.2.1 Linear Graphic Fit

Let's consider the case of the fit of a straight line

y = ax + b    (3.1)

where the two parameters, the slope a and the intercept b, must be graphically determined. Let's assume that we are able to trace a straight line which fits the experimental points reasonably well. Considering then two points A = (x₁, y₁) and B = (x₂, y₂) belonging to the straight line, eq. (3.1), and some trivial algebra, we will have the two estimators of a and b

â = (y₂ − y₁)/(x₂ − x₁),   b̂ = (x₂y₁ − x₁y₂)/(x₂ − x₁),   x₁ < x₂.

Anyway, because of the previous assumption, we still need an objective criterion to trace the straight line. If we draw a rectangle centered on each data point, having sides corresponding to the data-point uncertainties, we will obtain a plot similar to that shown in figure 3.2. Using a ruler and applying the following two rules, we are able to trace two straight lines with maximum and minimum slopes:

- The maximum slope is such that if we try to draw a straight line with a steeper slope, not all the rectangles will be intercepted.
- The minimum slope is such that if we try to draw a straight line with a less steep slope, not all the rectangles will be intercepted.


We will then have two estimates of a and b, whose averages will give the estimated values of the slope and the intercept. Their semi-differences will give the maximum uncertainties associated with them:

â = (a_max + a_min)/2,   b̂ = (b_max + b_min)/2,    (3.2)
Δa = (a_max − a_min)/2,   Δb = (b_max − b_min)/2.    (3.3)

Figure 3.2 shows an example of a graphic fit applying the above-stated max/min slope criteria. Using the data points A, B, C, and D we obtain

a_max = [12.4 − (−0.9)] / (12.0 − 0.0) = 1.108 kΩ,   a_min = (12.0 − 1.0)/(13.5 − 0.0) = 0.815 kΩ,
b_max = (13.5 × 1.0 − 0.0 × 12.0)/(13.5 − 0.0) = 1.0 V,   b_min = [12.0 × (−0.9) − 0.0 × 12.4]/(12.0 − 0.0) = −0.9 V,

and finally

â = (1.0 ± 0.1) kΩ,   b̂ = (0 ± 1) V.

If we cannot have all the points intercepted by a straight line, and we really need to give some numbers for the slope and intercept, we could use this additional but very subjective rule of thumb: the maximum and the minimum slope straight lines are those lines which make the straight line computed using eqs. (3.2) and (3.3) intercept at least 2/3 of the rectangles. This rule tries to empirically take into account the results of statistics when applied to a curve fit. It is indeed better to use a statistical fitting method, as explained in chapter 5.

3.2.2 Theoretical Points Imposition

Imposing theoretical points on the fit curve implies that we are assuming that the uncertainty of each experimental point is not dominated by any


systematic error (i.e. it is negligible compared to the random errors). In fact, if we have a systematic error, the theoretical points will not be properly aligned with the experimental points. This is likely to cause a large systematic error on the parameter estimation, especially in the case of statistical fits. Figure 3.3 shows an example of a graphic fit that imposes a zero-crossing point on the max/min straight lines. Using the data points A and B we obtain

a_max = 12.1/12.0 = 1.008 kΩ,   a_min = 12.0/13.1 = 0.916 kΩ,

and finally

â = (0.96 ± 0.05) kΩ.

Statistically, this new measurement of a agrees with the previous one within their uncertainties. Which measurement is more accurate is difficult to say. A statistical analysis can reduce the uncertainty, giving a more precise measurement.

3.3 Linear Plot and Linearization

The graphical fitting methods for straight lines can be extended to apply to non-linear functions through the so-called linearization process. In other words, if we have a function which is not linear, we can apply functions to linearize it. We can mathematically formulate the problem in the following terms. Let's suppose that y = y(x; a, b) is a non-linear function with two parameters a and b. If we can find transformations Y = Y(x, y), X = X(x, y) that allow us to write the following relation

Y = X F(a, b) + G(a, b),

where F and G are known expressions that depend only on a and b, then we have linearized y. Once the F and G values are found with graphical



[Figure 3.2 here: "Voltage-Current Characteristic for a Carbon Resistor"; voltage difference across the resistor (V) versus current through the resistor (mA). Experimental points limiting the min and max slopes: A = (0.0 mA, 1.1 V), B = (13.5 mA, 12.0 V), C = (0.0 mA, −0.9 V), D = (12.0 mA, 12.4 V).]

Figure 3.2: Maximum and minimum slope straight lines intercepting all the experimental points. If we try to draw a straight line that is steeper than the maximum slope line, some of the points will not be intercepted; the same is true for the minimum slope line if we try to draw a line with a less steep slope.


[Figure 3.3 here: "Voltage-Current Characteristic for a Carbon Resistor"; voltage difference across the resistor (V) versus current through the resistor (mA). Experimental points limiting the max and min slopes: A = (12.0 mA, 12.1 V), B = (13.1 mA, 12.0 V).]

Figure 3.3: Maximum and minimum slope straight lines with a zero-crossing point imposed. The comments on the previous graph also apply to this figure.


or numerical methods a and b can be found by inversion of F and G. The uncertainties on a and b can be calculated using the SPEL. Quite often, the complexity of inverting F and G can make the linearization method impractical. Sometimes the linearization of a function can be achieved with nonlinear scales, as shown in the next sections.

3.3.1 Example 1: Square Function

Let's suppose that

y = ax².

If we define (the two functions to linearize the equation)

X(x) = x²,   Y(y) = y,

then we will have the linear function

Y(X) = aX,

which can be plotted in a linear graph. The parameter a is now the slope of a straight line, and it can be estimated using the method already explained.

3.3.2 Example 2: Power Function

If we have

y = bxᵃ,    (3.4)

applying the logarithm to both sides of the previous expression, we will have

log y = a log x + log b.

If we then define the following functions

Y = log y,   X = log x,

we will have the linear function

Y = aX + log b.

In this case the slope of the straight line is the exponent a, and its intercept is the logarithm of the coefficient b of function y.

3.4. LOGARITHMIC SCALES

35

3.3.3 Example 3: Exponential Function

If we have

y = be^{ax},

applying the logarithm to both sides, we have

log y = ax + log b.

If we define the following functions

Y = log y,   X = x,

we will finally have the linear function to plot

Y = aX + log b.
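With a computer, the same linearization idea reduces a non-linear fit to a straight-line fit. Below is a hedged sketch (assuming NumPy) for the power-law case of Example 2; the "data" are synthetic and the parameter values are invented for illustration.

```python
# A power law y = b * x**a becomes a straight line in log(y) vs log(x),
# so a first-degree polynomial fit recovers a and log10(b).
import numpy as np

a_true, b_true = 1.8, 0.7
x = np.linspace(1.0, 10.0, 20)
rng = np.random.default_rng(1)
y = b_true * x**a_true * rng.normal(1.0, 0.02, size=x.size)   # ~2% scatter

X, Y = np.log10(x), np.log10(y)
a_fit, logb_fit = np.polyfit(X, Y, 1)    # slope = a, intercept = log10(b)

print(f"a = {a_fit:.3f}, b = {10**logb_fit:.3f}")   # close to 1.8 and 0.7
```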

3.4 Logarithmic Scales

A logarithmic scale is an axis whose scale is proportional to the logarithm of the plotted value. The base of the logarithm is arbitrary for logarithmic scales, and for the sake of simplicity we will assume base 10. For example, if we plot on a logarithmic scale the numbers 0.1, 1, and 10 (log₁₀ 0.1 = −1, log₁₀ 1 = 0, log₁₀ 10 = 1), they will be exactly equally spaced. In general, any two values differing by a factor of 10 will be separated by the same distance. This distance is a characteristic of the logarithmic scale sheet and is called a decade.
One of the main advantages of using logarithmic scales is that we are able to plot ranges spanning several orders of magnitude on a manageable sheet. The inconvenience is the great distortion of the plotted curves, which sometimes can lead to a wrong interpretation of the results. This distortion becomes small when the range amplitude of the scale Δx = (b − a) is smaller than the magnitude a of the range. In fact, if Δx ≪ a, we will have

log(a + Δx) = log[a(1 + Δx/a)] = log(a) + log(1 + Δx/a) ≃ log(a) + Δx/a,

and the logarithmic scale is then well approximated by the linear scale. Another advantage of using logarithmic scales is for the linearization of functions, as briefly discussed in the next subsection.


3.4.1 Linearization with Logarithmic Graph Sheets

There are essentially two cases that can be linearized using logarithmic graph sheets:

- If the experimental points follow a power law (y = axᵇ, as in example 2), we will obtain a straight line if we plot both y and x on logarithmic scales (log-log sheet).
- If the experimental points follow an exponential law (y = abˣ, as in example 3), we will obtain a straight line if we plot y on a logarithmic scale versus x on a linear scale (semi-logarithmic sheet).

3.5 Difference Plots

The ability to see how data points scatter around the theoretically fitted curve in a graph is quite important for assessing the quality of the fit. Quite often, if we plot the experimental points and the fit curve together, it becomes difficult to appreciate and analyze the difference between the experimental points and the curve. In fact, if the measurement range of a physical quantity is much greater than the average distance between the experimental points and the curve, the data points and the curve become indistinguishable. One way to avoid this problem is to produce the so-called difference plot, i.e. the plot of the differences between the theoretical and the measured points. In a difference plot, the goodness of the fit and/or the poorness of the theoretical model can be better analyzed.
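A minimal sketch of such a residual plot follows, assuming matplotlib and NumPy; both the data and the fitted line are invented for illustration.

```python
# Difference plot: the residuals y_i - y_fit(x_i) are plotted instead of
# the data themselves, so the scatter around the model is visible.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.0, 12.0, 13)
rng = np.random.default_rng(2)
y = 1.0 * x + 0.1 + rng.normal(0.0, 0.15, size=x.size)   # measured points
y_fit = 1.0 * x + 0.1                                    # theoretical/fitted curve

plt.errorbar(x, y - y_fit, yerr=0.15, fmt="o")
plt.axhline(0.0, linestyle="--")
plt.xlabel("x (A.U.)")
plt.ylabel("difference y - y_fit (A.U.)")
plt.title("Difference plot")
plt.show()
```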

3.5.1 Difference Plot of Logarithmic Scales

The difference between an experimental point y₀ and a theoretical point y on a logarithmic scale is the dimensionless quantity

δ = log(y) − log(y₀) = log(y/y₀).

Supposing that y = y₀ + Δy, and considering the first-order expansion log(1 + ε) ≃ ε for ε ≪ 1, we get

δ = log(1 + Δy/y₀) ≃ Δy/y₀ = (y − y₀)/y₀.

Assuming small deviations between the experimental and theoretical data, a difference plot with a vertical logarithmic scale is therefore the plot of the relative deviations.


Chapter 4 Probability Distributions


This chapter reports the basic definitions describing probability density functions (PDF) and some of the frequently used distributions with their main properties. To comprehend the next chapters it is important to become familiar mainly with the first section, where some new concepts are introduced. The understanding of each distribution is also required, since they will be used in the next chapters.

4.1 Definitions

4.1.1 Probability and Probability Density Function (PDF)

Let's consider a continuous random variable x. We define the probability that x assumes a value in the interval (a, b) as the following integral

P{a < x < b} = ∫_a^b p(x) dx,

where p(x) is a function called the probability density function (PDF) of the random variable x. From the previous definition, p(x)dx represents the probability of having x in the interval (x, x + dx). It is worth noticing that in general p(x) has dimensions [p(x)] = [x⁻¹].

4.1.2 Distribution Function (DF)

The following function

F(x) = ∫_{−∞}^{x} p(x′) dx′    (4.1)

is called the distribution function (DF) or cumulative distribution function of p(x). Considering the properties of the integral and equation (4.1), we will have

P{a < x < b} = F(b) − F(a).

The two following relations must also be satisfied:
lim_{x→+∞} F(x) = 1,    (4.2)
lim_{x→−∞} F(x) = 0.    (4.3)

The first limit, the so-called normalization condition, represents the probability that x assumes any one of its possible values. The second limit represents the probability that x does not assume any value.

4.1.3 Probability and Frequency

Let's suppose that we measure N times a random variable x, which can assume any value inside the interval (a, b). Partitioning the interval into n intervals, and counting how many times k_i the measured value falls inside the i-th interval, we will have

f_i = k_i/N,   i = 1, ..., n,

which is called the frequency of the random variable x. The limit

P_i(x) = lim_{N→∞} k_i/N,   i = 1, ..., n,

is the probability of obtaining a measurement of x in the i-th interval. This experimental (impractical) definition will be used in the next subsections.
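A short numerical sketch of this frequency definition (assuming NumPy; the uniform distribution is chosen only because its interval probabilities are known exactly):

```python
# Histogram N samples into n intervals and compare k_i/N with the exact
# probability of each interval.
import numpy as np

rng = np.random.default_rng(3)
N, n = 10_000, 5
x = rng.uniform(0.0, 1.0, size=N)          # samples in (a, b) = (0, 1)

k, edges = np.histogram(x, bins=n, range=(0.0, 1.0))
f = k / N                                  # frequencies
print("frequencies:", np.round(f, 3))      # each close to 1/n = 0.2
```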


4.1.4 Continuous Random Variable vs. Discrete Random Variable

We can always make a continuous random variable x become a discrete random variable X. Defining a partition¹ of the domain [a, b) of x, {a = x₀, x₁, x₂, ..., xₙ = b}, we can compute the probability P_i associated with each interval [x_i, x_{i+1}). If we define the discrete variable X, assuming one value per interval², X = {X₁, X₂, ...}, then we will have defined a new discrete random variable X with probabilities P_i.

4.1.5 Expectation Value

The following integral

E[x] = ∫_{−∞}^{+∞} x p(x) dx

is defined to be the expectation value of the random variable x. Because of the linearity of the operation of integration, we have

E[αx + βy] = α E[x] + β E[y],

where α and β are constant values.

4.1.6 Intuitive Meaning of the Expectation Value

The intuitive meaning of the expectation value can be easily understood in the case of a discrete random variable. In this case, we have that the expectation value becomes³

E[X] = ∑_{i=1}^{∞} X_i P(X_i),

where P(X_i) is the probability that X assumes the value X_i. Let's suppose that, measuring X N times, we obtain M different values of X. Let k_i be the number of times that we measure the value X_i, with i = 1, 2, ..., M. If we estimate P(X_i) with the frequency k_i/N of having the event X = X_i,

P(X_i) ≃ k_i/N,

we will have

E[X] ≃ (1/N) ∑_{i=1}^{M} X_i k_i,

which corresponds to the average of the N measurements of the variable X. We can conclude that, experimentally, the expectation value of a random variable is estimated by the average of the measured values of the random variable.

¹ In general, the partition can be numerable. In other words, it can have an infinite number of intervals, but we can associate an integer number with each one of them.
² There is an arbitrariness in the choice of the values X_i because we are assuming that any value within each given interval is equiprobable.
³ In general, when changing from the continuous to the discrete variable, we have that p(x)dx → P(X_i).

4.1.7 Variance

The expectation value of the square of the difference between the random variable x and its expectation value is called the variance of x, i.e.

V[x] = E[(x − E[x])²].

A more explicit expression gives

V[x] = ∫_{−∞}^{+∞} (x − μ)² p(x) dx.

Using the properties of the expectation value and of the distribution function, we obtain

V[x] = E[x²] − E[x]².

A common symbol used as a shortcut for the variance is the Greek letter sigma, i.e. σ² = V[x]. The variance has the so-called pseudo-linearity property. Considering two uncorrelated random variables x and y, we will have

V[αx + βy] = α² V[x] + β² V[y],

where α and β are constant values.


4.1.8 Intuitive Meaning of the Variance

To provide an intuitive meaning of the variance, we can still make use of a discrete random variable X. In this case, we will have

V[X] = ∑_{i=1}^{∞} (X_i − E[X])² P(X_i),

which shows that the variance is just the sum of the squares of the distances of each experimental point from the expectation value, weighted by the probability of obtaining that measurement. Estimating the probability with the frequency, we will have that V[X] is just the average of the squares of the distances of the experimental points X_i from their average, i.e.

V[X] ≃ (1/N) ∑_{i=1}^{M} (X_i − E[X])² k_i.

We can conclude that the variance is estimated by the average of the squares of the distances of the experimental values from their expectation value.

4.1.9 Standard Deviation

The square root of the variance is defined as the standard deviation of the random variable x, i.e.

σ = √[ ∫_{−∞}^{+∞} (x − μ)² p(x) dx ].

4.2 Uniform Distribution

The following PDF of the random variable x

p(x; a, b) = 1/(b − a)  for x ∈ [a, b],   p(x; a, b) = 0  for x ∉ [a, b],

is called a uniform probability density function, and x is said to be uniformly distributed in the interval [a, b] (see figure 4.1). This PDF dictates that any value in the interval [a, b] has the same probability.

[Figure 4.1 here: uniform p(x; a, b) and its cumulative P(x; a, b) for several intervals [a, b]; x in A.U.]

Figure 4.1: Uniform probability density function p(x; a, b) and its cumulative function P(x; a, b) for different intervals [a, b].

The cumulative distribution function is

P(x; a, b) = 0  for x < a,   (x − a)/(b − a)  for a ≤ x ≤ b,   1  for x > b.

The expectation value of x is

E[x] = (a + b)/2,

and the variance is

V[x] = (b − a)²/12.

The calculation of E[x] and V[x] is left as an exercise.


4.2.1 Random Variable Uniformly Distributed

Let's suppose that, measuring N times a given physical quantity x, we always obtain the same result x₀. In this case, we cannot study how x is statistically distributed. With this limited knowledge, a reasonable hypothesis is that x is uniformly distributed in the interval (x₀ − Δx/2, x₀ + Δx/2), where Δx is the instrument resolution. Under this assumption, the best estimate of the uncertainty on x is indeed

σ_x = Δx/√12 = Δx/(2√3).

This is a case where the statistical uncertainty cannot be evaluated from the measurements, and has to be estimated from the instrument resolution.

4.2.1.1 Example: Ruler Measurements

The measurement of a distance d with a ruler with a resolution Δx = 0.5 mm (half of the smallest division) is repeated several times, and always gives the same value of 12.5 mm. Assuming that d is uniformly distributed in the interval [12.25, 12.75] mm, the statistical uncertainty associated with d will be

σ_d = 0.5/(2√3) = 0.144 mm.

4.2.1.2 Example: Analog to Digital Conversion

The conversion of an analog signal to a number through an Analog to Digital Converter (ADC) is another example of the creation of a uniformly distributed random variable. In fact, the conversion rounds the analog value to a given integer number. The integer value depends on which interval the analog value lies in, and the interval length ΔV is the ADC resolution. Then, it is reasonable to assume that the converted value follows the uniform PDF. If the ADC numerical representation is 12 bits long, and the input/dynamic range is from −10 V to 10 V, the interval length is

ΔV = [10 − (−10)]/2¹² ≃ 4.88 mV,

and the uncertainty associated with each conversion will be

σ_V = 4.88/(2√3) ≃ 1.4 mV.

Anyway, the previous statement about the distribution followed by the converted value is not general, and it is not applicable to every ADC. For example, there are numerical techniques applied to converters that change the statistical distribution of the converted values. As usual, we have to investigate which PDF the random variable follows.
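The ADC estimate above is easy to verify with a small simulation. This is a hedged sketch (assuming NumPy) of an idealized rounding converter, not of any particular device.

```python
# Rounding a signal to a grid of step dV produces errors whose standard
# deviation is close to dV/sqrt(12).
import numpy as np

dV = 20.0 / 2**12 * 1000.0                            # resolution in mV (~4.88 mV)
rng = np.random.default_rng(4)
v = rng.uniform(-10_000.0, 10_000.0, size=100_000)    # analog values in mV

v_adc = np.round(v / dV) * dV                         # idealized conversion
err = v_adc - v

print(f"dV/sqrt(12) = {dV/np.sqrt(12):.2f} mV, measured std = {err.std():.2f} mV")
```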

4.3 Gaussian Distribution (NPDF)

The following PDF of the random variable x

p(x) = [1/√(2πσ²)] e^{−(x−μ)²/(2σ²)},   x ∈ ℝ,    (4.4)

is said to be the Gauss/Normal probability density function (NPDF), centered at μ and with variance σ². The variable x is said to be normally distributed around μ (see figure 4.2). Let's make a list of some important properties of the NPDF:

- The expectation value and the variance are respectively

  E[x] = μ,   V[x] = σ².

  The calculation of E[x] and V[x] is left as an exercise.
- (4.4) is symmetric around the vertical axis crossing the point x = μ.
- (4.4) has one absolute maximum at x = μ.
- (4.4) has exponentially decaying tails, i.e. for |x| ≫ |μ|

  p(x) ∝ e^{−x²/(2σ²)}.

[Figure 4.2 here: Gaussian p(x; μ, σ) and its cumulative P(x; μ, σ) for σ = 1, ..., 5; x in A.U.]

Figure 4.2: Gaussian probability density function p(x; μ, σ) and its cumulative function P(x; μ, σ) for different values of σ.

- The analytical cumulative function

  P(x) = [1/√(2πσ²)] ∫_{−∞}^{x} e^{−(x′−μ)²/(2σ²)} dx′    (4.5)

  is not an elementary function.

It is left to the student to verify the normalization condition (4.2) for equation (4.4).

4.3.1 Standard Probability Density Function

Applying the following transformation to the NPDF


t = (x − μ)/σ,

we obtain the so-called standard probability density function

p(t) = [1/√(2π)] e^{−t²/2},   t ∈ ℝ,

with E[t] = 0 and V[t] = 1, and cumulative function

P(t) = [1/√(2π)] ∫_{−∞}^{t} e^{−t′²/2} dt′.

4.3.2 Probability Calculation with the Error Function


Considering the definition of the error function

erf(t) = (2/√π) ∫₀^t e^{−x²} dx,

then

P{−t₁ ≤ t ≤ t₂} = (1/2)[erf(t₁/√2) + erf(t₂/√2)],   0 ≤ t₁, t₂.

The error function is usually available in most modern programming language implementations and even in some pocket calculators. Calculating the probability of a normally distributed variable x using P(t), for an arbitrary interval containing μ, is straightforward. In fact, because of the properties of the integral operator we have

P{a ≤ x ≤ b; μ, σ} = (1/2)[erf((μ − a)/(√2 σ)) + erf((b − μ)/(√2 σ))],   a ≤ μ ≤ b.

For a symmetric interval about μ, we have

P{μ − a ≤ x ≤ μ + a; μ, σ} = erf(a/(√2 σ)),   a ≥ 0.

The demonstration of the three previous formulas is left as an exercise.
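Since the error function is in the Python standard library, the formulas above can be evaluated directly; here is a minimal sketch (the function name is ours).

```python
# Probability of a normally distributed variable falling in [a, b].
from math import erf, sqrt

def prob(a, b, mu, sigma):
    """P{a <= x <= b} for x normal with mean mu and standard deviation sigma."""
    return 0.5 * (erf((mu - a) / (sqrt(2) * sigma)) +
                  erf((b - mu) / (sqrt(2) * sigma)))

print(prob(-1.0, 1.0, 0.0, 1.0))   # ~0.683, the 68.3% of figure 1.5
print(prob(-2.0, 2.0, 0.0, 1.0))   # ~0.954
```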

[Figure 4.3 here: exponential p(x; x₀) and its cumulative P(x; x₀) for x₀ = 1, ..., 5; x in A.U.]

Figure 4.3: Exponential probability density function p( x; x0 ) and its cumulative distribution function P( x; x0 ) for different values of x0 .

4.4 Exponential Distribution

The following PDF of the random variable x

p(x; x₀) = (1/x₀) e^{−x/x₀}  for 0 ≤ x < ∞ (with x₀ > 0),   p(x; x₀) = 0  for x < 0,

is the exponential probability density function, and x is therefore exponentially distributed in the interval [0, ∞) (see figure 4.3). The cumulative distribution function is

P(x; x₀) = ∫₀^x (1/x₀) e^{−x′/x₀} dx′ = 1 − e^{−x/x₀},


the expectation value of x is

E[x] = ∫₀^{+∞} (x/x₀) e^{−x/x₀} dx = x₀,

and the variance

V[x] = ∫₀^{+∞} (x − x₀)² (1/x₀) e^{−x/x₀} dx = x₀².

The calculation of E[x] and V[x] is left as an exercise.

4.4.1 Random Variable Exponentially Distributed

The decay time τ of an unstable particle, measured in its rest frame, is a random variable that follows the exponential distribution, whose PDF is

p(τ) = (1/τ₀) e^{−τ/τ₀}.

The quantity τ₀ is called the mean lifetime of the particle. In other words, the previous formula gives the probability for an unstable particle to decay in the time interval (τ, τ + dτ) measured in its rest frame.
Let's demonstrate the previous formula. If N(t) is the number of unstable particles at the time t, then the number of decayed particles ΔN after a time Δt will be

ΔN = −λ N Δt,   λ > 0,

where λΔt is the probability of a decay in a time Δt. The assumption here is that the decay of each single particle is an independent random process, and therefore the rate of particles that decay must be proportional to the number of particles. The minus sign is necessary because the particle number is decreasing (ΔN ≤ 0). Considering N very large (lots of decays per unit time, and therefore the variation in the particle number is almost continuous), we can approximate ΔN with a continuous decay rate dN:

dN/N = −λ dt.

Integrating the previous differential equation we get

N(t) = N₀ e^{−λt},   N(t = 0) = N₀.

The previous formula gives us the number of particles that survive after a time t. The PDF for a single particle to decay at time t is

p(t) = −d/dt [N(t)/N₀] = λ e^{−λt} = (1/τ₀) e^{−t/τ₀},   λ = 1/τ₀.

Experiment 26 of the ph7 sophomore lab is an interesting study of unstable decay which uses a Californium-252 (²⁵²Cf) neutron source to activate silver atoms.
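A small simulation sketch (assuming NumPy) of the result just derived: decay times drawn from the exponential PDF have mean τ₀ and standard deviation τ₀. The lifetime value is an arbitrary assumption.

```python
# Simulated decay times: both the sample mean and the sample std estimate tau_0.
import numpy as np

tau0 = 2.0                                   # assumed mean lifetime (s)
rng = np.random.default_rng(5)
tau = rng.exponential(tau0, size=100_000)    # simulated decay times

print(f"mean = {tau.mean():.3f} s, std = {tau.std():.3f} s")   # both ~ tau0
```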

4.5 Binomial/Bernoulli Distribution

Let's suppose that we perform an experiment which allows only two results/events⁴, A and Ā. The first event has probability P(A) = p and the other event necessarily has probability P(Ā) = (1 − p).
If we repeat the experiment N times, the probability of having k times the event A (and hence N − k times Ā) is

    P_{N,p}(k) = \binom{N}{k}\, p^k (1 - p)^{N - k}, \qquad 0 \le p \le 1, \qquad (4.6)

where

    \binom{N}{k} = \frac{N!}{(N - k)!\, k!}

is the binomial coefficient. Equation (4.6) is called the binomial or Bernoulli distribution. The expectation value of k and its variance are respectively

    E[k] = N p, \qquad V[k] = N p (1 - p).

The calculations of E[k] and V[k] are left as an exercise.
⁴ Any kind of experiment which has more than one (numerable or infinite) result can be arbitrarily arranged into two sets of results, and hence into two possible events.
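As a sketch (added here for illustration, using only Python's standard library), the probabilities of equation (4.6) and their first two moments can be computed directly; N and p below are arbitrary illustrative values.

    from math import comb

    def binom_pmf(k, N, p):
        # P_{N,p}(k) of equation (4.6)
        return comb(N, k) * p**k * (1 - p)**(N - k)

    N, p = 30, 0.25
    pmf = [binom_pmf(k, N, p) for k in range(N + 1)]

    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    print(mean, N * p)            # both 7.5
    print(var, N * p * (1 - p))   # both 5.625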

Figure 4.4: Poisson distribution P_m(k) for different values of the parameter m (5, 10, 15, 25).

For large values of N (N > 20, and especially for p ≃ 1/2), P_{N,p}(k) becomes well approximated by the NPDF. The Poisson distribution, which is described in the next section, is an asymptotic limit of the binomial distribution when N → ∞.

4.6 Poisson Distribution

Imposing the following conditions on the binomial distribution

    N \rightarrow \infty, \qquad p \rightarrow 0, \qquad N p = m = \text{const.},

we obtain the so-called Poisson distribution (see figure 4.4)

    P_m(k) = \frac{m^k}{k!}\, e^{-m}.

The expectation value of k and its variance are respectively

    E[k] = m, \qquad V[k] = m.


The calculations of E[k] and V[k] are left as an exercise.
The meaning of this distribution is essentially the same as the binomial distribution. It represents the probability of measuring an event k times when the number of measurements N is very large, i.e. N ≫ m.
Let's mention some important qualitative properties of the Poisson distribution. For m below 20, P_m(k) is quite asymmetric and the expectation value does not coincide with the maximum. For m ≥ 20, P_m(k) is quite symmetric, the expectation value is very close to the maximum, and the curve is quite well approximated by a Gaussian distribution with μ = m and σ = √m.
If k_1, k_2, ..., k_n are n measurements of k, a good estimator of m is the average of the measurements, i.e.

    \hat{m} = \frac{1}{n} \sum_{i=1}^{n} k_i,

and the estimated uncertainty on each single measurement is then

    \sigma_k = \sqrt{\hat{m}}.

The uncertainty on the estimator m̂ is

    \sigma_{\hat{m}} = \frac{\sigma_k}{\sqrt{n}} = \sqrt{\frac{\hat{m}}{n}}.

The demonstration of the validity of these estimators is based on concepts explained in chapter 5.
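The Gaussian approximation mentioned above is easy to verify numerically; the sketch below (an illustration added here, using only Python's standard library) compares P_m(k) with a Gaussian of μ = m and σ = √m for m = 25.

    import math

    m = 25.0

    def poisson_pmf(k, m):
        return m**k * math.exp(-m) / math.factorial(k)

    def gauss(k, mu, sigma):
        return math.exp(-(k - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

    # the two columns agree to a few percent near the peak
    for k in (15, 20, 25, 30, 35):
        print(k, poisson_pmf(k, m), gauss(k, m, math.sqrt(m)))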


4.6.1 Example: Silver Activation Experiment

A classical example of application of the Poisson distribution is the statistical analysis of atomic decay. In this case, the Poisson variable is the total number of decays measured during a given time Δt. The number N is the number of atoms that can potentially decay, which is normally quite difficult to make very small, making the approximation N → ∞ quite good.
Let's consider here some real data taken from the activation of silver with a radioactive source:

Number of measurements: n = 20
Measurement time: Δt = 8 s

Table of measurements (with the average radioactive-background decays already removed):

    i              1   2   3   4   5   6   7   8   9  10
    k_i (counts)  52  46  61  60  48  55  53  59  56  53

    i             11  12  13  14  15  16  17  18  19  20
    k_i (counts)  59  48  63  50  55  56  55  61  49  39

Using the average as the estimator of m we get

    \hat{m} = \frac{1}{n} \sum_{i=1}^{n} k_i = \frac{1078}{20} = 53.90 \ \text{counts}.

The uncertainty on a single measurement is

    \sigma_k = \sqrt{\hat{m}} = 7.3417 \ \text{counts},

and the uncertainty on m̂ is

    \sigma_{\hat{m}} = \frac{\sigma_k}{\sqrt{n}} = 1.6416 \ \text{counts}.

The mean number of decaying atoms during a period of Δt = 8 s is therefore

    \hat{m} = (53.90 \pm 1.64) \ \text{counts}.


Finally, neglecting the uncertainty on the measurement time Δt, the statistical measurement of the decay rate, obtained by dividing m̂ by Δt, is

    R = (6.74 \pm 0.21) \ \text{decays/s}.

It is important to notice that a single long measurement gives the same result. In fact, considering the overall number of counts, we will have

    \hat{m} = 1078 \ \text{counts}, \qquad \sigma_{\hat{m}} = \sqrt{1078} = 32.83 \ \text{counts},

    \hat{m} = (1078.0 \pm 32.8) \ \text{counts}.

Because the integration time is now n Δt = 8 × 20 = 160 s, we will have

    R = (6.74 \pm 0.21) \ \text{decays/s}.

The calculation of R using a single long measurement has essentially two issues:

- it does not allow us to check the assumption on the statistical distribution;
- there is no way to check for any anomalies during the data taking with just one single datum.
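The numbers quoted above can be reproduced with a few lines of Python (a sketch added for illustration, using the 20 counts from the table):

    import math

    counts = [52, 46, 61, 60, 48, 55, 53, 59, 56, 53,
              59, 48, 63, 50, 55, 56, 55, 61, 49, 39]
    n, dt = len(counts), 8.0            # 20 measurements of 8 s each

    m_hat = sum(counts) / n             # 53.90 counts
    sigma_k = math.sqrt(m_hat)          # 7.34 counts
    sigma_m = sigma_k / math.sqrt(n)    # 1.64 counts

    R = m_hat / dt                      # 6.74 decays/s
    sigma_R = sigma_m / dt              # 0.21 decays/s
    print(m_hat, sigma_m, R, sigma_R)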


Chapter 5 Parameter Estimation


When a function depends on a set of parameters we are faced with the problem of estimating the values of those parameters. Starting from a finite set of measurements we can make a statistical determination of the parameter set. In the following sections we will examine two standard methods, the maximum-likelihood and least-square methods, to estimate the parameters of a PDF and of a general function of one independent variable.

5.1 The Maximum Likelihood Principle (MLP)

Let x be a random variable and f its PDF, which depends on a set of unknown parameters θ = (θ_1, θ_2, ..., θ_n),

    f = f(x; \theta).

Given N independent samples of x, x = (x_1, x_2, ..., x_N), the quantity

    L(x; \theta) = \prod_{i=1}^{N} f(x_i; \theta)

is called the likelihood of f. L is proportional to the probability of obtaining the set of samples x, assuming that the N samples are independent.
The maximum likelihood principle (MLP) states that the best estimate of the parameters θ is the set of values which maximizes L(x; θ).


The MLP reduces the problem of parameter estimation to that of maximizing the function L. Because, in general, it is not possible to find the parameters that maximize L analytically, numerical methods implemented in computers are often used.
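As an illustration of this numerical approach (a sketch added here, assuming SciPy is available; the sample and the true parameter value are synthetic), the parameter x0 of an exponential PDF can be estimated by minimizing the negative log-likelihood:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(1)
    sample = rng.exponential(scale=4.0, size=500)   # "data" with true x0 = 4

    def neg_log_like(x0):
        # -log L for the exponential PDF p(x; x0) = exp(-x/x0)/x0
        return len(sample) * np.log(x0) + sample.sum() / x0

    res = minimize_scalar(neg_log_like, bounds=(0.1, 100.0), method="bounded")
    print(res.x)   # numerical MLE, close to the sample mean (the analytic answer)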

5.1.1 Example: μ and σ of a Normally Distributed Random Variable

Let's suppose we have N independent samples of a normally distributed random variable x, whose μ and σ are unknown. Experimentally, this case corresponds to measuring the same physical quantity x several times with the same instrument. In this case L is

    L(x; \mu, \sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{N} \exp\left[-\sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2}\right],

and we have to maximize it. Considering that the exponential function is monotone, we just have to study the argument of the exponential. Imposing the following conditions

    \frac{\partial}{\partial \mu} \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2} = 0, \qquad \frac{\partial}{\partial \sigma} \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma^2} = 0,

is sufficient to determine the maximum of L. Solving the first equation with respect to μ, we obtain the estimator μ̂ of μ

    \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i,

which, substituted into the second equation, gives the estimator σ̂² of σ²

    \hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})^2.

The estimator σ̂² of the variance is biased, i.e. the expectation value of the estimator is not the parameter itself; in fact

    E[\hat{\sigma}^2] = \left(1 - \frac{1}{N}\right) \sigma^2.

Because of this it is preferable to use the following unbiased estimator

    s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \hat{\mu})^2, \qquad E[s^2] = \sigma^2.

What is the variance associated with the average μ̂? To answer this question, let's consider the average variable

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i,

which must be a Gaussian random variable. Using the pseudo-linearity property of the variance, its variance can be computed directly, i.e.

    V[\bar{x}] = \frac{1}{N}\, \sigma^2.
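A short numerical sketch of these estimators (added here for illustration, assuming numpy; the true μ and σ below are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(loc=10.0, scale=0.5, size=200)   # N repeated measurements

    mu_hat = x.mean()                     # estimator of mu
    s2 = x.var(ddof=1)                    # unbiased estimator s^2 (divides by N-1)
    sigma_mean = np.sqrt(s2 / x.size)     # uncertainty of the average, s/sqrt(N)

    print(mu_hat, np.sqrt(s2), sigma_mean)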

5.1.2 Example: μ of a Set of Normally Distributed Random Variables

Let's suppose we have N independent samples of N normally distributed random variables x_1, x_2, ..., x_N, having the same unknown expected value μ but different known variances σ_1², σ_2², ..., σ_N². Experimentally, this case corresponds to measuring the same physical quantity x several times with different instruments. In this case L is

    L(x; \mu) = \left(\frac{1}{\sqrt{2\pi}}\right)^{N} \prod_{i=1}^{N} \frac{1}{\sigma_i}\; \exp\left[-\sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma_i^2}\right],

and we have to maximize it. Imposing the following condition

    \frac{d}{d\mu} \sum_{i=1}^{N} \frac{(x_i - \mu)^2}{2\sigma_i^2} = 0

is sufficient to determine the absolute maximum of L. Solving this equation with respect to μ, we obtain the estimator μ̂ of μ

    \hat{\mu} = \sum_{i=1}^{N} \frac{1/\sigma_i^2}{\sum_{k=1}^{N} 1/\sigma_k^2}\, x_i,

which is the weighted average of the random variables x_i.
What is the variance associated with the weighted average? To answer this question let's consider the weighted-average random variable

    \bar{x} = \sum_{i=1}^{N} \frac{1/\sigma_i^2}{\sum_{k=1}^{N} 1/\sigma_k^2}\, x_i.

Using the pseudo-linearity property of the variance, its variance can be directly computed, i.e.

    V[\bar{x}] = \frac{1}{\sum_{i=1}^{N} 1/\sigma_i^2}.
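For illustration (a sketch assuming numpy; the measurement values and uncertainties are invented), the weighted average and its variance can be computed as follows:

    import numpy as np

    # measurements of the same quantity taken with instruments of different precision
    x = np.array([9.8, 10.3, 10.1, 9.7])
    sigma = np.array([0.5, 0.2, 0.1, 0.4])     # known uncertainties

    w = 1.0 / sigma**2
    x_bar = np.sum(w * x) / np.sum(w)          # weighted average
    var_x_bar = 1.0 / np.sum(w)                # its variance

    print(x_bar, np.sqrt(var_x_bar))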

5.2 The Least Square Principle (LSP)

Let's suppose that we have a function y of one variable x and of a set of parameters θ = (θ_1, θ_2, ..., θ_n),

    y = y(x; \theta).

If we have N pairs of values (x_i, y_i ± σ_i) with i = 1, ..., N, then the following quantity

    \chi^2(x; \theta) = \sum_{i=1}^{N} \left[\frac{y_i - y(x_i; \theta)}{\sigma_i}\right]^2

is called the chi square¹ of y. The Least Square Principle (LSP) states that the best estimate of the parameters θ is the set of values which minimizes χ². If σ_i = σ for i = 1, ..., N, the χ² expression can be simplified because σ can be factored out, so the search for the χ² minimum becomes easier.
¹ χ² must be considered a symbol, i.e. we cannot write χ = \sqrt{\chi^2}.


It is important to notice that in this formulation the LSP requires knowledge of the σ_i (the uncertainties of the measurements y_i) and assumes no uncertainties are associated with the x_i. In a more general and correct formulation the uncertainties of the x_i should be taken into account (see appendix D). In general, uncertainties in the independent variable x can be neglected if

    \frac{\sigma_{x_i}}{x_i} \ll \frac{\sigma_{y_k}}{y_k}, \qquad i = 1, 2, ..., N, \quad k = 1, 2, ..., N.

It is easy to prove that in the case of a normally distributed random variable the MLP and the LSP are equivalent, i.e. applying the two principles to the Gaussian function yields the same estimators for μ and σ.

5.2.1 Geometrical Meaning of the LSP

Neglecting the uncertainties σ_i, χ² is just the sum of the squares of the distances between the curve points (x_i, y(x_i)) and the points y_i. The minimization of χ² corresponds to the search for the best curve, i.e. the curve that minimizes the distance between the points y_i and the curve points as the parameters θ_k are varied. The introduction of the uncertainties is necessary if we want to perform a statistical analysis instead of just solving a pure geometrical problem.

5.2.2 Example: Linear Function

Let's suppose that the function we want to fit is a straight line

    y(x) = a x + b,

where a and b are the parameters that we have to determine. For the sake of simplicity, we assume that all the experimental values y_1, y_2, ..., y_N have the same uncertainty σ_y, and that the uncertainties on x_1, x_2, ..., x_N are negligible. Applying the LSP to y(x), we get

    \chi^2(x; a, b) = \frac{1}{\sigma_y^2} \sum_{i=1}^{N} (y_i - a x_i - b)^2.


Computing the partial derivatives with respect to the parameters a and b and equating them to zero,

    \frac{\partial \chi^2}{\partial a} = 0, \qquad \frac{\partial \chi^2}{\partial b} = 0,

and solving the linear system for a and b, we finally get²

    \hat{a} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \qquad
    \hat{b} = \frac{\bar{y} \sum_{i=1}^{N} x_i^2 - \bar{x} \sum_{i=1}^{N} x_i y_i}{\sum_{i=1}^{N} (x_i - \bar{x})^2},

where

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i.

The functions â and b̂ are the best estimators of the parameters a and b given by the LSP.
Uncertainties on a and b can be estimated by applying the SPEL to the expressions of â and b̂. After some tedious algebra we get

    \sigma_{\hat{a}}^2 = \sum_{i=1}^{N} \left(\frac{\partial \hat{a}}{\partial y_i}\right)^2 \sigma_y^2, \qquad
    \sigma_{\hat{b}}^2 = \sum_{i=1}^{N} \left(\frac{\partial \hat{b}}{\partial y_i}\right)^2 \sigma_y^2,

where the partial derivatives with respect to x_1, ..., x_N are neglected because we assumed that their uncertainties are negligible.
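A sketch of the linear fit in code (an illustration, assuming numpy and equal uncertainties σ_y on the y_i; the error expressions below are the standard closed forms obtained by carrying out the SPEL sums above, and the test data are synthetic):

    import numpy as np

    def linear_fit(x, y, sigma_y):
        # LSP fit of y = a x + b with equal uncertainties sigma_y on the y_i
        x, y = np.asarray(x, float), np.asarray(y, float)
        N = x.size
        xbar, ybar = x.mean(), y.mean()
        Sxx = np.sum((x - xbar) ** 2)

        a_hat = np.sum((x - xbar) * (y - ybar)) / Sxx
        b_hat = (ybar * np.sum(x**2) - xbar * np.sum(x * y)) / Sxx

        # SPEL applied to a_hat and b_hat (partial derivatives w.r.t. the y_i only)
        sigma_a = sigma_y / np.sqrt(Sxx)
        sigma_b = sigma_y * np.sqrt(np.sum(x**2) / (N * Sxx))
        return a_hat, b_hat, sigma_a, sigma_b

    # small synthetic test: y = 2x + 1 with noise sigma_y = 0.1
    rng = np.random.default_rng(3)
    x = np.linspace(0, 5, 20)
    y = 2 * x + 1 + rng.normal(0, 0.1, x.size)
    print(linear_fit(x, y, 0.1))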

5.2.3 The Reduced χ² (Fit Goodness)

The reduced χ² is defined as

    \frac{\chi^2}{N - d} = \frac{1}{N - d} \sum_{i=1}^{N} \left[\frac{y_i - y(x_i; \theta)}{\sigma_i}\right]^2,

² As already mentioned, the use of the hat symbol is just to distinguish the parameter from its estimator, i.e. â is the estimator of a.


where d, which is called the number of degrees of freedom, equals the number of parameters to be estimated.
The meaning of the reduced χ² can be intuitively understood as follows. Because the difference between the fitted value y(x_i; θ) and the experimental value y_i is statistically close to σ_i, we can naively estimate the χ² value to be equal to Σ_{i=1}^{N} (σ_i/σ_i)² = N. Dividing χ² by N, we expect to obtain a number close to 1. A rigorous theoretical approach can explain the need to subtract d from N, but that is outside the scope of this introductory note. It is worthwhile to notice that this subtraction is not negligible for small values of N which are comparable to d. For N − d below 20, the reduced χ² should be slightly less than one.
When a reduced χ² is not close to one, the most likely causes are:

- uncertainties on x_i and/or on y_i are too small: χ²/(N−d) > 1,
- uncertainties on x_i and/or on y_i are too large: χ²/(N−d) smaller than 1,
- a small number of data points too scattered with respect to the theoretical curve: χ²/(N−d) > 1,
- wrong fitting function or poor physical model: χ²/(N−d) > 1,
- poor Gaussianity of the data-point distribution: χ²/(N−d) > 1.

A more rigorous statistical interpretation of the reduced χ² value requires the study of its PDF.

5.3 The LSP with the Effective Variance

To generalize the LSP to the case of appreciable uncertainties on the independent variable x, we can use the so-called effective variance, i.e.

    \sigma_i^2 = \left(\frac{\partial f}{\partial x}\right)^2 \sigma_{x_i}^2 + \sigma_{y_i}^2,

where σ_{x_i} and σ_{y_i} are the uncertainties associated with x_i and y_i respectively, and the derivative is calculated at x_i. Substituting this new definition of σ_i into the definition of the χ² takes into account the effect of the uncertainty on x. The proof of this formula is given in appendix D.

Figure 5.1: Linear fit of the thermistor data. The difference plot clearly shows the poor approximation of the linear fit.

5.4 Fit Example (Thermistor)

Thermistors, which are devices made of a semiconductor material [5], are remarkably good candidates for studying fitting techniques. In fact, the temperature dependence of their energy band gap Eg(T) makes their resistance vs. temperature characteristic ideal for demonstrating some of the aspects and issues of fitting. The thermistor response is described by the equation

    R(T) = R_0\, e^{E_g(T)/(2 k_b T)},

where R is the resistance, T is the temperature in kelvin, and k_b is the Boltzmann constant.

Taking the logarithm of the previous expression, we have

    \log R = \log R_0 + \frac{E_g(T)}{2 k_b T}.

Neglecting the temperature dependence of E_g, we can linearize the thermistor response as follows

    \frac{1}{T} = \frac{2 k_b}{E_g} \log R - \frac{2 k_b}{E_g} \log R_0, \qquad y = \frac{1}{T}, \quad x = \log R.

Following what has been done in a published article [6], the corrections to the linear fit can be introduced empirically using a polynomial expansion in log R, i.e.

    \frac{1}{T} = C_0 + C_1 \log R + C_2 \log^2 R + C_3 \log^3 R.

5.4.1 Linear Fit

Applying a linear curve fit, we obtain

    \frac{1}{T} = C_0 + C_1 \log R,

    C_0 = (2.67800 \pm 0.0005) \times 10^{-3} \ \text{a.u.}
    C_1 = (3.0074 \pm 0.00024) \times 10^{-4} \ \text{a.u.}
    \chi^2/(N-2) = 4727.5

It is clear from the difference plot in figure 5.1 and from the reduced χ² value that there is more than just a linear trend in the data set. In fact, the difference plot shows a quadratic curve instead of a random distribution of data points around the horizontal axis. It is noteworthy how difficult it is to see any difference between the straight line and the experimental points in the curve fit of figure 5.1.

Figure 5.2: Quadratic fit of the thermistor data. The plot of the experimental points and the theoretical curve seems to show good agreement. The difference plot, however, still shows that the quadratic fit is a coarse approximation.

5.4.2 Quadratic Fit

Applying a quadratic curve fit, we obtain

    \frac{1}{T} = C_0 + C_1 \log R + C_2 \log^2 R,

    C_0 = (2.68800 \pm 0.0005) \times 10^{-3} \ \text{a.u.}
    C_1 = (2.7717 \pm 0.00067) \times 10^{-4} \ \text{a.u.}
    C_2 = (5.460 \pm 0.014) \times 10^{-6} \ \text{a.u.}
    \chi^2/(N-3) = 34.2

Again, it is difficult to see any non-linear trend in the data points of the curve fit shown in figure 5.2. However, the reduced χ² is still larger than


Figure 5.3: Cubic fit of the thermistor data. The plot of the experimental points and the theoretical curve, and the difference plot, don't show any clear systematic difference between the experimental points and the theoretical curve.

1, and the difference plot clearly shows a residual trend in the data.

5.4.3 Cubic Fit

Applying a cubic curve fit, we obtain

    \frac{1}{T} = C_0 + C_1 \log R + C_2 \log^2 R + C_3 \log^3 R,

    C_0 = (2.6874 \pm 0.0003) \times 10^{-3} \ \text{a.u.}
    C_1 = (2.8021 \pm 0.0012) \times 10^{-4} \ \text{a.u.}
    C_2 = (3.376 \pm 0.066) \times 10^{-6} \ \text{a.u.}
    C_3 = (2.88 \pm 0.09) \times 10^{-7} \ \text{a.u.}
    \chi^2/(N-4) = 0.403

In this final case the reduced χ² value is smaller than one, and the scatter of the data about the horizontal axis in the difference plot of figure 5.3 suggests that the assumed uncertainty on the temperature, ΔT = 0.02 K, is probably a little too large. Apart from an increase of the data-point uncertainties, no special trend seems to be visible in the difference plot.
The following table contains the data points used for the thermistor characteristic fits.

    Point  R (Ω)    T (K)    ΔT (K)  ΔR (Ω)     Point  R (Ω)    T (K)    ΔT (K)  ΔR (Ω)
      1    0.76    383.15    0.02     0          17    10.00   298.15    0.02     0
      2    0.86    378.15    0.02     0          18    12.09   293.15    0.02     0
      3    0.97    373.15    0.02     0          19    14.68   288.15    0.02     0
      4    1.11    368.15    0.02     0          20    17.96   283.15    0.02     0
      5    1.45    358.15    0.02     0          21    22.05   278.15    0.02     0
      6    1.67    353.15    0.02     0          22    27.28   273.15    0.02     0
      7    1.92    348.15    0.02     0          23    33.89   268.15    0.02     0
      8    2.23    343.15    0.02     0          24    42.45   263.15    0.02     0
      9    2.59    338.15    0.02     0          25    53.39   258.15    0.02     0
     10    3.02    333.15    0.02     0          26    67.74   253.15    0.02     0
     11    3.54    328.15    0.02     0          27    86.39   248.15    0.02     0
     12    4.16    323.15    0.02     0          28    111.3   243.15    0.02     0
     13    4.91    318.15    0.02     0          29    144.0   238.15    0.02     0
     14    5.83    313.15    0.02     0          30    188.4   233.15    0.02     0
     15    6.94    308.15    0.02     0          31    247.5   228.15    0.02     0
     16    8.31    303.15    0.02     0          32    329.2   223.15    0.02     0
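The cubic fit reported above can be reproduced, at least approximately, with a weighted polynomial fit. The sketch below is an illustration only (it is not the code used to produce the figures); it assumes numpy, natural logarithms, and weights obtained by propagating ΔT = 0.02 K onto 1/T.

    import numpy as np

    R = np.array([0.76, 0.86, 0.97, 1.11, 1.45, 1.67, 1.92, 2.23,
                  2.59, 3.02, 3.54, 4.16, 4.91, 5.83, 6.94, 8.31,
                  10.00, 12.09, 14.68, 17.96, 22.05, 27.28, 33.89, 42.45,
                  53.39, 67.74, 86.39, 111.3, 144.0, 188.4, 247.5, 329.2])
    T = np.array([383.15, 378.15, 373.15, 368.15, 358.15, 353.15, 348.15, 343.15,
                  338.15, 333.15, 328.15, 323.15, 318.15, 313.15, 308.15, 303.15,
                  298.15, 293.15, 288.15, 283.15, 278.15, 273.15, 268.15, 263.15,
                  258.15, 253.15, 248.15, 243.15, 238.15, 233.15, 228.15, 223.15])
    dT = 0.02

    x, y = np.log(R), 1.0 / T
    sigma_y = dT / T**2                              # propagated uncertainty on 1/T

    coeffs = np.polyfit(x, y, 3, w=1.0 / sigma_y)    # weighted cubic fit, highest power first
    resid = y - np.polyval(coeffs, x)
    chi2_red = np.sum((resid / sigma_y) ** 2) / (len(x) - 4)

    print(coeffs[::-1])   # C0, C1, C2, C3
    print(chi2_red)       # reduced chi square of the cubic fit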


5.5 Fit Example (Offset Constant)

The following data (?)³ are the measurements of the voltage-current (V versus I) characteristic of a silicon diode [5], i.e.

    I(V) = I_0 \left( e^{qV/(\eta k_b T)} - 1 \right),

where I_0 is the reverse bias current, q is the electron charge, k_b the Boltzmann constant, T the absolute temperature, and η a parameter which is equal to 2 for silicon diodes.
Figure (?), which has been made using the following fitting curve

    y = a\, e^{bx},

shows a systematic quadratic residue in the difference plot. Fitting the experimental points with the curve

    y = a\, e^{bx} + c,

the quadratic trend in the difference plot disappears. This effect is typical of fits for which the correct offset constant is not introduced, thus producing systematic trends in the difference plots.

³ Please let us know if you notice that the data and plot are missing in this section.
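Since the diode data are not reproduced here, the effect can still be illustrated on synthetic data. The sketch below (an illustration only, assuming SciPy's curve_fit; all numbers are invented) fits an exponential with and without the offset constant c and compares the residuals.

    import numpy as np
    from scipy.optimize import curve_fit

    rng = np.random.default_rng(4)
    x = np.linspace(0.0, 2.0, 40)
    y = 1.0 * np.exp(2.0 * x) + 5.0 + rng.normal(0, 0.5, x.size)  # true curve has an offset

    f_no_offset = lambda x, a, b: a * np.exp(b * x)
    f_offset = lambda x, a, b, c: a * np.exp(b * x) + c

    p1, _ = curve_fit(f_no_offset, x, y, p0=(1.0, 1.0))
    p2, _ = curve_fit(f_offset, x, y, p0=(1.0, 1.0, 0.0))

    # the residuals of the first fit keep a systematic trend, those of the second do not
    print(np.sum((y - f_no_offset(x, *p1)) ** 2))
    print(np.sum((y - f_offset(x, *p2)) ** 2))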


Appendix A Central Limit Theorem


Let's consider a set of N independent random variables ξ_1, ξ_2, ..., ξ_N, all having the same PDF p(ξ_i) with the following parameters¹

    E[\xi_i] = \mu, \qquad V[\xi_i] = \sigma_i^2, \qquad i \in \{1, 2, ..., N\}.

Let's then consider the following random variable

    x = \frac{1}{N} \sum_{i} \xi_i.

The central limit theorem states that, for N → ∞, x is normally distributed around μ with variance given by the sum of the variances of the random variables divided by N², i.e.

    p(x) \xrightarrow[N \to \infty]{} \frac{1}{\sigma_N \sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma_N^2}}, \qquad E[x] = \mu, \qquad V[x] = \sigma_N^2 = \frac{1}{N^2} \sum_i \sigma_i^2.

The proof of this theorem is rather complicated and is outside the scope of this work. The central limit theorem tells us that a random variable that is the sum of random variables following an unknown PDF behaves as a normally distributed random variable.
¹ This theorem has a more general formulation. It was first proved by Laplace and then extended by other mathematicians including P. L. Chebychev, A. A. Markov and A. M. Lyapunov.


It is noteworthy that this theorem suggests a simple way to generate values of a normally distributed variable. In fact, a normally distributed value can be computed just by adding a large number of values of a given random variable. The μ and the σ can then be estimated from the set of generated values.
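A sketch of this recipe (an illustration added here, assuming numpy): averaging, say, 12 uniform random numbers per sample already produces an approximately Gaussian variable.

    import numpy as np

    rng = np.random.default_rng(5)
    # each generated value is the average of 12 uniform variables in [0, 1)
    samples = rng.uniform(0.0, 1.0, size=(100_000, 12)).mean(axis=1)

    print(samples.mean())   # ~0.5, the mu of the underlying uniform PDF
    print(samples.std())    # ~sqrt(1/12)/sqrt(12) = 0.083..., as expected from the theorem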

Appendix B Statistical Propagation of Errors


We want to find an approximate formula that computes the variance of a function of random variables by using the standard Taylor series expansion.
Let f be a function of n random variables x = (x_1, x_2, ..., x_n). The first-order Taylor expansion of f(x) about μ = (μ_1, μ_2, ..., μ_n), where μ_i = E[x_i], i = 1, 2, ..., n, is

    f(x) = f(\mu) + \sum_{i=1}^{n} \left.\frac{\partial f(x)}{\partial x_i}\right|_{x_i = \mu_i} (x_i - \mu_i) + O(x_i - \mu_i), \qquad (B.1)

where O(m) indicates terms of order higher than m. Substituting (B.1) into the definition of the variance of f(x), we get

    V[f(x)] = E[(f(x) - f(\mu))^2] \qquad (B.2)
            = E\left[\left(\sum_{i=1}^{n} \left.\frac{\partial f(x)}{\partial x_i}\right|_{x_i = \mu_i} (x_i - \mu_i)\right)^2\right]. \qquad (B.3)

With the aid of the expected value properties, we obtain

    V[f(x)] = \sum_{i=1,\, j=1}^{n,n} \left.\frac{\partial f(x)}{\partial x_i}\right|_{x_i = \mu_i} \left.\frac{\partial f(x)}{\partial x_j}\right|_{x_j = \mu_j} E[(x_i - \mu_i)(x_j - \mu_j)].

Defining the covariance matrix V_{ij} as follows

    V_{ij} = E[(x_i - \mu_i)(x_j - \mu_j)], \qquad i, j = 1, 2, ..., n,

the previous expression of the variance of f finally becomes

    V[f(x)] = \sum_{i=1,\, j=1}^{n,n} \left.\frac{\partial f(x)}{\partial x_i}\right|_{x_i = \mu_i} \left.\frac{\partial f(x)}{\partial x_j}\right|_{x_j = \mu_j} V_{ij}, \qquad (B.4)

which is the law of statistical propagation of errors.
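As a numerical sanity check of (B.4) (an illustrative sketch added here, assuming numpy; the function and the numbers are invented), one can compare the propagated variance of f(x_1, x_2) = x_1 x_2, with independent variables, against a Monte Carlo estimate:

    import numpy as np

    mu1, mu2 = 3.0, 5.0
    s1, s2 = 0.1, 0.2            # independent variables: covariance matrix is diagonal

    # (B.4) with V = diag(s1^2, s2^2) and f = x1*x2: df/dx1 = mu2, df/dx2 = mu1
    var_spel = (mu2 * s1) ** 2 + (mu1 * s2) ** 2

    rng = np.random.default_rng(6)
    x1 = rng.normal(mu1, s1, 1_000_000)
    x2 = rng.normal(mu2, s2, 1_000_000)
    var_mc = (x1 * x2).var()

    print(var_spel, var_mc)      # close to each other (first-order approximation)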

Appendix C NPDF Random Variable Uncertainties


Let's consider a set of independent measurements X = {x_1, x_2, ..., x_N} of the same physical quantity x following the NPDF.

Measurement Set with no Uncertainties (Unweighted Case)

The uncertainty s of each single measurement x_i, and the way to report it, are

    s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad x = (x_i \pm s) \ \text{units}.

For the mean of the measurements x̄ and its uncertainty s_x̄, we have

    \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i, \qquad s_{\bar{x}}^2 = \frac{s^2}{N}, \qquad x = (\bar{x} \pm s_{\bar{x}}) \ \text{units}.

Measurement Set with Uncertainties (Weighted Case)

If each measurement has an uncertainty σ = {σ_1, σ_2, ..., σ_N}, we will trivially have

    x = (x_i \pm \sigma_i) \ \text{units}.

For the mean of the measurements and its uncertainty, we have


    s_{\bar{x}}^2 = \frac{1}{\sum_{i=1}^{N} 1/\sigma_i^2}, \qquad \bar{x} = s_{\bar{x}}^2 \sum_{i=1}^{N} \frac{x_i}{\sigma_i^2}, \qquad x = (\bar{x} \pm s_{\bar{x}}) \ \text{units}.

Appendix D The Effective Variance


Let x be a normally distributed random variable and let y = f(x) be another random variable. Let's suppose that each data point (x_i ± σ_{x_i}, y_i ± σ_{y_i}), i = 1, ..., N, is normally distributed around (x̂_i, ŷ_i). Applying the MLP to y and x we obtain

    L(x, y) = \prod_{i=1}^{N} \frac{1}{\sigma_{x_i}\sqrt{2\pi}} \exp\left[-\frac{(x_i - \hat{x}_i)^2}{2\sigma_{x_i}^2}\right] \frac{1}{\sigma_{y_i}\sqrt{2\pi}} \exp\left[-\frac{(y_i - \hat{y}_i)^2}{2\sigma_{y_i}^2}\right]
            = \left(\prod_{i=1}^{N} \frac{1}{\sigma_{x_i}\sqrt{2\pi}}\, \frac{1}{\sigma_{y_i}\sqrt{2\pi}}\right) \exp(-S),

where

    S = \sum_{i=1}^{N} \left[\frac{(x_i - \hat{x}_i)^2}{2\sigma_{x_i}^2} + \frac{(y_i - \hat{y}_i)^2}{2\sigma_{y_i}^2}\right].

Making the following approximation

    \hat{y}_i = y(\hat{x}_i) \simeq f(x_i) + (\hat{x}_i - x_i) f'(x_i),

S becomes

    S \simeq \sum_{i=1}^{N} \left[\frac{(x_i - \hat{x}_i)^2}{2\sigma_{x_i}^2} + \frac{[f(x_i) + (\hat{x}_i - x_i) f'(x_i) - y_i]^2}{2\sigma_{y_i}^2}\right].

In this case, the condition of maximization of L(x, y),

    \frac{\partial S}{\partial \hat{x}_i} = 0, \qquad i = 1, ..., N,

gives

    \frac{2(x_i - \hat{x}_i)}{\sigma_{x_i}^2} + \frac{2\{y_i - [f(x_i) - (x_i - \hat{x}_i) f'(x_i)]\}\, f'(x_i)}{\sigma_{y_i}^2} = 0.

Solving with respect to the parameters x̂_i we obtain the expression

    \hat{x}_i = x_i - \frac{\sigma_{x_i}^2}{\sigma_i^2}\, [f(x_i) - y_i]\, f'(x_i),

where

    \sigma_i^2 = [f'(x_i)]^2\, \sigma_{x_i}^2 + \sigma_{y_i}^2.

Replacing this expression in the definition of S we obtain

    S = \sum_{i=1}^{N} \frac{[y_i - f(x_i)]^2}{2\sigma_i^2},

which is the new S that must be minimized.

Bibliography
[1] P. R. Bevington, D. K. Robinson, Data Reduction and Error Analysis for the Physical Sciences, second edition, WCB McGraw-Hill.
[2] J. Orear, "Least squares when both variables have uncertainties", Am. J. Phys. 50(10), Oct 1982.
[3] S. G. Rabinovich, Measurement Errors and Uncertainties: Theory and Practice, second edition, Springer.
[4] C. L. Nikias, A. P. Petropulu, Higher-Order Spectral Analysis, PTR Prentice Hall.
[5] V. de O. Sannibale, Basics on Semiconductor Physics, Freshman Laboratory Notes, http://www.ligo.caltech.edu/~vsanni/ph3/
[6] Deep Sea Research, 1968, Vol. 15, pp. 497-501, Pergamon Press (printed in Great Britain).
