
Probability and Statistics

Damned Lies But True

One of the Course Goals: Learn how to interpret and present data.
Why Statistics?
Statistical analysis is performed on measurement data to
analytically determine the uncertainty of the final measurement
result.
• What value best characterizes a data set?

• How much variation is present in the data?


• How likely is it that our best value represents the truth?

Statistics
• is the tool used to interpret data and present results.
• is about estimating the population mean from a
sampled pool of data.

Some Vocabulary
• Population: set of all possible values for a variable
• Sample: set of data obtained from repeated measurements under
fixed operating conditions
• Variable: what you are measuring
• Probability: is about the confidence in that estimate of the population mean

Being Average
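• True Population Average: $x'$
• Sample Average: $\bar{x}$

Since N → ∞ is impossible, the true average is reported as a best estimate ($\bar{x}$) with an uncertainty interval ($\pm u_{\bar{x}}$) at a stated probability ($P\%$):

$$x' = \bar{x} \pm u_{\bar{x}} \quad (P\%)$$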


 Standard Deviation (Root Mean Square Deviation)
of an infinite number of measurements is defined as “the square
root of the sum of all individual deviations squared, divided by
the number of readings”, denoted by $\sigma$:

$$\sigma = \sqrt{\frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n}} = \sqrt{\frac{\sum d_i^2}{n}}$$

However, in practice the number of readings cannot be infinite, so we use:

$$\sigma = \sqrt{\frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n-1}} = \sqrt{\frac{\sum d_i^2}{n-1}}$$

• For a good measurement, the value of the standard deviation for
random errors is low.
 Variance ( Mean Square Deviation)
“the square of the standard deviation”, denoted by $\sigma^2$:

$$\sigma^2 = \frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n} = \frac{\sum d_i^2}{n}$$

For a finite number of readings we have:

$$\sigma^2 = \frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n-1} = \frac{\sum d_i^2}{n-1}$$

Numerical Problem P 5.1:


A current passing through a resistor is recorded by 10 different
observers & the readings obtained are: 100.1, 101.7, 100.9, 102.1,
101.5, 101.0, 100.0, 102.1, 102.3, and 101.3 A. Calculate:
(a) arithmetic mean (b) standard deviation (c) variance
✔ Solution is done in class.
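As a rough check on Problem P 5.1 (this is not the in-class solution), the three quantities can be computed directly from the definitions above. The sketch assumes the finite-sample form (division by n − 1); the variable names are illustrative only.

```python
import math

# Readings from Problem P 5.1 (amperes)
readings = [100.1, 101.7, 100.9, 102.1, 101.5,
            101.0, 100.0, 102.1, 102.3, 101.3]

n = len(readings)
mean = sum(readings) / n                               # (a) arithmetic mean
deviations = [x - mean for x in readings]              # d_i = x_i - mean
variance = sum(d * d for d in deviations) / (n - 1)    # (c) finite-sample variance
std_dev = math.sqrt(variance)                          # (b) standard deviation

print(f"mean               = {mean:.2f} A")
print(f"standard deviation = {std_dev:.3f} A")
print(f"variance           = {variance:.3f} A^2")
```

Dividing by n instead of n − 1 would give the infinite-population form, which is slightly smaller for a data set this short.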
Histogram: PDF Precursor (Frequency Distribution Curve)

The scattering of measurements of the same quantity around the
most probable value or central value can be represented in the
form of a block diagram or histogram (also called a frequency
distribution curve).
• In a histogram, bands or data bins of equal width across the
range of measurement values are defined, and the number of
measurements within each band is counted.
Sturges' rule gives the number of bands as:

$$\text{Number of bands} = 1 + 3.3 \log_{10} n \quad \text{(take the whole number)}$$

where 𝑛 is the number of measurement values.

• Make K intervals, each of width $2\delta x$, from $x_{min}$ to $x_{max}$, with

$$K = 1.87 (N - 1)^{0.4} + 1$$

• $n_j$: number of samples in each interval; at least one interval should have $n_j \ge 5$

• Histogram shows: central tendency, probability density, frequency distribution
• As more readings are taken at relatively smaller increments, the
curve of the histogram becomes smoother. This bell-shaped
curve is known as Gaussian Curve.
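The two bin-count rules and the counting step described above can be sketched as follows. Several things here are assumptions made for illustration: "take whole number" is read as truncation, the helper names are hypothetical, and the data are simply the ten readings quoted in Problem P 5.1.

```python
import math

def sturges_bins(n):
    """Number of bands from 1 + 3.3*log10(n), truncated to a whole number."""
    return int(1 + 3.3 * math.log10(n))

def k_intervals(n):
    """Number of intervals from K = 1.87*(N - 1)**0.4 + 1, truncated."""
    return int(1.87 * (n - 1) ** 0.4 + 1)

def histogram_counts(data, k):
    """Count the samples n_j falling in k equal-width intervals from min to max."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        j = min(int((x - lo) / width), k - 1)  # x_max goes in the last interval
        counts[j] += 1
    return counts

# Illustrative data: the readings from Problem P 5.1
data = [100.1, 101.7, 100.9, 102.1, 101.5,
        101.0, 100.0, 102.1, 102.3, 101.3]

k = sturges_bins(len(data))
counts = histogram_counts(data, k)
print("bands (Sturges):", k)
print("intervals (K formula):", k_intervals(len(data)))
print("counts per band:", counts)
```

With only ten readings no band reaches five samples, which illustrates why the $n_j \ge 5$ guideline calls for larger data sets.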
 PDF’s (Probability Density Functions)
• Probability: certain values for a variable will be measured
with some frequency of occurrence relative to other values
• Probability Density: frequency with which a measured variable
assumes a particular value
• Probability Density Function (p.d.f.): if the height of the
frequency distribution curve is normalized such that the area
under it is unity, then this curve is known as a probability curve,
and the height F(D) at any particular deviation magnitude D is
known as the probability density function (p.d.f.).

Mathematically,

$$\int_{-\infty}^{\infty} F(D)\, dD = 1 \qquad \text{Eq. (L5.a)}$$

• The probability that the error in any one particular
measurement lies between two levels $D_1$ and $D_2$ can be
calculated as:

$$P(D_1 \le D \le D_2) = \int_{D_1}^{D_2} F(D)\, dD$$

• The probability that the error in any one measurement is less
than or equal to some level $D_0$ is given by the cumulative
distribution function (c.d.f.); this is shown in the figure and
expressed mathematically as:

$$P(D \le D_0) = \int_{-\infty}^{D_0} F(D)\, dD \qquad \text{Eq. (L5.b)}$$

• The c.d.f. is the area under the curve to the left of a vertical line
drawn through $D_0$, as shown.
• The deviation magnitude $D_p$ is the one with the greatest
probability. If the errors are entirely random in nature, then the
value of $D_p$ will equal zero. Any nonzero value of $D_p$ indicates
systematic errors in the data, in the form of a bias that is often
removable by recalibration.
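To make Eq. (L5.a), the interval probability and Eq. (L5.b) concrete, the sketch below integrates a p.d.f. numerically with the trapezoidal rule. The particular F(D) used (a zero-mean Gaussian with σ = 1), the limits of ±8σ standing in for ±∞, and the function names are all assumptions chosen only for illustration.

```python
import math

SIGMA = 1.0  # assumed spread of the illustrative error distribution

def F(D):
    """Illustrative p.d.f.: zero-mean Gaussian (an assumption for this sketch)."""
    return math.exp(-D * D / (2 * SIGMA ** 2)) / (SIGMA * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=20000):
    """Trapezoidal-rule approximation of the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

LOW = -8.0 * SIGMA    # stands in for -infinity
HIGH = 8.0 * SIGMA    # stands in for +infinity

total_area = integrate(F, LOW, HIGH)    # Eq. (L5.a): should be close to 1
p_between = integrate(F, -1.0, 1.0)     # P(D1 <= D <= D2) with D1 = -1, D2 = 1
cdf_at_zero = integrate(F, LOW, 0.0)    # Eq. (L5.b) with D0 = 0

print(f"total area      = {total_area:.6f}")
print(f"P(-1 <= D <= 1) = {p_between:.4f}")
print(f"P(D <= 0)       = {cdf_at_zero:.4f}")
```

Because this p.d.f. is symmetric about zero, half of the area lies to the left of D0 = 0, consistent with purely random errors having their most probable deviation at zero.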

Central Tendency:
• value about which other values are scattered
• occurs more often than others
• different from the mean, since the mean doesn’t have to occur
more often than other values

[Figure: histogram of N = 20 samples showing the central tendency]

PDF as Mathematical Histogram

$$p(x) = \lim_{N \to \infty,\ \delta x \to 0} \frac{n_j}{N (2\delta x)}$$
• Provides central tendency
• Depends on variable and measurement quality

True Lies
• Regardless of distribution, if a variable shows a central tendency
it is described by its mean and variance

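True Mean and True Variance

Continuous data:

$$x' = \int_{-\infty}^{\infty} x\, p(x)\, dx \qquad \sigma^2 = \int_{-\infty}^{\infty} (x - x')^2\, p(x)\, dx$$

Discrete data:

$$x' = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i \qquad \sigma^2 = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} (x_i - x')^2$$

Standard deviation: $\sigma = \sqrt{\text{variance}}$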


 Standard Distributions
Normal Distribution χ𝟐 - Distribution

 Normal (Gaussian) Distribution: Infinite Statistics


• Measurements that only contain random errors usually
conform to a Normal Distribution.
• Normal Distribution: a normalized frequency distribution in which
the scatter is symmetric about the central tendency (line of zero
error). Mathematically, the frequency and magnitude of the
quantities are expressed as (used for prediction):

$$p(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{(x - x')^2}{2\sigma^2}\right]$$

where $x'$ is the mean value of the data set $x$.
• The frequency of small deviations from the mean value is much
greater than the frequency of large deviations.
• Has its maximum at $x = x'$: the most expected value is the true mean.
• If measurement deviations $D$ are calculated for all measurements
such that $D = x - x'$, then the curve of deviation frequency $F(D)$
plotted against deviation magnitude $D$ is a Gaussian curve known
as the Error Frequency Distribution Curve. Substituting $D = x - x'$
into the normal distribution, we have:

$$F(D) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{D^2}{2\sigma^2}\right)$$

• The width of the curve decreases as $\sigma$ becomes smaller.

• If the standard deviation $\sigma$ is used as a unit of error, then the
Gaussian curve can be used, and the probability that the error lies
between error levels $D_1$ and $D_2$ can be expressed as:

$$P(D_1 \le D \le D_2) = \int_{D_1}^{D_2} F(D)\, dD = \int_{D_1}^{D_2} \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{D^2}{2\sigma^2}\right) dD$$

By the substitution $z = D/\sigma$ (so that $dD = \sigma\, dz$), the error
distribution curve becomes a new Gaussian distribution with $\sigma = 1$
and a mean of zero, known as the Standard Gaussian Curve or
$z$-distribution:

$$P(D_1 \le D \le D_2) = P(z_1 \le z \le z_2) = \int_{z_1}^{z_2} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz$$
Standard Gaussian Tables (z-Distribution)
• A standard Gaussian table tabulates the area under the Gaussian
curve, $F(z)$, for various values of $z$, where $F(z)$ is given by:

$$F(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz$$

• $F(z)$ is the proportion of data values that are less than or equal
to $z$; it equals the area under the standard Gaussian curve to the
left of $z$.
• Standard Gaussian tables give $F(z)$ only for positive values of $z$;
for negative values of $z$ we have:

$$F(-z) = 1 - F(z)$$

• $F(z) = 0.5$ for $z = 0$
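Where a printed table is not at hand, F(z) can be evaluated from the error function through the identity F(z) = ½[1 + erf(z/√2)]. This closed form is swapped in here as a convenience and is not taken from the slides; the helper names are hypothetical.

```python
import math

def F(z):
    """Standard Gaussian c.d.f.: area under the z-distribution to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(D1, D2, sigma):
    """P(D1 <= D <= D2) using the substitution z = D / sigma."""
    return F(D2 / sigma) - F(D1 / sigma)

print(f"F(0)     = {F(0.0):.4f}")
print(f"F(-1)    = {F(-1.0):.4f}")
print(f"1 - F(1) = {1.0 - F(1.0):.4f}")   # matches F(-1) by symmetry

# Probability that the error lies within +/- k standard deviations
for k in (1, 2, 3):
    print(f"P(|D| <= {k} sigma) = {prob_between(-k, k, 1.0):.4f}")
```

The last loop gives the probabilities of falling within one, two and three standard deviations of the mean.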
Take Home Message
• Area under a PDF in interval 𝑥 ′ − 𝑧1 𝜎 ≤ 𝑥 ≤ 𝑥 ′ + 𝑧1 𝜎 is the
probability that the next measurement will assume a value
within the interval
• For a Normal Distribution, this probability is about 68% for 𝑧1 = 1, 95.4% for 𝑧1 = 2, and 99.7% for 𝑧1 = 3
 Standard Error of the Mean
The standard deviation of the mean values of a series of finite sets of
measurements relative to the true mean (the mean of the infinite
population that the finite set of measurements is drawn from) is
defined as the standard error of the mean, $\alpha$:

$$\alpha = \frac{\sigma}{\sqrt{n}}, \qquad \alpha \to 0 \text{ as the number of measurements } n \to \infty$$
• the measurement value obtained by calculating the mean of
a set of 𝑛 measurements, 𝑥1 , 𝑥2 , … . 𝑥𝑛 can be expressed as:
𝑥 = 𝑥𝑚𝑒𝑎𝑛 ± 𝛼 with 68 % certainty that the magnitude of the
error does not exceed |𝛼|
𝑥 = 𝑥𝑚𝑒𝑎𝑛 ± 2𝛼 with 95.4 % certainty that the magnitude of
the error does not exceed |2𝛼|
𝑥 = 𝑥𝑚𝑒𝑎𝑛 ± 3𝛼 with 99.7% certainty that the magnitude of the
error does not exceed |3𝛼|
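As a small illustration, the standard error of the mean and the three intervals above can be evaluated for a concrete data set. The sketch reuses the ten readings from Problem P 5.1 purely as example data and assumes the finite-sample (n − 1) standard deviation.

```python
import math
import statistics

# Example data: the readings from Problem P 5.1 (amperes)
readings = [100.1, 101.7, 100.9, 102.1, 101.5,
            101.0, 100.0, 102.1, 102.3, 101.3]

n = len(readings)
x_mean = statistics.mean(readings)
sigma = statistics.stdev(readings)    # finite-sample (n - 1) standard deviation
alpha = sigma / math.sqrt(n)          # standard error of the mean

# Certainty levels as quoted in the text for 1, 2 and 3 standard errors
for k, certainty in ((1, 68.0), (2, 95.4), (3, 99.7)):
    print(f"x = {x_mean:.2f} +/- {k * alpha:.2f} A  ({certainty}% certainty)")
```

Quadrupling the number of readings would halve α, since α scales as 1/√n.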
 Estimation of Random Error in a Single Measurement
• The maximum likely deviation in a single measurement can be
expressed as:
𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = ±1.96 𝜎 (within 95% confidence limits)
However, this only expresses the maximum likely deviation of the
measurement from the calculated mean of the reference
measurement set, which is not the true value.

• The calculated value for the standard error of the mean has to
be added to the likely maximum deviation value. So, the
maximum likely error in a single measurement can be
expressed as:
𝐸𝑟𝑟𝑜𝑟 = ±1.96 (𝜎 + 𝛼) (within 95% confidence limits)
Numerical Problem P 5.2:
Suppose that a standard mass is measured 30 times with the same
instrument to create a reference data set, and the calculated
values of 𝜎 and 𝛼 are 𝜎 = 0.46 and 𝛼 = 0.08. If the instrument is
then used to measure an unknown mass and the reading is
105.6 kg, how should the mass value be expressed?
✔ Solution is done in class.
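A direct application of the Error = ±1.96(σ + α) expression to the values given in Problem P 5.2 might look like the sketch below. It is not the in-class solution, and rounding the result to one decimal place (to match the reading) is an assumption.

```python
sigma = 0.46      # standard deviation of the reference data set (given)
alpha = 0.08      # standard error of the mean (given)
reading = 105.6   # single reading of the unknown mass in kg (given)

# Maximum likely error in a single measurement, 95% confidence limits
error = 1.96 * (sigma + alpha)

print(f"error = +/- {error:.4f} kg")
print(f"mass  = {reading:.1f} +/- {error:.1f} kg (95% confidence limits)")
```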

Numerical Problem P 5.3:


An integrated circuit chip contains $10^5$ transistors. The transistors
have a mean current gain of 20 and a standard deviation of 2.
Calculate the following:
(a) number of transistors with a current gain between 19.8 and
20.2
(b) number of transistors with a current gain greater than 17
✔ Solution is done in class.
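One way to approach Problem P 5.3 numerically is sketched below; it is not the in-class solution. It assumes the current gains are normally distributed, reads the transistor count as 10^5, and evaluates the standard Gaussian c.d.f. through the error function instead of a printed table. The helper names are hypothetical.

```python
import math

def F(z):
    """Standard Gaussian c.d.f., F(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

N_TRANSISTORS = 10 ** 5   # assumed reading of the transistor count
MEAN_GAIN = 20.0          # mean current gain (given)
SIGMA_GAIN = 2.0          # standard deviation of the gain (given)

def z_score(gain):
    """Convert a gain value to the z variable, z = (x - mean) / sigma."""
    return (gain - MEAN_GAIN) / SIGMA_GAIN

# (a) gain between 19.8 and 20.2, i.e. z between -0.1 and +0.1
p_a = F(z_score(20.2)) - F(z_score(19.8))
# (b) gain greater than 17, i.e. z greater than -1.5
p_b = 1.0 - F(z_score(17.0))

count_a = round(N_TRANSISTORS * p_a)
count_b = round(N_TRANSISTORS * p_b)
print(f"(a) P = {p_a:.4f}, about {count_a} transistors")
print(f"(b) P = {p_b:.4f}, about {count_b} transistors")
```

Counts read from a four-figure table will differ slightly from these values because of table rounding.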
