Lec 6

Probability and Statistics
Damned Lies But True
One of the Course Goals: Learn how to interpret and present data.
Why Statistics?
Statistical analysis is performed on measurement data to
analytically determine the uncertainty of the final measurement
result.
• What value best characterizes a data set?
• How much variation is present in the data?
• How likely is it that our best value represents the truth?
Statistics
• is the tool used to interpret data and present results.
• is about estimating the population mean from a sampled pool of data.
Some Vocabulary
• Population: set of all possible values for a variable
• Sample: set of data from repeated measures under fixed
operating conditions
• Variable: what you are measuring
• Probability: a measure of the confidence in an estimate of the population mean made from a sample
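Being Average
• True Population Average: $x'$
• Sample Average: $\bar{x}$
Since N → ∞ is impossible, the true average is stated as a best estimate with an uncertainty interval at a probability level P:
$$x' = \bar{x} \pm u_x \quad (P\%)$$
where $\bar{x}$ is the best estimate and $\pm u_x$ is the uncertainty interval.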

 Standard Deviation (Root Mean Square Deviation)
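of an infinite number of measurements is defined as “the square root of the sum of all individual deviations squared, divided by the number of readings”, denoted by 𝜎:
$$\sigma = \sqrt{\frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n}} = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n}}$$
where $d_i$ is the deviation of the $i$-th reading from the mean.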
However, in practice the number of readings cannot be infinite, so we use:
$$\sigma = \sqrt{\frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n-1}}$$
• For a good measurement, the standard deviation of the random errors is low.
 Variance ( Mean Square Deviation)
“the square of the standard deviation”, denoted by 𝜎²:
$$\sigma^2 = \frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n} = \frac{\sum_{i=1}^{n} d_i^2}{n}$$
For a finite number of readings we have:
$$\sigma^2 = \frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n-1} = \frac{\sum_{i=1}^{n} d_i^2}{n-1}$$
Numerical Problem P 5.1:

A current passing through a resistor is recorded by 10 different
observers, and the readings obtained are: 100.1, 101.7, 100.9, 102.1,
101.5, 101.0, 100.0, 102.1, 102.3, and 101.3 A. Calculate:
(a) the arithmetic mean, (b) the standard deviation, (c) the variance
✔ Solution is done in class.
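For checking a hand calculation, the quantities asked for in P 5.1 can be sketched in a few lines of Python. This is not the in-class solution, only a minimal illustration that assumes the finite-sample (n − 1) form of the formulas; the variable names are chosen here for illustration.

```python
import math

# Readings from Problem P 5.1, in amperes.
readings = [100.1, 101.7, 100.9, 102.1, 101.5, 101.0, 100.0, 102.1, 102.3, 101.3]

n = len(readings)
mean = sum(readings) / n

# Deviation of each reading from the sample mean.
deviations = [x - mean for x in readings]
sum_sq = sum(d * d for d in deviations)

# Finite data set: divide by (n - 1) rather than n.
variance = sum_sq / (n - 1)
std_dev = math.sqrt(variance)

print(f"mean               = {mean:.2f} A")
print(f"standard deviation = {std_dev:.3f} A")
print(f"variance           = {variance:.3f} A^2")
```

With these readings the mean comes out at about 101.3 A, the standard deviation at about 0.807 A, and the variance at about 0.651 A².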
Histogram: PDF Precursor (Frequency Distribution Curve)
The scattering of measurements of the same quantity around the most probable value or central value can be represented in the form of a block diagram or histogram (also called a frequency distribution curve).
• In a histogram, bands or data bins of equal width across the range of measurement values are defined, and the number of measurements within each band is counted.
Sturges' rule calculates the number of bands as:
Number of bands = 1 + 3.3 log10 𝑛 (take the whole number)
where 𝑛 is the number of measurement values.
• Make K intervals from 𝑥𝑚𝑖𝑛 to 𝑥𝑚𝑎𝑥, each spanning ±δ𝑥 about its midpoint, with
$$K = 1.87\,(N-1)^{0.4} + 1$$
• 𝑛𝑗: number of samples in each interval; at least one interval should have 𝑛𝑗 ≥ 5
• A histogram shows: central tendency, frequency distribution, probability density
• As more readings are taken at relatively smaller increments, the
curve of the histogram becomes smoother. This bell-shaped
curve is known as the Gaussian curve.
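The two bin-count rules and the counting step described above can be sketched as follows. The function names are hypothetical, "take whole number" is assumed here to mean truncation, and the sample data are the P 5.1 readings reused purely for illustration.

```python
import math

def sturges_bins(n: int) -> int:
    """Number of bands from 1 + 3.3*log10(n), truncated to a whole number."""
    return int(1 + 3.3 * math.log10(n))

def k_intervals(n: int) -> float:
    """K = 1.87*(N - 1)**0.4 + 1, left unrounded so the caller can choose."""
    return 1.87 * (n - 1) ** 0.4 + 1

def histogram_counts(data, bins):
    """Count how many values fall in each of `bins` equal-width bands."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins  # assumes the data are not all identical
    counts = [0] * bins
    for x in data:
        # The maximum value would index one past the end, so clamp it.
        j = min(int((x - lo) / width), bins - 1)
        counts[j] += 1
    return counts

readings = [100.1, 101.7, 100.9, 102.1, 101.5, 101.0, 100.0, 102.1, 102.3, 101.3]
bins = sturges_bins(len(readings))
counts = histogram_counts(readings, bins)
print(f"Sturges bands: {bins}, K rule: {k_intervals(len(readings)):.2f}")
print("counts per band:", counts)
```

With only 10 readings no band reaches 𝑛𝑗 ≥ 5, which is a reminder that a histogram of so few points gives only a rough picture of the distribution.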
 PDF’s (Probability Density Functions)
• Probability: certain values for a variable will be measured with some frequency of occurrence relative to other values
• Probability Density: frequency with which a measured variable assumes a particular value
Probability Density Function (p.d.f):
If the height of the frequency distribution curve is normalized such
that the area under it is unity, then this curve is known as a
probability curve, and the height F(D) at any particular deviation
magnitude D is known as the probability density function (p.d.f.).
Mathematically,
$$\int_{-\infty}^{\infty} F(D)\,dD = 1 \qquad \text{Eq. (L5.a)}$$
• The probability that the error in any one particular measurement lies between two levels $D_1$ and $D_2$ can be calculated as:
$$P(D_1 \le D \le D_2) = \int_{D_1}^{D_2} F(D)\,dD$$
• For assessing the maximum error likely in any one measurement, the useful quantity is the cumulative distribution function (c.d.f.): the probability of observing a value less than or equal to $D_0$. It is shown in the figure and expressed mathematically as:
$$P(D \le D_0) = \int_{-\infty}^{D_0} F(D)\,dD \qquad \text{Eq. (L5.b)}$$
• The c.d.f. is the area under the curve to the left of a vertical line drawn through 𝐷0, as shown.
• The deviation magnitude 𝐷𝑝 is the one with the greatest probability. If the errors are entirely random in nature, then the value of 𝐷𝑝 will equal zero. Any nonzero value of 𝐷𝑝 indicates systematic errors in the data, in the form of a bias that is often removable by recalibration.
Central Tendency:
• value about which other values are scattered
• occurs more often than others
• different from the mean, since the mean doesn’t have to occur more often than other values
(Figure: histogram of N = 20 samples with the central tendency marked.)
PDF as Mathematical Histogram
• $p(x) = \lim_{N \to \infty,\ \delta x \to 0} \dfrac{n_j}{N\,(2\,\delta x)}$
• Provides central tendency
• Depends on variable and measurement quality
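The limit above can be approximated with a finite sample by computing n_j / (N · 2δx) on each interval. The sketch below assumes synthetic data drawn from a normal distribution (mean 10, standard deviation 0.5, both arbitrary) and uses a hypothetical helper name; it only illustrates the normalization, not a recommended density estimator.

```python
import random

def density_estimate(samples, k):
    """Estimate p(x) on k equal intervals as n_j / (N * 2*dx)."""
    n = len(samples)
    x_min, x_max = min(samples), max(samples)
    two_dx = (x_max - x_min) / k  # full interval width, i.e. 2*dx
    counts = [0] * k
    for x in samples:
        # Clamp so the largest sample lands in the last interval.
        j = min(int((x - x_min) / two_dx), k - 1)
        counts[j] += 1
    midpoints = [x_min + (j + 0.5) * two_dx for j in range(k)]
    densities = [c / (n * two_dx) for c in counts]
    return midpoints, densities, two_dx

random.seed(0)  # fixed seed so the sketch is repeatable
samples = [random.gauss(10.0, 0.5) for _ in range(5000)]
midpoints, densities, two_dx = density_estimate(samples, 20)

# Because each height is n_j / (N * 2*dx), the bar areas sum to one.
area = sum(p * two_dx for p in densities)
print(f"area under estimated p(x) = {area:.6f}")
```

Dividing by N · 2δx rather than by N alone is what turns a frequency count into a density, so the total area stays at unity however many intervals are used.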
True Lies
• Regardless of distribution, if a variable shows a central tendency
it is described by its mean and variance
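True Mean and True Variance
Continuous data:
$$x' = \int_{-\infty}^{\infty} x\,p(x)\,dx \qquad \sigma^2 = \int_{-\infty}^{\infty} (x - x')^2\,p(x)\,dx$$
Discrete data:
$$x' = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} x_i \qquad \sigma^2 = \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N} (x_i - x')^2$$
Standard Deviation: $\sigma = \sqrt{\text{variance}}$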

 Standard Distributions
• Normal Distribution
• χ² Distribution
 Normal (Gaussian) Distribution: Infinite Statistics

• Measurements that only contain random errors usually conform to a Normal Distribution.
• Normal Distribution: a normalized frequency distribution that is scattered symmetrically about the central tendency (line of zero error). Mathematically, the frequency and magnitude of the quantities are expressed as (used for prediction):
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - x')^2}{2\sigma^2}\right]$$
where 𝑥′ is the mean value of the data set 𝑥
• Frequency of small deviations from
the mean value is much greater than the frequency of large
deviations.
• Has its maximum at 𝑥 = 𝑥′: the most expected value is the true mean
• If measurement deviations 𝐷 are calculated for all measurements such that 𝐷 = 𝑥 − 𝑥′, then the curve of deviation frequency F(𝐷) plotted against deviation magnitude 𝐷 is a Gaussian curve known as the Error Frequency Distribution Curve. Writing the normal distribution in terms of 𝐷, we have:
$$F(D) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{D^2}{2\sigma^2}\right)$$
• The width of the curve decreases as 𝜎 becomes smaller.
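A quick numerical check of the error frequency distribution: the sketch below evaluates F(D) for a few illustrative values of σ (chosen arbitrarily here) and integrates it with a simple trapezoidal rule to confirm that the area stays at unity, as Eq. (L5.a) requires, while the peak rises and the curve narrows as σ shrinks. The helper names are hypothetical.

```python
import math

def error_pdf(d, sigma):
    """Gaussian error frequency distribution F(D) with zero mean."""
    return math.exp(-d * d / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def trapezoid(f, a, b, steps=20000):
    """Composite trapezoidal rule for the integral of f from a to b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

for sigma in (0.5, 1.0, 2.0):
    peak = error_pdf(0.0, sigma)
    # Integrating over +/- 10 sigma captures essentially all of the area.
    area = trapezoid(lambda d: error_pdf(d, sigma), -10 * sigma, 10 * sigma)
    print(f"sigma = {sigma}: peak F(0) = {peak:.4f}, area = {area:.6f}")
```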
• If the standard deviation 𝜎 is used as a unit of error, then the Gaussian curve can be used and the probability that the error lies between error levels $D_1$ and $D_2$ can be expressed as:
$$P(D_1 \le D \le D_2) = \int_{D_1}^{D_2} F(D)\,dD = \int_{D_1}^{D_2} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{D^2}{2\sigma^2}\right) dD$$
By the substitution $z = D/\sigma$ (so that $dD = \sigma\,dz$), the error distribution curve becomes a new Gaussian distribution that has 𝜎 = 1 and a mean of zero, known as the Standard Gaussian Curve or 𝑧 distribution:
$$P(D_1 \le D \le D_2) = P(z_1 \le z \le z_2) = \int_{z_1}^{z_2} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz$$
Standard Gaussian Tables (or z-Distribution Tables)
• tabulate the area under the standard Gaussian curve, 𝐹(𝑧), for various values of 𝑧, where 𝐹(𝑧) is given by:
$$F(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz$$
𝐹(𝑧) is the proportion of data values that are less than or equal to 𝑧, and is equal to the area under the standard Gaussian curve to the left of 𝑧.
• Standard Gaussian tables only give 𝐹(𝑧) for positive values of 𝑧; for negative values of 𝑧, we have:
$$F(-z) = 1 - F(z)$$
• 𝐹(𝑧) = 0.5 for 𝑧 = 0
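Where a printed table is not at hand, the tabulated area can be computed from the error function in Python's standard library, using the identity F(z) = ½[1 + erf(z/√2)]. The sketch below is one way to do this; the function names are chosen here for illustration and the z values are arbitrary.

```python
import math

def std_normal_cdf(z):
    """Area under the standard Gaussian curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(d1, d2, sigma):
    """P(D1 <= D <= D2) for zero-mean Gaussian errors, via z = D / sigma."""
    z1, z2 = d1 / sigma, d2 / sigma
    return std_normal_cdf(z2) - std_normal_cdf(z1)

print(f"F(0) = {std_normal_cdf(0.0):.4f}")
for z in (0.5, 1.0, 1.96):
    # The symmetry F(-z) = 1 - F(z) is what lets tables list only z >= 0.
    print(f"F({z}) = {std_normal_cdf(z):.4f}, F({-z}) = {std_normal_cdf(-z):.4f}")
```

Calling `prob_between(D1, D2, sigma)` then gives the probability that an error lies between two levels without any table lookup.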
Take Home Message
• The area under a PDF in the interval 𝑥′ − 𝑧1𝜎 ≤ 𝑥 ≤ 𝑥′ + 𝑧1𝜎 is the probability that the next measurement will assume a value within that interval
• For a Normal Distribution, 𝑧1 = 1 gives about 68.3%, 𝑧1 = 2 about 95.4%, and 𝑧1 = 3 about 99.7%
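For a normal distribution the area within ±z1σ of the mean depends only on z1 and equals erf(z1/√2). A minimal sketch, with a hypothetical function name, that evaluates it for z1 = 1, 2, 3:

```python
import math

def coverage(z1):
    """P(x' - z1*sigma <= x <= x' + z1*sigma) for a normal distribution."""
    return math.erf(z1 / math.sqrt(2.0))

for z1 in (1, 2, 3):
    print(f"z1 = {z1}: {100 * coverage(z1):.1f} %")
```

This prints 68.3 %, 95.4 % and 99.7 %, the familiar one-, two- and three-sigma probabilities.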
 Standard Error of the Mean
The standard deviation of mean values of a series of finite sets of
measurements relative to the true mean (the mean of the infinite
population that the finite set of measurements is drawn from) is
defined as the standard error of the mean, 𝛼:
$$\alpha = \frac{\sigma}{\sqrt{n}}$$
𝛼 → 0 as the number of measurements 𝑛 → ∞
• The measurement value obtained by calculating the mean of a set of 𝑛 measurements, 𝑥1, 𝑥2, …, 𝑥𝑛, can be expressed as:
𝑥 = 𝑥𝑚𝑒𝑎𝑛 ± 𝛼 with 68% certainty that the magnitude of the error does not exceed |𝛼|
𝑥 = 𝑥𝑚𝑒𝑎𝑛 ± 2𝛼 with 95.4% certainty that the magnitude of the error does not exceed |2𝛼|
𝑥 = 𝑥𝑚𝑒𝑎𝑛 ± 3𝛼 with 99.7% certainty that the magnitude of the error does not exceed |3𝛼|
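As an illustration of the standard error of the mean, the sketch below applies α = σ/√n to the ten current readings of Problem P 5.1. Reusing that data set here is an assumption made for the example, not part of the original problem statement.

```python
import math
import statistics

# Readings from Problem P 5.1, in amperes (reused for illustration).
readings = [100.1, 101.7, 100.9, 102.1, 101.5, 101.0, 100.0, 102.1, 102.3, 101.3]

n = len(readings)
x_mean = statistics.mean(readings)
sigma = statistics.stdev(readings)  # sample form, divides by n - 1
alpha = sigma / math.sqrt(n)        # standard error of the mean

for k, certainty in ((1, 68.0), (2, 95.4), (3, 99.7)):
    print(f"x = {x_mean:.2f} ± {k * alpha:.2f} A ({certainty} % certainty)")
```

Because α scales as 1/√n, quadrupling the number of readings only halves the standard error.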
 Estimation of Random Error in a Single Measurement
• The maximum likely deviation in a single measurement can be
expressed as:
𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = ±1.96 𝜎 (within 95% confidence limits)
However, this only expresses the maximum likely deviation of the
measurement from the calculated mean of the reference
measurement set, which is not the true value.
• The calculated value for the standard error of the mean has to
be added to the likely maximum deviation value. So, the
maximum likely error in a single measurement can be
expressed as:
𝐸𝑟𝑟𝑜𝑟 = ±1.96 (𝜎 + 𝛼) (within 95% confidence limits)
Suppose that a standard mass is measured 30 times with the same
instrument to create a reference data set, and the calculated
values of 𝜎 and 𝛼 are 𝜎 = 0.46 and 𝛼 = 0.08. If the instrument is
then used to measure an unknown mass and the reading is
105.6 kg, how should the mass value be expressed?
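Applying the expression Error = ±1.96(𝜎 + 𝛼) to this example is a one-line calculation. The sketch below assumes 𝜎 and 𝛼 are in kilograms, which the problem statement does not say explicitly.

```python
sigma = 0.46     # standard deviation of the reference data set (assumed kg)
alpha = 0.08     # standard error of the mean of the reference set (assumed kg)
reading = 105.6  # single reading of the unknown mass, kg

# Maximum likely error in a single measurement, 95 % confidence limits.
error = 1.96 * (sigma + alpha)

print(f"mass = {reading:.1f} ± {error:.1f} kg (95 % confidence limits)")
```

The error evaluates to about 1.06 kg, so the mass would be quoted as 105.6 ± 1.1 kg within 95% confidence limits.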

An integrated circuit chip contains 10⁵ transistors. The transistors
have a mean current gain of 20 and a standard deviation of 2.
Calculate the following:
(a) the number of transistors with a current gain between 19.8 and
20.2
(b) the number of transistors with a current gain greater than 17
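One way to set up this problem numerically is sketched below. It assumes the current gains are normally distributed and that the chip holds 10⁵ transistors (the extracted text reads "105", taken here to be a flattened exponent); the helper name is hypothetical.

```python
import math

def std_normal_cdf(z):
    """Area under the standard Gaussian curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n_transistors = 10**5  # assumption: "105" in the text means 10^5
mean_gain = 20.0
sigma = 2.0

# (a) gain between 19.8 and 20.2, i.e. z between -0.1 and +0.1
z_low = (19.8 - mean_gain) / sigma
z_high = (20.2 - mean_gain) / sigma
count_a = n_transistors * (std_normal_cdf(z_high) - std_normal_cdf(z_low))

# (b) gain greater than 17, i.e. z greater than -1.5
z_b = (17.0 - mean_gain) / sigma
count_b = n_transistors * (1.0 - std_normal_cdf(z_b))

print(f"(a) about {round(count_a)} transistors")
print(f"(b) about {round(count_b)} transistors")
```

Part (b) uses the symmetry F(−z) = 1 − F(z): the fraction above z = −1.5 equals F(1.5).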
