Lec 6
One of the Course Goals: Learn how to interpret and present data.
Why Statistics?
Statistical analysis is performed on measurement data to
determine analytically the uncertainty of the final measurement
result.
• What value best characterizes a data set?
Some Vocabulary
• Population: set of all possible values for a variable
• Sample: set of data from repeated measures under fixed
operating conditions
• Variable: what you are measuring
• Probability: expresses our confidence in an estimate made from the sample
Being Average
• True population average: x′
• Sample average: x̄, the best estimate of x′, stated together with an uncertainty interval
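$$\sigma = \sqrt{\frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n}} = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n}}$$
$$\sigma = \sqrt{\frac{d_1^2 + d_2^2 + d_3^2 + \cdots + d_n^2}{n-1}} = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n-1}}$$
where d_i = x_i − x̄ is the deviation of the i-th reading from the mean. The first form divides by n; the second, dividing by n − 1, is the usual choice for a finite sample.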
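A minimal sketch of the two standard deviation formulas, assuming a short list of hypothetical readings (the values and the helper name `std_dev` are illustrative, not from the lecture). It computes the deviations d_i from the mean and switches only the divisor:

```python
import math

def std_dev(readings, sample=True):
    """Standard deviation from the deviations d_i = x_i - mean.

    sample=True divides the sum of d_i^2 by (n - 1); sample=False divides by n.
    """
    n = len(readings)
    mean = sum(readings) / n
    sum_sq = sum((x - mean) ** 2 for x in readings)  # d1^2 + d2^2 + ... + dn^2
    divisor = n - 1 if sample else n
    return math.sqrt(sum_sq / divisor)

readings = [10.2, 9.8, 10.1, 10.0, 9.9]  # hypothetical readings
print(std_dev(readings, sample=False))   # about 0.141 (divide by n)
print(std_dev(readings, sample=True))    # about 0.158 (divide by n - 1)
```

The n − 1 result is always slightly larger; the gap shrinks as n grows.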
• A histogram shows: central tendency, probability density, and frequency distribution.
• As more readings are taken and grouped into smaller increments, the
outline of the histogram becomes a smoother curve. This bell-shaped
curve is known as the Gaussian (normal) curve.
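A minimal sketch of building a normalized histogram, assuming simulated readings with Gaussian random error (the true value 50.0, σ = 2.0, the seed, and the helper name `normalized_histogram` are all hypothetical). Each bar height is scaled so the total bar area is 1, which is the step that turns a frequency distribution into an estimate of a probability density:

```python
import random

random.seed(1)  # fixed seed so the example is repeatable
# Hypothetical readings: true value 50.0 with Gaussian random error, sigma = 2.0
readings = [random.gauss(50.0, 2.0) for _ in range(10_000)]

def normalized_histogram(data, n_bins):
    """Bin the data and scale bar heights so the total bar area is 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        i = min(int((x - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    heights = [c / (len(data) * width) for c in counts]  # frequency -> density
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, heights

edges, heights = normalized_histogram(readings, n_bins=40)
width = edges[1] - edges[0]
print(sum(h * width for h in heights))   # total area: 1.0 up to rounding
peak = max(range(len(heights)), key=heights.__getitem__)
print(edges[peak], edges[peak + 1])      # bin where readings cluster most
```

Increasing both the number of readings and `n_bins` makes the bar outline approach a smooth bell shape.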
PDFs (Probability Density Functions)
• Probability: certain values of a variable will be measured
with some frequency of occurrence relative to
other values.
Probability Density: frequency with which a measured variable
assumes a particular value
Probability Density Function (p.d.f.):
If the height of the frequency distribution curve is normalized such
that the area under it is unity, then this curve is known as a
probability curve, and the height F(D) at any particular deviation
magnitude D is known as the probability density function (p.d.f.)
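The probability that a deviation lies between D1 and D2 is calculated as:
$$P(D_1 \le D \le D_2) = \int_{D_1}^{D_2} F(D)\,dD$$
The cumulative distribution function (c.d.f.) is the probability of observing a deviation less than or equal to D0:
$$P(D \le D_0) = \int_{-\infty}^{D_0} F(D)\,dD \qquad \text{Eq. (L5.b)}$$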
• The c.d.f. is the area under the curve to the left of a vertical line
drawn through D0.
• The deviation magnitude Dp at the peak of the curve has the greatest
probability. If the errors are entirely random in nature, Dp equals
zero. Any nonzero value of Dp indicates a systematic error in the
data, in the form of a bias that can often be removed by recalibration.
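A minimal sketch of evaluating these areas numerically, assuming a zero-mean Gaussian F(D) with a hypothetical σ = 0.5. The trapezoidal rule is swapped in for the exact integral, and the −∞ lower limit of the c.d.f. is truncated at −10σ, where the curve is negligible; the helper names are illustrative:

```python
import math

def gaussian_pdf(D, sigma):
    """F(D): Gaussian p.d.f. of the deviation D, centred on zero (no bias)."""
    return math.exp(-D**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def area(f, a, b, n=10_000):
    """Trapezoidal-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n))
    return total * h

sigma = 0.5                          # hypothetical standard deviation
F = lambda D: gaussian_pdf(D, sigma)

p_total = area(F, -10 * sigma, 10 * sigma)  # whole curve: close to 1
p_band = area(F, -sigma, sigma)             # P(-sigma <= D <= sigma), about 0.683
D0 = 0.25
p_cdf = area(F, -10 * sigma, D0)            # c.d.f. P(D <= D0), about 0.691
print(p_total, p_band, p_cdf)
```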
Central Tendency:
• the value about which the other values are scattered
• it occurs more often than other values
• it differs from the mean, since the mean does not have to occur
more often than other values
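A small illustration of the difference, assuming hypothetical readings with one outlying value and using the mode as one way to pick out the most frequent value (that choice is an assumption, not the lecture's definition). The mean is pulled toward the outlier and is not one of the measured values at all:

```python
from statistics import mean, mode

# Hypothetical readings with one outlying value
readings = [4.1, 4.1, 4.1, 4.2, 4.0, 4.1, 5.6]
print(mode(readings))   # 4.1: the value that occurs most often
print(mean(readings))   # about 4.31: never actually measured
```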
True Lies
• Regardless of the distribution, if a variable shows a central tendency,
it is described by its mean and variance:
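Continuous data:
$$x' = \int_{-\infty}^{\infty} x\,p(x)\,dx \qquad \sigma^2 = \int_{-\infty}^{\infty} (x - x')^2\,p(x)\,dx$$
Discrete data:
$$x' = \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \sigma^2 = \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} (x_i - x')^2$$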
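A minimal sketch of the continuous-data formulas, assuming a hypothetical Gaussian p(x) with x′ = 5.0 and σ = 2.0. The midpoint rule replaces the exact integrals, and a finite range of ±10σ stands in for (−∞, ∞); the helper name `moments` is illustrative:

```python
import math

def moments(p, lo, hi, n=20_000):
    """Mean and variance of a p.d.f. p(x) by the midpoint rule on [lo, hi].

    [lo, hi] stands in for (-inf, inf), so p must be negligible outside it.
    """
    h = (hi - lo) / n
    xs = [lo + (k + 0.5) * h for k in range(n)]
    mean = sum(x * p(x) for x in xs) * h                 # x' = integral of x p(x) dx
    var = sum((x - mean) ** 2 * p(x) for x in xs) * h    # sigma^2
    return mean, var

# Hypothetical p.d.f.: Gaussian with x' = 5.0 and sigma = 2.0 (so 2 sigma^2 = 8)
p = lambda x: math.exp(-(x - 5.0) ** 2 / 8.0) / (2.0 * math.sqrt(2 * math.pi))
m, v = moments(p, 5.0 - 20.0, 5.0 + 20.0)
print(m, v)   # close to 5.0 and 4.0 (= sigma squared)
```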
$$F(D) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(\frac{-D^2}{2\sigma^2}\right)$$
• The width of the curve decreases as σ becomes smaller.
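By the substitution z = D/σ, the error distribution curve becomes a new
Gaussian distribution with σ = 1 and a mean of zero, known as the
standard Gaussian curve or z distribution:
$$P(D_1 \le D \le D_2) = P(z_1 \le z \le z_2) = \int_{z_1}^{z_2} \frac{1}{\sqrt{2\pi}}\exp\left(\frac{-z^2}{2}\right) dz$$
where z1 = D1/σ and z2 = D2/σ. Because dD = σ dz, the σ in the normalizing factor of F(D) cancels.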
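A minimal sketch of the substitution, assuming a hypothetical instrument with σ = 0.2 and a band of ±0.3. The standard Gaussian c.d.f. is written with Python's `math.erf` (an implementation choice, not from the lecture), and the band probability depends only on z1 and z2:

```python
import math

def phi(z):
    """Standard Gaussian c.d.f. F(z), expressed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_between(D1, D2, sigma):
    """P(D1 <= D <= D2) for zero-mean Gaussian deviations, via z = D / sigma."""
    z1, z2 = D1 / sigma, D2 / sigma
    return phi(z2) - phi(z1)

# Hypothetical instrument: sigma = 0.2, band of +/-0.3, i.e. z from -1.5 to 1.5
print(prob_between(-0.3, 0.3, sigma=0.2))   # about 0.866
print(prob_between(-1.5, 1.5, sigma=1.0))   # same value on the z scale
```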
Standard Gaussian Tables (z-Distribution)
• tabulate the area under the standard Gaussian curve, F(z), for various
values of z, where F(z) is given by:
$$F(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\exp\left(\frac{-u^2}{2}\right) du$$
F(z) is the proportion of data values that are less than or equal to z;
it equals the area under the standard Gaussian p.d.f. to the left of z.
• Standard Gaussian tables give F(z) only for positive values of z;
for negative values of z, use:
$$F(-z) = 1 - F(z)$$
• F(z) = 0.5 for z = 0.
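A minimal sketch of how a table that stores only positive z is used, assuming a hypothetical two-decimal table for 0 ≤ z ≤ 3 generated from `math.erf` (a printed table would be read instead). Negative z is handled with F(−z) = 1 − F(z):

```python
import math

# Hypothetical table: F(z) for z = 0.00, 0.01, ..., 3.00, generated with erf
TABLE = {round(k / 100, 2): 0.5 * (1 + math.erf(k / 100 / math.sqrt(2)))
         for k in range(301)}

def F(z):
    """Look up F(z) to two decimals, valid for |z| <= 3.

    Only positive z is stored, so negative z uses F(-z) = 1 - F(z).
    """
    z = round(z, 2)
    if z < 0:
        return 1.0 - TABLE[round(-z, 2)]
    return TABLE[z]

print(F(0.0))    # 0.5
print(F(1.0))    # about 0.8413
print(F(-1.0))   # about 0.1587
```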
Take Home Message
• The area under a PDF in the interval x′ − z1σ ≤ x ≤ x′ + z1σ is the
probability that the next measurement will assume a value
within that interval.
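• For a normal distribution, the intervals x′ ± σ, x′ ± 2σ and x′ ± 3σ contain about 68.3%, 95.4% and 99.7% of the measurements, respectively.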
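For a normal distribution, the area in the interval x′ ± z1σ equals 2F(z1) − 1, which simplifies to erf(z1/√2). A minimal sketch, assuming Python's `math.erf` stands in for a table lookup (the helper name `prob_within` is illustrative):

```python
import math

def prob_within(z1):
    """P(x' - z1*sigma <= x <= x' + z1*sigma) for a normal distribution.

    Equals 2F(z1) - 1, which simplifies to erf(z1 / sqrt(2)).
    """
    return math.erf(z1 / math.sqrt(2.0))

for z1 in (1.0, 1.96, 2.0, 3.0):
    print(z1, round(prob_within(z1), 4))
```

The z1 = 1.96 row gives 0.95, which is where the 1.96 factor in the 95% confidence limits comes from.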
Standard Error of the Mean
The standard deviation of the mean values of a series of finite sets of
measurements, relative to the true mean (the mean of the infinite
population from which the finite sets are drawn), is
defined as the standard error of the mean, α:
$$\alpha = \frac{\sigma}{\sqrt{n}}, \qquad \alpha \to 0 \text{ as the number of measurements } n \to \infty$$
• The measurement value obtained by calculating the mean of
a set of n measurements, x1, x2, …, xn, can be expressed as:
x = x_mean ± α with 68% certainty that the magnitude of the
error does not exceed |α|
x = x_mean ± 2α with 95.4% certainty that the magnitude of
the error does not exceed |2α|
x = x_mean ± 3α with 99.7% certainty that the magnitude of the
error does not exceed |3α|
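A minimal sketch of reporting a mean with its standard error, assuming hypothetical readings. The slide does not say which σ formula to use here; this sketch assumes the n − 1 form via `statistics.stdev`, and the helper name `report_mean` is illustrative:

```python
import math
import statistics

def report_mean(readings):
    """Return (mean, alpha), where alpha = sigma / sqrt(n) is the standard error."""
    n = len(readings)
    x_mean = statistics.mean(readings)
    sigma = statistics.stdev(readings)   # assumption: n - 1 divisor
    alpha = sigma / math.sqrt(n)
    return x_mean, alpha

readings = [10.2, 9.8, 10.1, 10.0, 9.9]  # hypothetical readings
x_mean, alpha = report_mean(readings)
for k, conf in ((1, "68%"), (2, "95.4%"), (3, "99.7%")):
    print(f"x = {x_mean:.3f} ± {k * alpha:.3f}  ({conf})")
```

Quadrupling the number of readings would halve α, consistent with α → 0 as n → ∞.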
Estimation of Random Error in a Single Measurement
• The maximum likely deviation in a single measurement can be
expressed as:
𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 = ±1.96 𝜎 (within 95% confidence limits)
However, this only expresses the maximum likely deviation of the
measurement from the calculated mean of the reference
measurement set, which is not the true value.
• The calculated value for the standard error of the mean has to
be added to the likely maximum deviation value. So, the
maximum likely error in a single measurement can be
expressed as:
𝐸𝑟𝑟𝑜𝑟 = ±1.96 (𝜎 + 𝛼) (within 95% confidence limits)
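The two expressions above can be packaged as a small helper. This is a sketch under stated assumptions: the function name `single_measurement_bounds` and the parameter `k` (the 1.96 multiplier for 95% limits) are hypothetical, and σ and α are taken as already computed from a reference data set:

```python
def single_measurement_bounds(reading, sigma, alpha, k=1.96):
    """Return (deviation, error, low, high) for one new reading.

    deviation = k * sigma            (spread about the reference mean only)
    error     = k * (sigma + alpha)  (also covers the uncertainty of that mean)
    k = 1.96 gives 95% confidence limits for a Gaussian distribution.
    """
    deviation = k * sigma
    error = k * (sigma + alpha)
    return deviation, error, reading - error, reading + error

# Hypothetical values: sigma = 0.20, alpha = 0.05, new reading 12.0
deviation, error, low, high = single_measurement_bounds(12.0, 0.20, 0.05)
print(deviation, error)   # about 0.392 and 0.49
print(low, high)          # about 11.51 and 12.49
```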
Numerical Problem P 5.2:
Suppose that a standard mass is measured 30 times with the same
instrument to create a reference data set, and the calculated
values of 𝜎 and 𝛼 are 𝜎 = 0.46 and 𝛼 = 0.08. If the instrument is
then used to measure an unknown mass and the reading is
105.6 kg, how should the mass value be expressed?
✔ Solution is done in class.
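One way to carry out the arithmetic with Error = ±1.96(σ + α), assuming σ and α are expressed in kg like the reading:

```latex
\begin{aligned}
\text{Error} &= \pm 1.96\,(\sigma + \alpha) = \pm 1.96\,(0.46 + 0.08) = \pm 1.96 \times 0.54 \approx \pm 1.06\ \text{kg} \\
\text{Mass}  &= 105.6 \pm 1.06\ \text{kg} \quad \text{(95\% confidence limits)}
\end{aligned}
```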