Engineering Statistics
Lecture 2
Summary Statistics
(Section 1.2)
Topics to learn
1. Mean, standard deviation, variance
2. Outliers
3. Median, quartile, percentile, trimmed mean
4. Mode, range
5. Frequency, sample proportion
6. Difference between “statistic” and “parameter”
Let $X_1, \ldots, X_n$ be a sample.
Sample Mean:
- The sample mean is also called the “arithmetic
mean,” or the “average.”
- It is the sum of the numbers in the sample, divided
by how many there are.
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \qquad (1)$$
Standard Deviation
• Consider the two lists of numbers:
List 1: 28, 29, 30, 31, 32
List 2: 10, 20, 30, 40, 50.
• Both lists have the same mean of 30.
• But clearly the lists differ in an important way that is
not captured by the mean:
– the second list is much more spread out than the first.
• The standard deviation is a quantity that measures
the degree of spread in a sample.
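Let $X_1, \ldots, X_n$ be a sample.
Sample Variance:
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \qquad (2.1)$$
Equivalently,
$$s^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right) \qquad (2.2)$$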
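Let $X_1, \ldots, X_n$ be a sample.
• The sample standard deviation is the square root of the
sample variance.
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} \qquad (3.1)$$
Equivalently,
$$s = \sqrt{\frac{1}{n-1} \left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right)} \qquad (3.2)$$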
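As an illustration, the two lists from the earlier slide can be compared numerically. The sketch below computes the sample variance as the sum of squared deviations from the sample mean divided by n − 1, then takes the square root. The function names `sample_variance` and `sample_sd` are chosen here for illustration and do not come from the slides.

```python
import math


def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def sample_sd(xs):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


list_1 = [28, 29, 30, 31, 32]
list_2 = [10, 20, 30, 40, 50]

for name, data in (("List 1", list_1), ("List 2", list_2)):
    mean = sum(data) / len(data)
    print(f"{name}: mean = {mean}, variance = {sample_variance(data)}, "
          f"sd = {sample_sd(data):.3f}")
```

Both lists have mean 30, but the sample variance is 2.5 for List 1 and 250 for List 2, so the standard deviation of List 2 is ten times larger.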
Note:
Why is the sum of the squared deviations divided by n − 1 rather
than n?
• Ideally, we would compute deviations from the mean of all the
items in the population, rather than the deviations from the
sample mean.
• However, the population mean is in general unknown, so the
sample mean is used in its place.
• It is a mathematical fact that
– the deviations around the sample mean tend to be a bit smaller than
the deviations around the population mean, and that
– dividing by n − 1 rather than n provides exactly the right correction.
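The effect of the divisor can be checked on a small case. The sketch below assumes, purely for illustration, that the five numbers 10, 20, 30, 40, 50 form an entire population, and it enumerates every ordered sample of size 2 drawn with replacement. It then averages the two candidate variance estimates over all samples and compares them with the population variance.

```python
from itertools import product

# Assumption for illustration: these five values are the whole population.
population = [10, 20, 30, 40, 50]
N = len(population)
mu = sum(population) / N
# Population variance: deviations are taken from the population mean.
pop_var = sum((x - mu) ** 2 for x in population) / N


def sum_sq_dev(sample):
    """Sum of squared deviations around the sample mean."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)


n = 2
# All 25 ordered samples of size 2, drawn with replacement.
samples = list(product(population, repeat=n))

avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print("population variance:      ", pop_var)
print("average, divisor n - 1:   ", avg_div_n_minus_1)
print("average, divisor n:       ", avg_div_n)
```

In this setup the average with divisor n − 1 matches the population variance, while the average with divisor n comes out too small.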
(The five heights (in inches) are: 65.51, 72.30, 68.31, 67.05, 70.68.)
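If $Y_i = a + bX_i$ for constants $a$ and $b$, then
$$\bar{Y} = a + b\bar{X}$$
and,
$$s_y^2 = b^2 s_x^2, \quad \text{and} \quad s_y = |b|\, s_x.$$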
Example
In Example 1.9, if the heights were measured in
centimeters rather than inches, what would happen to the
sample mean, variance, and standard deviation?
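One way to explore this question is to convert the five heights and recompute. The sketch below uses Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n − 1. It applies the conversion 1 inch = 2.54 cm, which is a linear change of units with a = 0 and b = 2.54.

```python
import statistics

heights_in = [65.51, 72.30, 68.31, 67.05, 70.68]  # inches, from the slides
CM_PER_INCH = 2.54
heights_cm = [CM_PER_INCH * h for h in heights_in]

mean_in = statistics.mean(heights_in)
var_in = statistics.variance(heights_in)  # sample variance (divisor n - 1)
sd_in = statistics.stdev(heights_in)      # sample standard deviation

mean_cm = statistics.mean(heights_cm)
var_cm = statistics.variance(heights_cm)
sd_cm = statistics.stdev(heights_cm)

print(f"inches: mean = {mean_in:.3f}, variance = {var_in:.3f}, sd = {sd_in:.3f}")
print(f"cm:     mean = {mean_cm:.3f}, variance = {var_cm:.3f}, sd = {sd_cm:.3f}")
print(f"ratios: mean {mean_cm / mean_in:.4f}, "
      f"variance {var_cm / var_in:.4f}, sd {sd_cm / sd_in:.4f}")
```

The printed ratios show the mean and standard deviation scaling by 2.54 and the variance by 2.54 squared.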
Outliers
• Outliers are points that are much larger or smaller
than the rest of the sample points.
• Outliers may be data entry errors or they may be
points that really are different from the rest.
• Outliers should not be deleted without considerable
thought—sometimes calculations and analyses will
be done with and without outliers and then compared.
Outliers
• Outliers are a real problem for data analysts.
– For this reason, when people see outliers in their
data, they sometimes try to find a reason, or an
excuse, to delete them.
• An outlier should not be deleted, however, unless
there is reasonable certainty that it results from an
error.
• If a population truly contains outliers, but they are
deleted from the sample, the sample will not
characterize the population correctly.
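A minimal sketch of the "with and without" comparison mentioned above. The measurements are hypothetical, invented only to illustrate the idea, and the last value plays the role of a suspected outlier.

```python
import statistics

# Hypothetical data for illustration only; 25.0 is the suspected outlier.
sample = [10.1, 9.8, 10.3, 9.9, 10.0, 25.0]
without_outlier = sample[:-1]

for label, data in (("with outlier", sample),
                    ("without outlier", without_outlier)):
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    print(f"{label:>16}: mean = {mean:.3f}, sd = {sd:.3f}")
```

Reporting both sets of results lets a reader see how much the conclusions depend on the single questionable point.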
Definition of a Median
The median is another measure of center, like the
mean.
Order the n data points from smallest to largest. Then
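If n is odd, the sample median is the number in
position $\frac{n+1}{2}$.
If n is even, the sample median is the average of the
numbers in positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.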
(Recall: the five heights are 65.51, 72.30, 68.31, 67.05, 70.68.)
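The median of the five heights can be found by sorting and picking the middle position. In the sketch below, `sample_median` is a name chosen for illustration. Its even-n branch uses the usual convention of averaging the two middle values.

```python
def sample_median(values):
    """Median by position in the ordered sample (positions are 1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the values in positions n/2 and n/2 + 1.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


heights = [65.51, 72.30, 68.31, 67.05, 70.68]
print(sorted(heights))         # [65.51, 67.05, 68.31, 70.68, 72.3]
print(sample_median(heights))  # 68.31 (position 3 of 5)
```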
Trimmed Mean
• Like the median, the trimmed mean is a measure of center
that is designed to be unaffected by outliers.
• The trimmed mean is computed by
– arranging the sample values in order,
– “trimming” an equal number of them from each end, and
– computing the mean of those remaining.
• If p% of the data are trimmed from each end, the resulting
trimmed mean is called the “p% trimmed mean.”
• There are no hard-and-fast rules on how many values to trim.
The most commonly used are the 5%, 10%, and 20% trimmed means.
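The three steps above can be sketched as a short function. The name `trimmed_mean` is illustrative. When p% of n is not a whole number, this sketch rounds the trim count down, which is an assumption; other rounding conventions exist. The sample is the ten-value data set that appears later in Example 4.

```python
def trimmed_mean(values, percent):
    """Mean after trimming `percent`% of the values from each end."""
    ordered = sorted(values)
    n = len(ordered)
    # Number of values to drop from each end (rounded down: an assumption).
    k = int(n * percent // 100)
    if 2 * k >= n:
        raise ValueError("trimming would remove every value")
    kept = ordered[k:n - k]
    return sum(kept) / len(kept)


data = [2, 3, 5, 6, 7, 9, 9, 11, 12, 15]
print(trimmed_mean(data, 0))   # ordinary mean, nothing trimmed
print(trimmed_mean(data, 10))  # drops 2 and 15
print(trimmed_mean(data, 20))  # drops 2, 3, 12, and 15
```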
Quartiles
Quartiles divide the data as nearly as possible
into quarters.
The first quartile is the median of the lower
half of the data.
To find the first quartile, compute 0.25(n + 1):
- If this is an integer, then the sample value in that
position is the first quartile.
- If not, take the average of the sample values in the
positions on either side of this value.
Quartiles
The third quartile is the median of the upper
half of the data.
To find the third quartile, compute 0.75(n + 1):
- If this is an integer, then the sample value in that
position is the third quartile.
- If not, take the average of the sample values in the
positions on either side of this value.
Definition of Percentile
• The pth percentile of a sample, for a number
p between 0 and 100, divides the sample so
that as nearly as possible p% of the sample
values are less than the pth percentile, and
(100 − p)% are greater.
• The computation of the location of the pth
percentile is analogous to what we did for the
quartiles.
To Find Percentiles
Order the n sample values from smallest to
largest.
Compute the quantity (p/100)(n + 1), where n
is the sample size.
If this quantity is an integer, the sample value
in this position is the pth percentile.
Otherwise, average the two sample values on
either side.
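The procedure above can be sketched in a few lines. The function name `percentile` is illustrative. The check that rejects positions falling outside 1 to n is an added assumption, since the slides do not say what to do in that case. Quartiles follow as the p = 25 and p = 75 cases.

```python
def percentile(values, p):
    """pth percentile using the (p/100)(n + 1) position rule (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1) / 100
    if position < 1 or position > n:
        # Assumption: the rule is left undefined outside the sample range.
        raise ValueError("position falls outside the sample")
    lower = int(position)  # whole-number part of the position
    if position == lower:
        # Integer position: take the sample value in that position.
        return ordered[lower - 1]
    # Otherwise average the sample values on either side.
    return (ordered[lower - 1] + ordered[lower]) / 2


heights = [65.51, 72.30, 68.31, 67.05, 70.68]
for p in (25, 50, 75):
    print(f"{p}th percentile: {percentile(heights, p):.2f}")
```

For the five heights, n + 1 = 6, so the 25th and 75th percentiles fall at positions 1.5 and 4.5 and are averages of neighbors, while the 50th percentile falls at position 3.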
Note on Percentiles
• The first quartile is the 25th percentile.
Example 4
• Suppose we have the following data:
2, 3, 5, 6, 7, 9, 9, 11, 12, 15
• What is the mean of these data?
• What is the median?
• What is the first quartile?
• What is the third quartile?
Example 4 (cont.)
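A sketch of the computations, worked here by applying the position rules from the preceding slides to the ten ordered values. The notation $X_{(k)}$ for the kth smallest value is introduced only for this illustration. The median uses the usual even-n convention of averaging the two middle values.

```latex
\begin{align*}
n &= 10, \qquad \sum_{i=1}^{10} X_i = 79 \\
\text{Mean:}\quad \bar{X} &= \frac{79}{10} = 7.9 \\
\text{Median:}\quad & \frac{X_{(5)} + X_{(6)}}{2} = \frac{7 + 9}{2} = 8 \\
\text{First quartile:}\quad & 0.25(10 + 1) = 2.75
  \;\Rightarrow\; \frac{X_{(2)} + X_{(3)}}{2} = \frac{3 + 5}{2} = 4 \\
\text{Third quartile:}\quad & 0.75(10 + 1) = 8.25
  \;\Rightarrow\; \frac{X_{(8)} + X_{(9)}}{2} = \frac{11 + 12}{2} = 11.5
\end{align*}
```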
Population Parameters
A numerical summary of a sample is called a
statistic.