Summation Notation
The most common symbol or notation used in statistics is the summation notation or simply summation ( ∑ ).
$\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + \cdots + X_n$.
Theorems on Summation
1. The summation of the sum of two or more variables is the sum of their summations. Thus,
$\sum_{i=1}^{n} (x_i + y_i + z_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} z_i$
2. If c is a constant, then
$\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$
3. If c is a constant, then
$\sum_{i=1}^{n} c = nc$
Exercise:
A. Write the following in full.
1. $\sum_{i=1}^{5} 5x_i$
2. $\sum_{i=3}^{6} (x_i + 2y_i - 3)$
3. $\sum_{i=2}^{5} 3x_i^2$
4. $\sum_{i=1}^{4} (3x_i)^2$
5. $\sum_{i=1}^{4} (x_{i+2} - 3)$
B. Write each of the following expressions in summation notation with appropriate limits.
1. $2x_1 + 2x_2 + 2x_3 + 2x_4 + 2x_5 + 2x_6 + 2x_7 + 2x_8$
2. $(x_2 - 3y_2) + (x_3 - 3y_3) + (x_4 - 3y_4) + (x_5 - 3y_5)$
C. Given:
$x_1 = 4$     $y_1 = -2$
$x_2 = -3$    $y_2 = 5$
$x_3 = 6$     $y_3 = -1$
$x_4 = 2$     $y_4 = 3$
Evaluate the following.
1. $\sum_{i=1}^{4} (2x_i + 3y_i - 4)$
2. $\sum_{i=1}^{4} x_i^2 y_i^3$
3. $\left(\sum_{i=1}^{4} x_i^2\right)\left(\sum_{i=1}^{4} y_i^3\right)$
4. $\left(\sum_{i=1}^{4} x_i\right)^2 \left(\sum_{i=1}^{4} y_i\right)^3$
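The four expressions in Exercise C can be checked numerically. The sketch below uses Python, which is a choice made here and not part of the original notes, and the variable names are illustrative. It evaluates item 1 both directly and through the three summation theorems, and it shows that a sum of products (item 2) generally differs from a product of sums (item 3).

```python
# Values given in Exercise C.
x = [4, -3, 6, 2]
y = [-2, 5, -1, 3]
n = len(x)

# Item 1, evaluated term by term.
item1_direct = sum(2 * xi + 3 * yi - 4 for xi, yi in zip(x, y))

# Item 1 again, using the theorems:
# sum(2x + 3y - 4) = 2*sum(x) + 3*sum(y) - 4n
item1_theorems = 2 * sum(x) + 3 * sum(y) - 4 * n

# Item 2: sum of the products x_i^2 * y_i^3.
item2 = sum(xi**2 * yi**3 for xi, yi in zip(x, y))

# Item 3: product of the two separate sums.
item3 = sum(xi**2 for xi in x) * sum(yi**3 for yi in y)

# Item 4: powers applied after summing.
item4 = sum(x) ** 2 * sum(y) ** 3

print(item1_direct, item1_theorems, item2, item3, item4)
```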
MEASURES OF CENTRAL TENDENCY AND LOCATION
The Mean ( x̄ )
The mean, also known as the arithmetic average, is found by adding the values of the data and dividing by
the total number of values. If $x_1, x_2, x_3, \ldots, x_n$ represent a finite set of observations of size $n$, then the mean is
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
The population mean is denoted by $\mu$ and the sample mean is denoted by $\bar{x}$. The mean should be rounded to one
more decimal place than occurs in the raw data.
Examples
1. The data represent the number of days off per year for a sample of individuals selected from nine different
countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
2. The numbers of building permits issued last month to 12 construction firms in a small city were 4, 7, 0, 7,
11, 4, 1, 15, 3, 5, 8, and 7. Find the mean.
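A minimal Python sketch of the mean for the two examples above, using the standard `statistics` module. The helper `rounded_mean` and its `raw_decimals` parameter are hypothetical names introduced here to apply the rounding rule (one more decimal place than the raw data).

```python
from statistics import mean

days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]    # Example 1
permits = [4, 7, 0, 7, 11, 4, 1, 15, 3, 5, 8, 7]   # Example 2

def rounded_mean(values, raw_decimals=0):
    """Mean rounded to one more decimal place than the raw data."""
    return round(mean(values), raw_decimals + 1)

print(rounded_mean(days_off))   # sum 276 over 9 values
print(rounded_mean(permits))    # sum 72 over 12 values
```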
The Median ( X̃ )
The median is the halfway point in a data set. Before you can find this point, the data must be arranged in
order. When the data set is ordered, it is called a data array. The median either will be a specific value in the data
set or will fall between two values.
Examples
1. The number of rooms in the seven hotels in downtown Pittsburgh is 713, 300, 618, 595, 311, 401, and 292.
Find the median.
2. The numbers of building permits issued last month to 12 construction firms in a small city were 4, 7, 0, 7,
11, 4, 1, 15, 3, 5, 8, and 7. Find the median.
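The two cases described above (a middle value when the count is odd, the average of the two middle values when it is even) can be sketched in Python as follows. The function name `median` and the variable names are illustrative.

```python
def median(values):
    """Middle value of the ordered data (the data array)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the median is a specific value in the data set.
        return ordered[mid]
    # Even count: the median falls between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

rooms = [713, 300, 618, 595, 311, 401, 292]        # Example 1
permits = [4, 7, 0, 7, 11, 4, 1, 15, 3, 5, 8, 7]   # Example 2

print(median(rooms), median(permits))
```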
The Mode ( X̂ )
The mode of a set of observations is that value which occurs most often or with the greatest frequency. A
data set that has only one value that occurs with the greatest frequency is said to be unimodal. If a data set has two
values that occur with the same greatest frequency, both values are considered to be the mode and the data set is said
to be bimodal. If a data set has more than two values that occur with the same greatest frequency, each value is used
as the mode, and the data set is said to be multimodal. When no data value occurs more than once, the data set is
said to have no mode. A data set can have more than one mode or no mode at all.
Examples
1. Find the mode of the signing bonuses of eight NFL players for a specific year. The bonuses in millions
of dollars are 18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10.
2. Find the mode for the number of coal employees per county for 10 selected counties in southwestern
Pennsylvania. 110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752
3. The data show the number of licensed nuclear reactors in the United States for a recent 15-year period. Find
the mode.
104 104 104 104 104
107 109 109 109 110
109 111 112 111 109
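The unimodal, bimodal, and no-mode cases can be illustrated with a short Python sketch applied to the three examples above. The function `modes` is a hypothetical helper; it returns an empty list when no value occurs more than once, following the convention stated in the text.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the greatest frequency (empty if none repeats)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no data value occurs more than once
    return sorted(v for v, c in counts.items() if c == top)

bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]
coal = [110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752]
reactors = [104, 104, 104, 104, 104,
            107, 109, 109, 109, 110,
            109, 111, 112, 111, 109]

print(modes(bonuses))    # one mode
print(modes(coal))       # no mode
print(modes(reactors))   # two modes
```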
The mode is the only measure of central tendency that can be used in finding the most typical case when
the data are nominal or categorical.
Example. A survey showed this distribution for the number of students enrolled in each field. Find the mode.
Field               Number of students
Business            1425
Liberal arts        878
Computer science    632
Education           471
General studies     95
Weighted Mean
The type of mean that considers an additional factor is called the weighted mean, and it is used when the
values are not all equally represented.
If k quantities $x_1, x_2, \ldots, x_k$ have weights $w_1, w_2, \ldots, w_k$, respectively, where the weights
represent measures of relative importance, then the weighted mean is
$\bar{x}_w = \frac{\sum x_i w_i}{\sum w_i}$
Examples:
1. What is the average for a student who received grades of 85, 76, and 82 on three tests and a 79 on the final
examination in a certain course if the final examination counts three times as much as each of the three
tests?
2. On a vacation trip a family bought 21.3 liters of gasoline at 39.9 cents per liter, 18.7 liters at 42.9 cents per
liter, and 23.5 liters at 40.9 cents per liter. Find the mean price paid per liter.
3. A savings and loan association makes one car loan of $5000 at 10.5% interest, a second car loan of $6300
at 10.8% interest, and a third car loan of $4500 at 11% interest. What is the average percentage return to the
savings and loan association for these three loans?
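A minimal Python sketch of the weighted mean formula applied to the three examples above. The function and variable names are illustrative. The choice of weights is an interpretation of each problem: the test count for the grades, liters for the gasoline prices, and loan amounts for the interest rates.

```python
def weighted_mean(values, weights):
    """Sum of value*weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Example 1: the final examination counts three times as much as each test.
grade_avg = weighted_mean([85, 76, 82, 79], [1, 1, 1, 3])

# Example 2: prices in cents per liter, weighted by liters bought.
price = weighted_mean([39.9, 42.9, 40.9], [21.3, 18.7, 23.5])

# Example 3: interest rates in percent, weighted by loan amount.
loan_rate = weighted_mean([10.5, 10.8, 11.0], [5000, 6300, 4500])

print(grade_avg, round(price, 2), round(loan_rate, 2))
```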
Combined Mean
If k finite groups having $n_1, n_2, \ldots, n_k$ measurements, respectively, have means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$, the
combined mean is
$\bar{x}_c = \frac{\sum n_i \bar{x}_i}{\sum n_i}$
Examples:
1. Three sections of a statistics class containing 28, 32, and 35 students averaged 83, 80, and 76, respectively,
on the same final examination. What is the combined mean for all three sections?
2. A survey of a random sample of people leaving an amusement park showed an average expenditure of
$10.30 for the evening. The average expenditure for the 20 girls in the sample was $9.70 and for the boys it
was $11.10. How many boys are there in the random sample?
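The combined mean is a weighted mean with the group sizes as weights. The Python sketch below applies it to Example 1 and, for Example 2, solves the same formula for the unknown group size. The rearrangement used for Example 2 is a derivation added here, not a formula from the notes, and all names are illustrative.

```python
def combined_mean(sizes, means):
    """Sum of n_i * mean_i divided by the total number of measurements."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Example 1: three sections of a statistics class.
sections = combined_mean([28, 32, 35], [83, 80, 76])

# Example 2: solve 20*9.70 + b*11.10 = 10.30*(20 + b) for b,
# which rearranges to b = 20*(10.30 - 9.70) / (11.10 - 10.30).
girls, girls_mean, boys_mean, overall = 20, 9.70, 11.10, 10.30
boys = girls * (overall - girls_mean) / (boys_mean - overall)

print(round(sections, 2), round(boys))
```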
Mode: $\hat{X} = L + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$
where: L = lower boundary of the modal class
$\Delta_1$ = difference between the frequencies of the modal class and the next lower class
$\Delta_2$ = difference between the frequencies of the modal class and the next higher class
i = size of the class interval
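A rough Python sketch of the grouped-data mode formula. The notes give no worked example here, so the inputs are taken from the net worth frequency table under Measures of Variation (modal class 32–42 with frequency 15, neighboring frequencies 8 and 7). The lower boundary 31.5 and class size 11 are assumptions based on placing boundaries halfway between adjacent class limits.

```python
def grouped_mode(lower_boundary, f_modal, f_lower, f_higher, class_size):
    """Mode of grouped data: L + (d1 / (d1 + d2)) * i."""
    d1 = f_modal - f_lower    # modal class vs. next lower class
    d2 = f_modal - f_higher   # modal class vs. next higher class
    return lower_boundary + d1 / (d1 + d2) * class_size

# Assumed inputs: boundary 31.5 and class size 11 are not stated in the notes.
mode = grouped_mode(31.5, f_modal=15, f_lower=8, f_higher=7, class_size=11)
print(round(mode, 2))
```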
Properties and Uses of Central Tendency
The Mean
1. The mean is found by using all the values of the data.
2. The mean varies less than the median or mode when samples are taken from the same population and all
three measures are computed for these samples.
3. The mean is used in computing other statistics, such as the variance.
4. The mean for the data set is unique and not necessarily one of the data values.
5. The mean cannot be computed for the data in a frequency distribution that has an open-ended class.
6. The mean is affected by extremely high or low values, called outliers, and may not be the appropriate average
to use in these situations.
The Median
1. The median is used to find the center or middle value of a data set.
2. The median is used when it is necessary to find out whether the data values fall into the upper half or lower
half of the distribution.
3. The median is used for an open-ended distribution.
4. The median is affected less than the mean by extremely high or extremely low values.
The Mode
1. The mode is used when the most typical case is desired.
2. The mode is the easiest average to compute.
3. The mode can be used when the data are nominal, such as religious preference, gender, or political affiliation.
4. The mode is not always unique. A data set can have more than one mode, or the mode may not exist for a
data set.
The Midrange
1. The midrange is easy to compute.
2. The midrange gives the midpoint.
3. The midrange is affected by extremely high or low values in a data set.
MEASURES OF VARIATION
Measures of variability or dispersion are measures of the average distance of each observation from the
center of the distribution. They measure the homogeneity or heterogeneity of a particular group.
A small measure of variability would indicate that the data are: clustered closely around the mean, more
homogeneous, less variable, more consistent, and more uniformly distributed.
There are two general classifications of measures of variability or dispersion: measures of absolute
dispersion and measures of relative dispersion.
1. The Range
The range is the difference between the highest and the lowest values. This is the simplest but the most
unreliable measure of variability since it uses only two values in the distribution.
2. The Variance
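Variance is the average of the squared deviations from the mean. For a sample, the sum of squared deviations is divided by $n - 1$ rather than $n$.
Sample variance: $s^2 = \frac{n\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n-1)}$
The standard deviation $s$ is the positive square root of the variance.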
Example :
1. Consider the following sets of grades in Mathematics of two groups of 5 students each.
Male group : 70, 95, 60, 80, 100
Female group : 82, 80, 83, 81, 79
Find the range, variance, and standard deviation for each set.
2. Net Worth of Corporations. These data represent the net worth (in millions of dollars) of 45 national corporations.
Class limits Frequency
10–20 2
21–31 8
32–42 15
43–53 7
54–64 10
65–75 3
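The shortcut formula for the sample variance can be sketched in Python as below, covering both examples. For ungrouped data every frequency is taken as 1. For the grouped table, the class marks are assumed to be the midpoints of the class limits (15, 26, ..., 70), which the notes do not state explicitly. The names are illustrative.

```python
from math import sqrt

def sample_variance(values, freqs=None):
    """[n*sum(f*X^2) - (sum(f*X))^2] / [n*(n - 1)], with n = sum(f)."""
    if freqs is None:
        freqs = [1] * len(values)   # ungrouped data: each value counted once
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))

# Example 1: range, variance, and standard deviation for each group.
male = [70, 95, 60, 80, 100]
female = [82, 80, 83, 81, 79]
for name, grades in (("male", male), ("female", female)):
    var = sample_variance(grades)
    print(name, max(grades) - min(grades), var, round(sqrt(var), 2))

# Example 2: class marks assumed to be midpoints of the class limits.
marks = [15, 26, 37, 48, 59, 70]
freqs = [2, 8, 15, 7, 10, 3]
grouped_var = sample_variance(marks, freqs)
print(round(grouped_var, 1), round(sqrt(grouped_var), 2))
```

The male group has the larger variance even though both groups share the same mean of 81, which matches the description of a more heterogeneous group.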
The range can be used to approximate the standard deviation. The approximation is called the range rule
of thumb. A rough estimate of the standard deviation is $s \approx \frac{\text{range}}{4}$.
Coefficient of Variation
- describes the standard deviation relative to the mean
Coefficient of variation (CV) is the ratio of the standard deviation to the mean and is usually expressed in
percentage. It is used to compare the variability of two or more sets of data even when they are expressed in
different units of measurement.
$CV = \frac{s}{\bar{x}}$
where: $s$ = standard deviation
$\bar{x}$ = mean
Example: Height and Weight of Men
Using the height and weight data for 40 males included in a sample, the statistics are given below.
Find the coefficient of variation for the height, the coefficient of variation for the weight, and then compare the
results.
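The statistics table for the height and weight example is not reproduced in these notes, so the Python sketch below illustrates the coefficient of variation on the two sets of Mathematics grades from the variance example instead. It assumes the sample standard deviation (divisor n - 1) and expresses the result as a percentage. The function name is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean, in percent."""
    return stdev(values) / mean(values) * 100

male = [70, 95, 60, 80, 100]
female = [82, 80, 83, 81, 79]

cv_male = coefficient_of_variation(male)
cv_female = coefficient_of_variation(female)
print(round(cv_male, 2), round(cv_female, 2))
```

Both groups have the same mean, so the larger CV of the male group reflects only its larger standard deviation.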
Example. IQ scores of normal adults on the Wechsler test have a bell-shaped distribution with a mean of 100
and a standard deviation of 15. What percentage of adults have IQ scores between 55 and 145?
2. CHEBYSHEV’S THEOREM
- Applies to distributions of any shape
At least the fraction $1 - \frac{1}{k^2}$ of the measurements of any set of data must lie within $k$ standard
deviations of the mean, where $k > 1$.
Examples:
1. If the IQs of a random sample of 1080 students at a large university have a mean score of 120 and a
standard deviation of 8,
a. Determine the interval containing at least 810 of the IQs in the sample,
b. In what range can we be sure that no more than 120 of the scores fall?
2. A coffee-maker is regulated so that it takes an average of 5.8 minutes to brew a cup of coffee with a
standard deviation of 0.6 minute. According to Chebyshev’s theorem, what percentage of the times
that this coffee-maker is used will the brewing time take anywhere from
a. 4.6 minutes to 7.0 minutes
b. 3.4 minutes to 8.2 minutes
c. 4.3 minutes to 7.3 minutes
Note: For typical data sets, it is unusual for a score to differ from the mean by more than 2 or 3 standard deviations.
3. A study of the nicotine contents of a certain brand of cigarette shows that on the average one cigarette
contains 1.52 milligrams of nicotine with a standard deviation of 0.07 milligram. According to
Chebyshev’s theorem, between what values must the nicotine content be for
a. at least $\frac{24}{25}$ of all cigarettes of this brand?
b. at least $\frac{48}{49}$ of all cigarettes of this brand?
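Chebyshev's theorem can be used in two directions: from k to a minimum fraction, or from a required fraction back to k and an interval. The Python sketch below shows both on the examples above. The function names are illustrative, and the inverse $k = 1/\sqrt{1 - \text{fraction}}$ is obtained by solving $1 - 1/k^2 = \text{fraction}$ for k.

```python
from math import sqrt

def chebyshev_fraction(k):
    """Minimum fraction of the data within k standard deviations (k > 1)."""
    return 1 - 1 / k**2

def chebyshev_interval(mean, sd, fraction):
    """Interval around the mean that holds at least `fraction` of the data."""
    k = 1 / sqrt(1 - fraction)
    return mean - k * sd, mean + k * sd

# Example 1a: at least 810 of the 1080 IQs (mean 120, sd 8).
iq_interval = chebyshev_interval(120, 8, 810 / 1080)

# Example 2: coffee-maker, mean 5.8 minutes, sd 0.6 minute.
coffee = []
for low, high in ((4.6, 7.0), (3.4, 8.2), (4.3, 7.3)):
    k = (high - 5.8) / 0.6
    coffee.append(round(100 * chebyshev_fraction(k), 2))

# Example 3a: at least 24/25 of all cigarettes (mean 1.52, sd 0.07).
nicotine_interval = chebyshev_interval(1.52, 0.07, 24 / 25)

print(iq_interval, coffee, nicotine_interval)
```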
MEASURES OF POSITION
Standard Scores
A z score or standard score for a value is obtained by subtracting the mean from the value and dividing
the result by the standard deviation. The symbol for a standard score is z.
For samples, the formula is $z = \frac{x - \bar{x}}{s}$
For populations, the formula is $z = \frac{x - \mu}{\sigma}$
The z score represents the number of standard deviations that a data value falls above or below the mean.
Example: A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative positions
on the two tests.
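The comparison in the example above reduces to computing one z score per test. A minimal Python sketch, with illustrative names:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / sd

calculus = z_score(65, mean=50, sd=10)
history = z_score(30, mean=25, sd=5)

# The larger z score marks the higher relative position.
print(calculus, history, "calculus" if calculus > history else "history")
```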
Quartiles
Quartiles are values that divide a set of observations into 4 equal parts. These values, denoted by Q1, Q2,
and Q3, are such that
25% of the data falls below Q1,
50% of the data falls below Q2 , and
75% of the data falls below Q3.
Example. The lengths of service (in years) of 16 employees in a certain town hall are
7 1 5 35 28 10 15 22
11 10 12 6 8 14 18 16
Find Q1, Q2, and Q3.
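The notes do not say which rule to use for locating quartiles in ungrouped data, and textbooks differ on this. The Python sketch below uses `statistics.quantiles` with its default "exclusive" method, which places the k-th quartile at position k(n + 1)/4 in the ordered data and interpolates linearly. This is one common convention and an assumption here, so the results may differ slightly from those of another rule.

```python
from statistics import quantiles

service_years = [7, 1, 5, 35, 28, 10, 15, 22,
                 11, 10, 12, 6, 8, 14, 18, 16]

# n=4 returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(service_years, n=4)
print(q1, q2, q3)
```

Under this convention Q2 equals the median of the 16 ordered values, the average of the 8th and 9th.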
Deciles
Deciles are values that divide a set of observations into 10 equal parts. These values, denoted by D1, D2, . . . ,
D9 , are such that
10% of the data falls below D1 ,
20% of the data falls below D2 ,
.
.
.
90% of the data falls below D9 .
Example. The lengths of service (in years) of 16 employees in a certain town hall are
7 1 5 35 28 10 15 22
11 10 12 6 8 14 18 16
Find D1, D4, and D9.
Percentiles
Percentiles are values that divide a set of observations into 100 equal parts. These values, denoted by P1,
P2, P3, . . . , P99 , are such that
1% of the data falls below P1,
2% of the data falls below P2,
3% of the data falls below P3,
.
.
.
99% of the data falls below P99.
Example. The lengths of service (in years) of 16 employees in a certain town hall are
7 1 5 35 28 10 15 22
11 10 12 6 8 14 18 16
Find P17, P43, and P87.
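Deciles and percentiles for ungrouped data can be sketched in Python the same way as other cut points. As an assumption, since the notes do not fix a rule, the sketch uses the default "exclusive" method of `statistics.quantiles`, which places the k-th of m cut points at position k(n + 1)/m and interpolates linearly. Other textbook rules can give slightly different answers.

```python
from statistics import quantiles

service_years = [7, 1, 5, 35, 28, 10, 15, 22,
                 11, 10, 12, 6, 8, 14, 18, 16]

deciles = quantiles(service_years, n=10)        # D1 .. D9
percentiles = quantiles(service_years, n=100)   # P1 .. P99

# Lists are 0-indexed, so D_k is deciles[k - 1] and P_k is percentiles[k - 1].
d1, d4, d9 = deciles[0], deciles[3], deciles[8]
p17, p43, p87 = percentiles[16], percentiles[42], percentiles[86]
print(d1, d4, d9, p17, p43, p87)
```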
$D_k = L + \left(\frac{\frac{kn}{10} - S_b}{f_d}\right) i$
where:
L = lower boundary of the decile class
n = total number of observations
$S_b$ = sum of the frequencies before the decile class
$f_d$ = frequency of the decile class
i = size of the class interval
$P_k = L + \left(\frac{\frac{kn}{100} - S_b}{f_p}\right) i$
where:
L = lower boundary of the percentile class
n = total number of observations
$S_b$ = sum of the frequencies before the percentile class
$f_p$ = frequency of the percentile class
i = size of the class interval
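The grouped-data decile and percentile formulas share one structure, so a single hypothetical helper can cover both by passing 10 or 100 as the number of parts. The notes give no worked example for these formulas, so the Python sketch below borrows the net worth frequency table from these notes and assumes class boundaries halfway between adjacent class limits (9.5, 20.5, ...) with a class size of 11.

```python
def grouped_quantile(k, parts, lower_boundaries, freqs, class_size):
    """k-th of `parts` cut points: L + ((k*n/parts - Sb) / f) * i."""
    n = sum(freqs)
    target = k * n / parts
    before = 0  # Sb: sum of the frequencies before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if before + f >= target:
            # This is the class that contains the target position.
            return lower + (target - before) / f * class_size
        before += f
    raise ValueError("k must not exceed parts")

# Assumed boundaries for the classes 10-20, 21-31, ..., 65-75.
boundaries = [9.5, 20.5, 31.5, 42.5, 53.5, 64.5]
freqs = [2, 8, 15, 7, 10, 3]

d5 = grouped_quantile(5, 10, boundaries, freqs, 11)
p25 = grouped_quantile(25, 100, boundaries, freqs, 11)
print(round(d5, 2), round(p25, 2))
```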
SKEWNESS AND KURTOSIS
Skewness refers to the degree of symmetry or asymmetry of a distribution.
A normal distribution is a distribution with a bell-shaped appearance. In a normal distribution, X̄ = X̃ = X̂ .
A distribution is skewed to the left if the mean is less than the median. The bulk of the distribution is on the
right. This is otherwise known as negatively skewed. The graph has a long left tail.
A distribution is skewed to the right if the mean is greater than its median. The bulk of the distribution is on
the left. This is otherwise known as positively skewed. The graph has a long right tail.
$SK = \frac{3(\bar{X} - \tilde{X})}{s}$
where: SK = coefficient of skewness
$\bar{X}$ = the mean
$\tilde{X}$ = the median
s = standard deviation
Note:
If SK = 0, the distribution is symmetric, as a normal distribution is.
If SK < 0, the distribution is skewed to the left.
If SK > 0, the distribution is skewed to the right.
$Ku = \frac{\sum (X - \bar{X})^4}{n s^4}$, for ungrouped data
$Ku = \frac{\sum f (X - \bar{X})^4}{n s^4}$, for grouped data
where:
Ku = the kurtosis
X = raw data or class mark
$\bar{X}$ = mean
s = standard deviation
n = sample size
Note:
If Ku = 3, the distribution is mesokurtic, as a normal distribution is.
If Ku > 3, the distribution is leptokurtic.
If Ku < 3, the distribution is platykurtic.
Example: In each of the following numbers, compute the coefficients of skewness and kurtosis. Indicate if the
distribution is normal, skewed to the right or skewed to the left , and also if it is mesokurtic, platykurtic, or
leptokurtic.
1. Given the data set:
72 81 67 83 61 75 78 82 71 67
2. Refer to the table and find the coefficient of skewness. Describe the distribution.
Class limits Frequency
10–20 2
21–31 8
32–42 15
43–53 7
54–64 10
65–75 3
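For the ungrouped data set in item 1, the coefficients of skewness and kurtosis can be computed with the Python sketch below. It assumes that s is the sample standard deviation (divisor n - 1), in line with the sample variance formula used earlier in these notes. The variable names are illustrative.

```python
from statistics import mean, median, stdev

data = [72, 81, 67, 83, 61, 75, 78, 82, 71, 67]
n = len(data)
x_bar = mean(data)
s = stdev(data)   # assumption: sample standard deviation (n - 1 divisor)

# Coefficient of skewness: 3 * (mean - median) / s
sk = 3 * (x_bar - median(data)) / s

# Kurtosis for ungrouped data: sum((X - mean)^4) / (n * s^4)
ku = sum((x - x_bar) ** 4 for x in data) / (n * s ** 4)

shape = "skewed to the right" if sk > 0 else "skewed to the left" if sk < 0 else "symmetric"
peak = "leptokurtic" if ku > 3 else "platykurtic" if ku < 3 else "mesokurtic"
print(round(sk, 3), shape, round(ku, 3), peak)
```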