
1. Descriptive Statistics

What is Statistics?
(1) Collection and Organization of Data:
(i) Graphically: through the use of charts and graphs
(ii) Numerically: through the use of tables of data
(2) Analysis of Data:
Once the data is organized, we can compute various quantities (called statistics or parameters) associated with the data.
(3) Interpretation of Data:
Once we have performed the analysis, we can use the information to make decisions about the aggregate / population.
Statistical techniques are used in production, construction, marketing etc.
Branches of Statistics
(1) Descriptive Statistics: Descriptive statistics includes statistical procedures that we use to describe the population.
The data could be collected from either a sample or a population, but the results help us organize and describe data.
Descriptive statistics can only be used to describe the group that is being studied. Frequency distributions,
measures of central tendency (mean, median, and mode), and graphs like pie charts and bar charts that describe the
data are all examples of descriptive statistics.
(2) Inferential Statistics: Inferential statistics is concerned with making predictions or inferences about a population
from observations and analysis of a sample. Regression analysis, tests of hypotheses and significance, and analysis of variance
are examples of inferential statistics.
Example (1)
Data have been obtained on the lives of batteries of a particular type in an industrial application. The following table shows
the lives of 36 batteries recorded to the nearest tenth of a year.
4.1 5.1 4.4 2.3 4.3 4.8 1.8 3.3 3.0
5.2 4.0 5.6 4.5 3.9 3.7 5.1 5.8 4.3
2.8 4.1 6.1 4.9 3.2 4.6 4.2 4.4 4.7
4.9 3.5 3.7 5.6 5.0 5.5 6.3 4.8 5.1
(i) Construct a frequency distribution with class width 0.5.
(ii) Construct a bar chart, frequency histogram, frequency polygon and frequency curve.
(iii) Construct a cumulative histogram, cumulative frequency polygon and cumulative frequency curve (ogive).
(DeCoursey-EXCEL-Statistics-and-Probability-for-Engineering-Applications, Example 4.1)
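Part (i) of Example (1) can be sketched in a few lines of Python. This is only an illustration, not the textbook's worked solution: the first class is assumed to start at 1.5 and each class is taken as "lower limit up to, but not including, upper limit". Values are converted to whole tenths of a year so that the class assignment is not affected by floating-point rounding.

```python
from collections import Counter

# Battery lives in years, copied from the table in Example (1).
lives = [
    4.1, 5.1, 4.4, 2.3, 4.3, 4.8, 1.8, 3.3, 3.0,
    5.2, 4.0, 5.6, 4.5, 3.9, 3.7, 5.1, 5.8, 4.3,
    2.8, 4.1, 6.1, 4.9, 3.2, 4.6, 4.2, 4.4, 4.7,
    4.9, 3.5, 3.7, 5.6, 5.0, 5.5, 6.3, 4.8, 5.1,
]

WIDTH_TENTHS = 5   # class width 0.5, expressed in tenths of a year
START_TENTHS = 15  # assumed lower limit of the first class (1.5)


def class_index(x):
    """Return the 0-based class number of a value recorded to one decimal."""
    tenths = round(x * 10)
    return (tenths - START_TENTHS) // WIDTH_TENTHS


counts = Counter(class_index(x) for x in lives)

# Build rows of (lower limit, upper limit, frequency, cumulative frequency).
frequency_table = []
cumulative = 0
for k in range(max(counts) + 1):
    lower = (START_TENTHS + k * WIDTH_TENTHS) / 10
    upper = lower + 0.5
    cumulative += counts.get(k, 0)
    frequency_table.append((lower, upper, counts.get(k, 0), cumulative))

for lower, upper, f, cf in frequency_table:
    print(f"{lower:.1f} to under {upper:.1f}: f = {f:2d}, cumulative f = {cf:2d}")
```

The frequency column is what the histogram and frequency polygon in part (ii) plot, and the cumulative column is what the ogive in part (iii) plots.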

Exercise 1.
The data shown here represent the number of miles per gallon (mpg) that 30 selected four-wheel-drive sports utility
vehicles obtained in city driving. Construct a frequency distribution, and analyze the distribution.
12 17 12 14 16 18 16 18 12 16 17 15 15 16 12
15 16 16 12 14 15 12 15 15 19 13 16 18 16 14
Source: Model Year Fuel Economy Guide. United States Environmental Protection Agency.
The complete ungrouped frequency distribution is

Repeat parts (i) through (iii) of Example (1)


Exercise 2.
The bureau of labor statistics has sampled 30 communities nationwide and compiled prices in each community at the
beginning and end of August in order to find out approximately how the Consumer Price Index has changed during August. The
percentage changes in prices for the 30 communities are as follows:
0.7 0.4 0.3 0.2 0.1 0.1 0.3 0.7 0.0 0.4
0.1 0.5 0.2 0.3 1.0 0.3 0.0 0.2 0.5 0.1
0.5 0.3 0.1 0.5 0.4 0.0 0.2 0.3 0.5 0.4
Using the following four equal sized classes, starting from the minimum value as lower class limit..
(Ref. Ex. 2.19, “Statistics for Management”, 7th Ed., by Levin and Rubin)

Exercise 3.
In a group of 500 wage-earners, the weekly wages of 4% were under Rs. 60 and those of 15% were under Rs. 62.50. 15% of
the workers earned Rs. 95 and over, and 5% of them got Rs. 100 and over. The median and quartile wages were Rs. 82.25, Rs. 72.75
and Rs. 90.50; the 4th and 6th decile wages were Rs. 78.75 and Rs. 85.25 respectively. Put the above information in the form of a
frequency distribution and estimate the mean wage of the 500 wage-earners therefrom.
Averages
The following average measures are also called the central measures
(1) Arithmetic Mean
(2) Geometric Mean
(3) Harmonic Mean
(4) Weighted Mean
Arithmetic Mean
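The arithmetic mean, or simply the mean, is the most familiar average. It is defined as
$$\text{Mean} = \frac{\text{Sum of all the observations}}{\text{Number of observations}}$$
For ungrouped data,
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x_i}{n}, \quad (i = 1, 2, \dots, n)$$
For grouped data,
$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \frac{\sum f_i x_i}{\sum f_i}, \quad \left(n = \sum f_i\right)$$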
Advantages of Arithmetic Mean
• Its concept is familiar to most people and intuitively clear.
• It is a measure that can always be calculated, and it is unique because every data set has one and only one mean.
• The mean is useful for performing statistical procedures such as comparing the means from several data sets.
Disadvantages of Arithmetic Mean
• It may be affected by extreme values that are not representative of the rest of the data, e.g. the mean of
the values 4.2, 4.3, 4.7, 4.8, 5.0, 5.1, 9.0 is 5.3. But if we exclude the value 9.0, the answer is about 4.7. The one
extreme value 9.0 distorts the value we get for the mean.
• It can be time-consuming to compute for large data sets.
• We are unable to compute the mean for data with open-ended classes.
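The effect of the extreme value in the first disadvantage is easy to check numerically. The short sketch below uses Python's standard `statistics` module on the seven values quoted above; the variable names are only illustrative.

```python
from statistics import mean

values = [4.2, 4.3, 4.7, 4.8, 5.0, 5.1, 9.0]

# Mean of all seven values, including the extreme value 9.0.
mean_all = mean(values)

# Mean after dropping the extreme value.
mean_without_extreme = mean([v for v in values if v != 9.0])

print(f"with 9.0:    {mean_all:.2f}")
print(f"without 9.0: {mean_without_extreme:.2f}")
```

A single observation out of seven moves the mean by more than half a unit, which is why the median is often reported alongside the mean for data with outliers.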
Properties
• Mean(a) = a, where a is a constant
• Mean(X ± a) = Mean(X) ± a
• Mean(bX) = b Mean(X)
• The sum of the deviations from the mean value is equal to zero.
• For two sets of data with $n_1$, $n_2$ values and mean values $\bar{X}_1$, $\bar{X}_2$ respectively, the joint mean $\bar{X}$ is
$$\bar{X} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$$
Exercise
Find the arithmetic mean, geometric mean and harmonic mean of the series
(i) $1, 2, 4, 8, 16, \dots, 2^n$
(ii) $1, 3, 9, 27, 81, \dots, 3^n$. (Sher)
The Weighted Mean
The weighted mean enables us to calculate an average that takes into account the importance of each value to the overall total.
The weighted mean of a variable X is obtained by multiplying each value by its corresponding weight and dividing the sum of the
products by the sum of the weights.
The formula for calculating the weighted average is
$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{x_1 w_1 + x_2 w_2 + \dots + x_n w_n}{w_1 + w_2 + \dots + w_n}$$
Exercises
(1) The following are the monthly salaries in rupees of 30 employees of a firm:
139 126 114 100 88 62 77 99 103 108
144 129 148 63 69 148 132 118 142 116
123 104 95 80 85 106 123 140 134 133
The firm gave bonuses of Rs. 10, 15, 20, 25, 30 and 35 for individuals in the respective salary groups; exceeding 60 but not
exceeding 75, exceeding 75 but not exceeding 90 and so on up to exceeding 135 but not exceeding 150. Find the average
bonus paid per employee.
Example (2)
Dave’s Giveaway Store advertises, “If our average prices are not equal or lower than everyone else’s, you get it free”. One
of Dave’s customers came into the store one day and threw on the counter bills of sale for six items she bought from a competitor
for an average price less than Dave’s. (“Statistics for Management”, 7th Ed., by Richard Levin and David Rubin, Chap. 3)

The items cost:


$1.29, $2.97, $3.49, $5.00, $7.50, $10.95
Dave’s prices for the same six items are:
$1.35, $2.89, $3.19, $4.98, $7.59, $11.50
Dave told the customer, “My ad refers to a weighted average price of these items. Our average is lower because our sales
of these items have been”
7, 9, 12, 8, 6, 3
Is Dave getting himself into or out of trouble by talking about weighted averages?
Solution
With the unweighted average, we get
$$\bar{x}_C = \frac{\sum x_i}{n} = \frac{1.29 + 2.97 + 3.49 + 5.00 + 7.50 + 10.95}{6} = \frac{31.20}{6} = \$5.20 \text{ at the competitor}$$
$$\bar{x}_D = \frac{\sum x_i}{n} = \frac{1.35 + 2.89 + 3.19 + 4.98 + 7.59 + 11.50}{6} = \frac{31.50}{6} = \$5.25 \text{ at Dave’s}$$
With the weighted average, using the sales figures as weights,
$$\bar{x}_C = \frac{\sum w_i x_i}{\sum w_i} = \frac{7(1.29) + 9(2.97) + 12(3.49) + 8(5.00) + 6(7.50) + 3(10.95)}{7 + 9 + 12 + 8 + 6 + 3} = \frac{195.49}{45} = \$4.344 \text{ at the competition}$$
$$\bar{x}_D = \frac{\sum w_i x_i}{\sum w_i} = \frac{7(1.35) + 9(2.89) + 12(3.19) + 8(4.98) + 6(7.59) + 3(11.50)}{7 + 9 + 12 + 8 + 6 + 3} = \frac{193.62}{45} = \$4.303 \text{ at Dave’s}$$
Although Dave is technically correct, the word average in popular usage is equivalent to the unweighted average in technical
usage, and the typical customer will surely be angry with Dave’s assertion (whether he or she understands the technical point or not).
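The arithmetic in Example (2) can be reproduced with a small helper. The function name `weighted_mean` is just a label chosen for this sketch; the prices and weights are those given in the example.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


competitor = [1.29, 2.97, 3.49, 5.00, 7.50, 10.95]
daves = [1.35, 2.89, 3.19, 4.98, 7.59, 11.50]
units_sold = [7, 9, 12, 8, 6, 3]  # Dave's sales figures, used as weights

# The unweighted average is the weighted mean with equal weights.
equal_weights = [1] * len(daves)
plain_competitor = weighted_mean(competitor, equal_weights)
plain_daves = weighted_mean(daves, equal_weights)

weighted_competitor = weighted_mean(competitor, units_sold)
weighted_daves = weighted_mean(daves, units_sold)

print(f"unweighted: competitor {plain_competitor:.2f}, Dave {plain_daves:.2f}")
print(f"weighted:   competitor {weighted_competitor:.3f}, Dave {weighted_daves:.3f}")
```

The ranking flips because Dave's cheaper items are the ones that sell in the largest quantities, so they dominate the weighted figure.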
Exercise 4.
Bennett Distribution Company, a subsidiary of major appliance manufacturer, is forecasting regional sales for the next
year. The Atlantic branch, with current yearly sales of $193.8 million, is expected to achieve a sales growth of 7.25 percent; the
Midwest branch, with current sales of $79.3 million, is expected to grow by 8.20 percent; and the Pacific branch, with sales of $57.5
million, is expected to increase sales by 7.15 percent. What is the average rate of sales growth forecasted for next year?
(“Statistics for Management”, 7th Ed, by Richard Levin and David Rubin Chap 3)
Exercise ( Bluman )
1. Find the weighted mean price of three models of automobiles sold. The number and price of each
model sold are shown in this list.

Model Number Price


A 8 $10,000
B 10 $12,000
C 12 $8,000

2. Using the weighted mean, find the average number of grams of fat per ounce of meat or fish that a person
would consume over a 5-day period if he ate these:
Meat or Fish Fat (g/oz)
3 oz fried shrimp 3.33
3 oz veal cutlet (broiled) 3.00
2 oz roast beef (lean) 2.50
2.5 oz fried chicken drumstick 4.40
4 oz tuna (canned in oil) 1.75
Source:- The World Almanac and Book of Facts

3. A recent survey of a new diet cola reported the following percentages of people who liked the taste. Find the
weighted mean of the percentages.

Area % favored Number Surveyed


1 40 1000
2 30 3000
3 50 800

4. The costs of three models of helicopters are shown below. Find the weighted mean of the costs of the models

Model Number sold Cost


Sunscraper 9 $ 427,000
Skycoaster 6 $ 365,000
High-flyer 12 $ 725,000

5. An instructor grades exams, 20%; term paper, 30%; final exam, 50%. A student had grades of 83, 72, and 90,
respectively, for exams, term paper, and final exam. Find the student’s final average. Use the weighted mean.

6. Another instructor gives four 1-hour exams and one final exam, which counts as two 1-hour exams. Find
student’s grade if she received 62, 83, 97, and 90 on the 1-hour exams and 82 on the final exam.

Harmonic Mean
Harmonic mean is used to calculate the average value when the values are expressed as value/unit. Since the speed is
expressed as km/hour, harmonic mean is used for the calculation of average speed.
Harmonic Mean is defined as the reciprocal of the arithmetic mean of reciprocals of the observations.
Harmonic Mean for Ungrouped data;
Let x1, x2, ..., xn be the n observations then the harmonic mean is defined as
$$\text{H.M.} = \text{Reciprocal of } \left( \frac{\sum (1/x_i)}{n} \right) = \frac{n}{\sum (1/x_i)}$$
Example
A man travels from Lahore to Islamabad by a car and takes 4 hours to cover the whole distance. In the first hour he
travels at a speed of 50 km/hr, in the second hour his speed is 64 km/hr, in third hour his speed is 80 km/hr and in the fourth hour
he travels at the speed of 55 km/hr. Find the average speed of the motorist.
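Average speed is always total distance divided by total time, and which mean this reduces to depends on how the legs of the journey are specified. The sketch below works both cases from that definition. In the example above the legs are equal in time (one hour each), so the result coincides with the arithmetic mean of the speeds; the harmonic mean arises when the legs are equal in distance. The second trip (two 120 km legs at 40 and 60 km/hr) is made-up data for illustration only.

```python
from statistics import harmonic_mean, mean

# Case 1: the Lahore to Islamabad trip, four legs of one hour each.
hourly_speeds = [50, 64, 80, 55]           # km/hr
hours_per_leg = 1
total_distance = sum(v * hours_per_leg for v in hourly_speeds)
total_time = hours_per_leg * len(hourly_speeds)
avg_equal_time = total_distance / total_time

# Case 2 (hypothetical): two legs of equal distance at different speeds.
leg_speeds = [40, 60]                      # km/hr
leg_km = 120
time_taken = sum(leg_km / v for v in leg_speeds)
avg_equal_distance = len(leg_speeds) * leg_km / time_taken

print(avg_equal_time, mean(hourly_speeds))             # equal times: arithmetic mean
print(avg_equal_distance, harmonic_mean(leg_speeds))   # equal distances: harmonic mean
```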
Harmonic Mean for grouped data;
Let x1, x2, ..., xn be the n observations with corresponding frequencies f1, f2, …, fn; then the harmonic mean is defined as
$$\text{H.M.} = \text{Reciprocal of } \left( \frac{\sum f_i (1/x_i)}{\sum f_i} \right) = \frac{\sum f_i}{\sum f_i (1/x_i)}$$
Example
The following data is obtained from the survey. Compute H.M
Speed of the Car 130 135 140 145 150
No. of Cars 2 4 8 9 2
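The grouped formula can be applied to this table directly. The following is a minimal sketch in plain Python; the variable names are illustrative.

```python
speeds = [130, 135, 140, 145, 150]  # speed of the car
cars = [2, 4, 8, 9, 2]              # number of cars (frequencies)

total_frequency = sum(cars)
sum_of_weighted_reciprocals = sum(f / x for f, x in zip(cars, speeds))

# H.M. = (sum of f) / (sum of f/x)
grouped_hm = total_frequency / sum_of_weighted_reciprocals
grouped_am = sum(f * x for f, x in zip(cars, speeds)) / total_frequency

print(f"H.M. = {grouped_hm:.2f}, A.M. = {grouped_am:.2f}")
```

The harmonic mean comes out slightly below the arithmetic mean, as it does for any set of positive values that are not all equal.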

Merits of Harmonic Mean:


• It is rigidly defined.
• It is based on all the observations of the series.
• It is suitable for series having wide dispersion.
• It is suitable for further mathematical treatment.
• It gives less weight to large items and more weight to small items.
Limitations of Harmonic Mean:
• It is difficult to calculate and is not easily understood.
• All the values must be available for computation.
• It is not popular due to its complex calculation.
• It is usually a value which does not exist in the series.
Exercises
(2) Find the average rate of

(i) motion in case of a person who rides the first mile at the rate of 10 miles an hour, the next mile at the rate
of 8 miles per hour and the third mile at the rate of 6 miles per hour.
(ii) Increase in the population, which in the first decade has increased 20%, in the next 25% and in the third
44%. (Problem 4-108 “Elementary Statistics” by Bluman, chapter 3, page 122 )
(3) A salesperson drives 300 miles round trip at 30 miles per hour going to Chicago and 45 miles per hour returning home.
Find the average miles per hour.
(4) A bus driver drives 50 miles to West Chester at 40 miles per hour and returns driving 25 miles per hour. Find the average
miles per hour.
(5) A carpenter buys $500 worth of nails at $50 per pound and $500 worth of nails at $10 per pound. Find the average cost of
1 pound of nails.
Measures of Location (median, mode, quartiles, deciles and percentiles)
Median
With the n observations arranged in order of magnitude:
• If n is odd, then the median is the middle value.
• If n is even, the median is the average of the two middle values.
Mode
The mode is the value that is repeated most often in the data set.
e.g. The ages in years of the cars worked on by the Village Auto Haus last week
5 6 3 6 11 7 9 10 2 4 10 6 2 1 5. The mode in this case is 6.
Example (3)
A computing student received the following grades in the subjects of his first semester in 2007:
Y = [6; 7; 6; 8; 5; 7; 6; 9; 10; 6]: the mode is 6, and the data set is called unimodal.
1, 2, 3, 4, 5, 6, 6, 7, 7: the modes are 6 and 7, and the data set is called bimodal.
2, 3, 4, 2, 3, 4, 7, 8: the modes are 2, 3 and 4, and the data set is called multimodal.
2, 3, 4, 5, 6, 7, 8: no mode.
2, 2, 3, 3, 4, 4, 5, 5: no mode.
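The five cases in Example (3) can be reproduced with a small helper. This sketch follows the convention used in these notes, where a data set in which every distinct value occurs equally often has no mode; other texts and libraries may instead report every value as a mode in that situation. The function names `modes` and `describe` are hypothetical.

```python
from collections import Counter


def modes(data):
    """Return the sorted list of modes, or [] when every distinct value ties."""
    counts = Counter(data)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # all values equally frequent: treated as "no mode"
    return sorted(v for v, c in counts.items() if c == highest)


def describe(data):
    """Label a data set by the number of modes it has."""
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes(data)), "multimodal")


grades = [6, 7, 6, 8, 5, 7, 6, 9, 10, 6]
print(modes(grades), describe(grades))
print(modes([1, 2, 3, 4, 5, 6, 6, 7, 7]), describe([1, 2, 3, 4, 5, 6, 6, 7, 7]))
print(modes([2, 2, 3, 3, 4, 4, 5, 5]), describe([2, 2, 3, 3, 4, 4, 5, 5]))
```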
Exercise 5.
A semi-commercial test plant produced the following daily outputs in tonnes/day:
1.3 2.5 1.8 1.4 3.2 1.9 1.3 2.8 1.1 1.7
1.4 3.0 1.6 1.2 2.3 2.9 1.1 1.7 2.0 1.4
Find the mode.
(Ref. DeCoursey, Chap. 4)
Other Measures of Location
The other measures of location discussed here are quartiles, deciles and percentiles.
Quartiles
Quartiles divide the distribution into four groups, separated by Q1, Q2, Q3. Note that Q1 is the same as the 25th percentile;
Q2 is the same as the 50th percentile, or the median; Q3 corresponds to the 75th percentile, as shown:

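With the data arranged in ascending order, for Q1 we check whether $\frac{n}{4}$ is an integer or a non-integer (the positions for Q1, Q2 and Q3 are $1 \times \frac{n}{4}$, $2 \times \frac{n}{4}$ and $3 \times \frac{n}{4}$).
If $\frac{n}{4}$ is not an integer, then Q1 = the $\left( \left[ \frac{n}{4} \right] + 1 \right)$th item in the data, where $[\,\cdot\,]$ denotes the integer part, e.g. $[7.25] = 7$.
If $\frac{n}{4}$ is an integer, then Q1 = average of the $\frac{n}{4}$th and $\left( \frac{n}{4} + 1 \right)$th items.
Similarly, for Q2 and Q3 we check whether $\frac{2n}{4}$ and $\frac{3n}{4}$ are integers or non-integers respectively, and then find Q2 and
Q3 in the same way as Q1.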
Deciles
Deciles divide the distribution into 10 groups, as shown. They are denoted by D1, D2, etc.

For D7 we check whether $\frac{7n}{10}$ is an integer or a non-integer.
If $\frac{7n}{10}$ is not an integer, then D7 = the $\left( \left[ \frac{7n}{10} \right] + 1 \right)$th item in the data.
If $\frac{7n}{10}$ is an integer, then D7 = average of the $\frac{7n}{10}$th and $\left( \frac{7n}{10} + 1 \right)$th items.
Similarly, for D2 and D3 we check whether $\frac{2n}{10}$ and $\frac{3n}{10}$ are integers or non-integers respectively, and then find D2 and
D3 in the same way as D7.
Percentiles
Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group.
Percentiles are symbolized by
P1, P2, P3, . . . , P99
and divide the distribution into 100 equal groups.

For instance, for P27 we check whether $\frac{27n}{100}$ is an integer or a non-integer.
If $\frac{27n}{100}$ is not an integer, then P27 = the $\left( \left[ \frac{27n}{100} \right] + 1 \right)$th item in the data.
If $\frac{27n}{100}$ is an integer, then P27 = average of the $\frac{27n}{100}$th and $\left( \frac{27n}{100} + 1 \right)$th items.
Similarly, for P25 and P30 we check whether $\frac{25n}{100}$ and $\frac{30n}{100}$ are integers or non-integers respectively, and then find the
values of P25 and P30 in the same way as P27.
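The rule for quartiles, deciles and percentiles is the same apart from the divisor (4, 10 or 100), so it can be sketched as one function. The name `position_measure` and its parameters are hypothetical; `j` is the index of the measure (e.g. 7 for D7) and `parts` is the number of groups. Note that statistical packages often use interpolation rules that give slightly different answers from this textbook rule.

```python
def position_measure(data, j, parts):
    """j-th quartile (parts=4), decile (parts=10) or percentile (parts=100).

    Assumes 0 < j < parts and a non-empty data set.
    """
    ordered = sorted(data)
    n = len(ordered)
    whole, remainder = divmod(j * n, parts)  # integer part of j*n/parts, and leftover
    if remainder != 0:
        # j*n/parts is not an integer: take the ([j*n/parts] + 1)th item.
        # Python lists are 0-based, so that item sits at index `whole`.
        return ordered[whole]
    # j*n/parts is an integer: average that item and the next one.
    return (ordered[whole - 1] + ordered[whole]) / 2


eight = [1, 2, 3, 4, 5, 6, 7, 8]
nine = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ten = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(position_measure(eight, 1, 4))    # Q1: 8/4 = 2 is an integer
print(position_measure(nine, 1, 4))     # Q1: 9/4 = 2.25 is not an integer
print(position_measure(ten, 7, 10))     # D7
print(position_measure(nine, 27, 100))  # P27
```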
Note that
Dispersion Measures

The main measures are


(1) Range = $x_{\max} - x_{\min} = x_n - x_0$
$$\text{co-efficient of dispersion} = \frac{x_n - x_0}{x_n + x_0}$$
(2) Inter-quartile Range
$$\text{IQR} = Q_3 - Q_1$$
$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}, \text{ called the quartile deviation}$$
$$\text{co-efficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
which is used for comparing the variation in two or more sets of data.
(3) Mean Deviation
$$\text{M.D.} = \frac{\sum |x_i - \bar{x}|}{n}, \text{ also called an absolute measure}$$
The absolute values are essential: $\frac{\sum (x_i - \bar{x})}{n} = 0$ because $\sum (x_i - \bar{x}) = 0$, as the sum of the deviations from the mean is always zero.
(4) Population Variance and Standard Deviation
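The variance is the mean of the squared deviations from the mean value.
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{(for ungrouped data)}$$
Alternative formula:
$$\sigma^2 = \frac{\sum X_i^2}{N} - \left( \frac{\sum X_i}{N} \right)^2 \quad \text{(for ungrouped data)}$$
The positive square root of the variance is called the standard deviation. Symbolically,
$$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}} \quad \text{(for ungrouped data)}$$
Note: S.D. is always non-negative. For example, for the data 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 every deviation from the mean is zero, so S.D. = 0.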
The standard deviation is a very important concept that serves as a basic measure of variability. A smaller value of the
standard deviation indicates that most of the observations in the data are close to the mean while a larger value implies that the
observations are scattered widely about the mean.
It is an absolute measure of dispersion. Its relative measure, called the coefficient of standard deviation, is defined as
$$\text{Coefficient of S.D.} = \frac{\text{Standard Deviation}}{\text{Mean}}$$
$$\text{Coefficient of variation} = \frac{\text{Standard Deviation}}{\text{Mean}} \times 100$$
These measures are used to compare consistency, reliability and stability.
(5) Sample Variance and Standard Deviation
In most cases, the expression $\sum (x - \bar{x})^2 / n$ does not give the best estimate of the population variance. Therefore, instead of
dividing by n, find the variance of the sample by dividing by n − 1, giving a slightly larger value and an unbiased estimate of the
population variance. The formula for the sample variance, denoted by $s^2$, is
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
and the standard deviation of a sample (denoted by s) is
$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
Note that:
(i) $\bar{x}$ is an unbiased estimate of the population mean $\mu$.
(ii) $S^2 = \sum (x_i - \bar{x})^2 / n$ is a biased estimate of the population variance $\sigma^2$.
(iii) The division by n − 1 is to make $s^2$ an unbiased estimator of the population variance.
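The difference between dividing by n and by n − 1 can be seen with Python's standard `statistics` module, where `pvariance`/`pstdev` divide by n and `variance`/`stdev` divide by n − 1. The eight data values below are made up for illustration.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
n = len(data)
x_bar = sum(data) / n
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)

population_variance = pvariance(data)  # sum of squared deviations / n
sample_variance = variance(data)       # sum of squared deviations / (n - 1)

print(f"divide by n:     variance = {population_variance:.4f}, S.D. = {pstdev(data):.4f}")
print(f"divide by n - 1: variance = {sample_variance:.4f}, S.D. = {stdev(data):.4f}")
```

The n − 1 version is always the larger of the two, and the gap shrinks as n grows.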
Properties of Variance
(1) Var(a) = 0, where a is a constant
(2) Var(X + a) = Var(X) = $\sigma^2$
(3) Var(aX) = $a^2$ Var(X)
(4) Var(X ± Y) = Var(X) + Var(Y), for independent X and Y
(5) Let $\bar{x}_1$ and $s_1^2$ be the mean and variance of $n_1$ observations and $\bar{x}_2$ and $s_2^2$ be the mean and variance of $n_2$ observations
($n_1$ and $n_2$ are sufficiently large). Then the variance $S^2$ of the combined $n_1 + n_2$ observations, with combined mean $\bar{x}$, is
$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 (\bar{x}_1 - \bar{x})^2}{n_1 + n_2} + \frac{n_2 (\bar{x}_2 - \bar{x})^2}{n_1 + n_2}$$
Example (4)
The breaking strength of test pieces of a certain alloy is given as under
X: 95 103 97 130 96 73 78 95 89 68
82 79 69 67 83 108 94 87 93 117
Calculate the average breaking strength of the alloy and the standard deviation.

X        X²        |X − X̄|    (X − X̄)²
67       4489      23.15      535.92
68       4624      22.15      490.62
69       4761      21.15      447.32
73       5329      17.15      294.12
78       6084      12.15      147.62
79       6241      11.15      124.32
82       6724      8.15       66.423
83       6889      7.15       51.123
87       7569      3.15       9.9225
89       7921      1.15       1.3225
93       8649      2.85       8.1225
94       8836      3.85       14.823
95       9025      4.85       23.522
95       9025      4.85       23.522
96       9216      5.85       34.222
97       9409      6.85       46.922
103      10609     12.85      165.12
108      11664     17.85      318.62
117      13689     26.85      720.92
130      16900     39.85      1588
Total    1803      167653     253        5112.6
(the totals are, in order, $\sum X$, $\sum X^2$, $\sum |X - \bar{X}|$ and $\sum (X - \bar{X})^2$)

$$\text{Mean} = \frac{\sum X}{n} = \frac{1803}{20} = 90.15 \quad \left(\text{remember } \left(\sum X\right)^2 \neq \sum X^2\right)$$
$$\sigma = \sqrt{\frac{\sum X^2}{n} - \left( \frac{\sum X}{n} \right)^2} = \sqrt{\frac{167653}{20} - \left( \frac{1803}{20} \right)^2} = \sqrt{8382.65 - 8127.0225} = \sqrt{255.6275} = 15.99$$
$$\text{Mean Deviation} = \text{M.D.} = \frac{\sum |x_i - \bar{x}|}{n} = \frac{253}{20} = \,?$$
Remember that $\frac{\sum (x_i - \bar{x})}{n} = 0$, because the sum of the deviations from the mean value is always zero.
$$\text{Variance} = \sigma^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{5112.6}{20} = \,?$$
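The quantities in Example (4) can be checked numerically. This is a sketch using the standard `statistics` module with the 20 breaking strengths from the example; it treats the data as a population (division by n), as the worked solution does.

```python
from statistics import mean, pstdev, pvariance

strength = [95, 103, 97, 130, 96, 73, 78, 95, 89, 68,
            82, 79, 69, 67, 83, 108, 94, 87, 93, 117]
n = len(strength)

x_bar = mean(strength)
mean_deviation = sum(abs(x - x_bar) for x in strength) / n
variance_direct = pvariance(strength)                        # mean of squared deviations
variance_shortcut = sum(x * x for x in strength) / n - x_bar ** 2
std_dev = pstdev(strength)

print(f"mean = {x_bar:.2f}")
print(f"mean deviation = {mean_deviation:.2f}")
print(f"variance = {variance_direct:.4f} (shortcut formula: {variance_shortcut:.4f})")
print(f"standard deviation = {std_dev:.2f}")
```

Both variance formulas agree, which is a useful check when filling in the table by hand.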
Example (5)
The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the
commissions is $5225, and the standard deviation is $773. Compare the variations of the two.
Solution
The coefficients of variation are
$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100 = \frac{5}{87} \times 100 = 5.7\% \quad \text{(sales)}$$
$$\text{C.V.} = \frac{\sigma}{\bar{x}} \times 100 = \frac{773}{5225} \times 100 = 14.8\% \quad \text{(commissions)}$$
Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.
Empirical Rule and Chebyshev’s Theorem


Chebyshev’s Theorem
At least $(1 - 1/k^2)$ of the data values must be within k standard deviations of the mean, where k is any value greater than 1
(k is not necessarily an integer).
Some of the implications of this theorem, with k = 2, 3, 4 standard deviations, follow.
• At least 0.75, or 75%, of the data values must be within k = 2 standard deviations of the mean.
• At least 0.89, or 89%, of the data values must be within k = 3 standard deviations of the mean.
• At least 0.94, or 94%, of the data values must be within k = 4 standard deviations of the mean.
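The three bounds listed above follow directly from the formula, as the sketch below shows. The function names are illustrative; `min_count_within` rounds up because a count of observations must be a whole number, and the figures n = 50, k = 2 in the last line are only sample inputs.

```python
import math


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


def min_count_within(n, k):
    """Smallest whole number of the n observations guaranteed to lie within k S.D."""
    return math.ceil(n * chebyshev_lower_bound(k))


for k in (2, 3, 4, 1.5):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.4f}")

print(min_count_within(50, 2))  # sample of 50, interval of 2 S.D. either side
```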

The Empirical Rule


Chebyshev’s theorem applies to any distribution regardless of its shape. However, when a distribution is bell-shaped (or
what is called normal), the following statements, which make up the empirical rule, are true.
• Approximately 68% of the data values will fall within 1 standard deviation of the mean.
• Approximately 95% of the data values will fall within 2 standard deviations of the mean.
• Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.
For example, suppose that the scores on a national achievement exam have a mean of 480 and a standard deviation of 90.
If these scores are normally distributed, then approximately 68% will fall between 390 and 570 (480 + 90 = 570 and 480 − 90 = 390).
Approximately 95% of the scores will fall between 300 and 660 (480 + 2(90) = 660 and 480 − 2(90) = 300). Approximately 99.7% will
fall between 210 and 750 (480 + 3(90) = 750 and 480 − 3(90) = 210). See Figure.
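The three intervals in the exam-score example can be generated from the mean and standard deviation alone. This is a minimal sketch; `empirical_intervals` is a hypothetical helper, and the percentages are the approximate ones stated in the rule, not exact normal probabilities.

```python
def empirical_intervals(mean, sd):
    """Map each approximate coverage (in %) to the interval mean ± k*sd."""
    coverage_by_k = {1: 68.0, 2: 95.0, 3: 99.7}
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in coverage_by_k.items()}


scores = empirical_intervals(480, 90)
for pct, (low, high) in scores.items():
    print(f"about {pct}% of scores fall between {low} and {high}")
```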

Example (6)
Heights of 18-year-old males have a bell-shaped distribution with mean 69.6 inches and standard deviation 1.4 inches.
(a) About what proportion of all such men are between 68.2 and 71 inches tall?
(b) What interval centered on the mean should contain about 95% of all such men?
Solution (a)

$$\bar{x} - ks = 68.2 \Rightarrow 69.6 - k(1.4) = 68.2 \Rightarrow k = 1$$
$$\bar{x} + ks = 71 \Rightarrow 69.6 + k(1.4) = 71 \Rightarrow k = 1$$
Hence the given interval is the 1-S.D. interval about the mean $\bar{x}$, and by the empirical rule it contains approximately 68% of the data.
Solution (b)
By the Empirical Rule, the shortest such interval containing 95% of the data is $\bar{x} \pm 2s$, i.e. the interval from
$$\bar{x} - 2s = 69.6 - 2(1.4) = 66.8$$
to
$$\bar{x} + 2s = 69.6 + 2(1.4) = 72.4$$
So this interval, (66.8, 72.4), contains approximately 95% of the data values.
Example (7)

A sample of size n = 50 has mean $\bar{x} = 28$ and standard deviation s = 3. Without knowing anything else about the sample,
what can be said about the number of observations that lie in the interval (22, 34)? What can be said about the number of
observations that lie outside that interval?
This means we are given $\bar{x} = 28$ and $s = 3$, so that
$$\bar{x} - ks = 22 \Rightarrow 28 - 3k = 22 \Rightarrow k = 2$$
$$\bar{x} + ks = 34 \Rightarrow 28 + 3k = 34 \Rightarrow k = 2$$
Then, by Chebyshev’s theorem, at least $1 - 1/k^2 = 1 - 1/4 = 3/4 = 75\%$ of the 50 values, i.e. at least 37.5 and hence at least 38 values, are contained in the given interval.
Therefore at most 12 observations fall outside the interval.
Exercise
The mean of a distribution is 20 and the standard deviation is 2. Use Chebyshev’s theorem.
a. At least what percentage of the values will fall between 10 and 30?
b. At least what percentage of the values will fall between 12 and 28? (Bluman ch. 3)
Exercise
The Energy Information Administration reported that the mean retail price per gallon of regular grade gasoline was $2.30
(Energy Information Administration, February 27, 2006). Suppose that the standard deviation was $.10 and that the retail price per
gallon has a bell shaped distribution.
a. What percentage of regular grade gasoline sold between $2.20 and $2.40 per gallon?
b. What percentage of regular grade gasoline sold between $2.20 and $2.50 per gallon?
c. What percentage of regular grade gasoline sold for more than $2.50 per gallon?
(prob. 3.30, Sweeny Chap 3 )

Skewness and Kurtosis


(1) Skewness
If a curve is symmetrical, then the values are distributed in the same way below
the mean and above the mean. This is called symmetry. Skewness is
the degree of asymmetry (departure from symmetry) of a distribution.
In a symmetrical distribution, the mean, median and mode coincide.

If the frequency curve of a distribution has a longer tail to the right of the central maximum
than to the left, the distribution is said to be skewed to the right or to have positive
skewness. In positive skewed distribution, the mean exceeds the mode.

If the frequency curve of a distribution has a longer tail to the left of the central maximum
than to the right, the distribution is said to be skewed to the left or to have negative
skewness. In negative skewed distribution, the mean is smaller than the mode.

Karl Pearson proposed the following formula to measure skewness:
$$\text{Coefficient of skewness} = b_1 = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$$
Bowley introduced the following measure of skewness:
$$\text{coefficient of skewness} = b_1 = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$
Another measure of skewness, based on the second and third moments about the mean, $m_2$ and $m_3$, is
$$\text{coefficient of skewness} = b_1 = \frac{m_3}{m_2^{3/2}}$$
If b1 = 0, we say that the distribution is symmetrical.
If b1 > 0, we say that the distribution is positively skewed.
And if b1 < 0, we say that the distribution is negatively skewed.
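The three coefficients of skewness can be sketched as small functions. All names below are hypothetical, and the input numbers are made up purely to show the sign behaviour described above; the central moments divide by n, matching the population-style formulas used in these notes.

```python
def pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / sd


def bowley_skewness(q1, q2, q3):
    """Bowley's quartile coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)


def central_moment(data, r):
    """r-th moment about the mean, dividing by n."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** r for x in data) / n


def moment_skewness(data):
    """Moment coefficient: m3 / m2**(3/2)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 1, 2, 10]
long_left_tail = [-10, -2, -1, -1, -1]

print(moment_skewness(symmetric))        # zero: symmetrical
print(moment_skewness(long_right_tail))  # positive: skewed to the right
print(moment_skewness(long_left_tail))   # negative: skewed to the left
```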
Kurtosis
Kurtosis is the degree of peakedness of a distribution. A distribution having a relatively high peak is called leptokurtic,
whereas a distribution that is flat-topped is called platykurtic. A frequency curve which is neither very high-peaked nor very
flat-topped is called mesokurtic, as is the normal curve of a normal distribution.

Kurtosis is measured by the following formula:


$$\text{coefficient of kurtosis} = b_2 = \frac{m_4}{m_2^2}$$
For example, with $m_4 = 6$ and $m_2 = 1.5$, $b_2 = 6/(1.5)^2 \approx 2.67$, so the curve is platykurtic, i.e. we have a flat curve.
b2 = 3 shows the curve to be mesokurtic or normal.
b2 < 3 shows the curve to be platykurtic or a flat curve.
b2 > 3 shows the curve to be leptokurtic or with a high peak.
Another measure of kurtosis is:
$$\text{Percentile coefficient of kurtosis} = b_2 = \frac{\text{Q.D.}}{P_{90} - P_{10}} = \frac{(Q_3 - Q_1)/2}{P_{90} - P_{10}} = \frac{Q_3 - Q_1}{2(P_{90} - P_{10})}$$
where Q.D. = quartile deviation = $\frac{Q_3 - Q_1}{2}$
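The moment coefficient of kurtosis and the three-way classification can be sketched as follows. The function names and the tolerance used to decide "equal to 3" are assumptions made for this illustration; the data set at the end is made up.

```python
def kurtosis_from_moments(m4, m2):
    """Moment coefficient of kurtosis: b2 = m4 / m2**2."""
    return m4 / m2 ** 2


def central_moment(data, r):
    """r-th moment about the mean, dividing by n."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** r for x in data) / n


def moment_kurtosis(data):
    return kurtosis_from_moments(central_moment(data, 4), central_moment(data, 2))


def classify(b2, tolerance=1e-9):
    """Label a curve by comparing b2 with 3 (tolerance is an arbitrary choice)."""
    if abs(b2 - 3) <= tolerance:
        return "mesokurtic"
    return "platykurtic" if b2 < 3 else "leptokurtic"


b2_from_text = kurtosis_from_moments(6, 1.5)  # the m4 = 6, m2 = 1.5 case
print(round(b2_from_text, 2), classify(b2_from_text))

flat_data = [1, 2, 3, 4, 5]                   # hypothetical, evenly spread values
print(moment_kurtosis(flat_data), classify(moment_kurtosis(flat_data)))
```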
Exercise 6.
The second moments about the mean of two distributions are 8.5 and 22.5, while the third moments about the mean are 5.1
and 2.8. Which of the distributions is (i) symmetrical, (ii) positively skewed and (iii) negatively skewed?
Exercise 7.
The second moments about the mean of two distributions are 9 and 16, while the fourth moments about the mean are 230
and 780 respectively. Which of the distributions is (i) leptokurtic, (ii) mesokurtic and (iii) platykurtic?
Exercise 8.
The 4th moment about the mean of a symmetrical distribution is 243. What would be the value of the standard deviation in
order that the distribution may be normal?
Exercise 9.
The fourth mean moment of a symmetrical distribution is 243. What would be the value of the standard deviation in
order that the distribution may be mesokurtic?
Exercise 10.
What can you say about the skewness in each of the following cases?
(i) The median is 26.01, while the two quartiles are 13.73 and 38.29.
(ii) Mean = 140 and mode = 148.7
(iii) Mean = 129.5 and median = 128.7
(iv) The first three moments about the value 16 are respectively −0.35, 2.9 and 1.93.
Exercise 11.
The second moments about the mean of the two distributions are 8.5 and 22.5, while the third moments about the mean
are 5.1 and −2.8. Which of the distributions is symmetrical, positively skewed and negatively skewed?
Exercise 12.
Which of the following is correct in a positively skewed and in a negatively skewed distribution?
(i) The arithmetic mean is greater than the mode.
(ii) The arithmetic mean is less than the mode.
(iii) The arithmetic mean is greater than the median.
(iv) The median is greater than the mode.
Exercise 13.
The lengths of stay on the cancer floor of Apolo Hospital were organized into a frequency distribution. The mean length of
stay was 28 days, the median 25 days and the modal length 23 days. The standard deviation was computed to be 4.2 days.
Exercise 14.
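Let $\bar{x}_1$ and $s_1^2$ be the mean and variance of $n_1$ observations and $\bar{x}_2$ and $s_2^2$ be the mean and variance of $n_2$ observations
($n_1$ and $n_2$ are sufficiently large). If $S^2$ is the variance of the combined $n_1 + n_2$ observations, prove that
$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 n_2 (\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2}$$
Solution
Group I: $x_{11}, x_{12}, x_{13}, x_{14}, \dots, x_{1 n_1}$, with mean $\bar{x}_1$
Group II: $x_{21}, x_{22}, x_{23}, x_{24}, \dots, x_{2 n_2}$, with mean $\bar{x}_2$
and let $\bar{x}$ be the combined mean of both data sets.

Let $x_{ij}$ = jth observation of the ith subgroup, $i = 1, 2$ and $j = 1, 2, \dots, n_i$, so that $s_i^2 = \frac{1}{n_i} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$. Then
$$S^2 = \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x})^2, \quad \text{where } \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
$$= \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} \left[ (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x}) \right]^2$$
$$= \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} \left[ (x_{ij} - \bar{x}_i)^2 + (\bar{x}_i - \bar{x})^2 + 2 (x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x}) \right]$$
$$= \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2 + \frac{2}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)(\bar{x}_i - \bar{x})$$
Since $\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) = 0$, the last term vanishes, therefore
$$S^2 = \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2 + \frac{1}{n_1 + n_2} \sum_{i=1}^{2} \sum_{j=1}^{n_i} (\bar{x}_i - \bar{x})^2$$
$$= \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 (\bar{x}_1 - \bar{x})^2 + n_2 (\bar{x}_2 - \bar{x})^2}{n_1 + n_2} \quad \text{(A)}$$
where
$$\bar{x}_1 - \bar{x} = \bar{x}_1 - \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_1 - n_1 \bar{x}_1 - n_2 \bar{x}_2}{n_1 + n_2} = \frac{n_2 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2}$$
Similarly,
$$\bar{x}_2 - \bar{x} = \bar{x}_2 - \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{n_1 \bar{x}_2 + n_2 \bar{x}_2 - n_1 \bar{x}_1 - n_2 \bar{x}_2}{n_1 + n_2} = \frac{-n_1 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2}$$
From (A),
$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{1}{n_1 + n_2} \left[ n_1 \left( \frac{n_2 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2} \right)^2 + n_2 \left( \frac{-n_1 (\bar{x}_1 - \bar{x}_2)}{n_1 + n_2} \right)^2 \right]$$
$$= \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{1}{n_1 + n_2} \left[ \frac{n_1 n_2^2}{(n_1 + n_2)^2} (\bar{x}_1 - \bar{x}_2)^2 + \frac{n_1^2 n_2}{(n_1 + n_2)^2} (\bar{x}_1 - \bar{x}_2)^2 \right]$$
$$= \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{1}{n_1 + n_2} \left[ \frac{n_1 n_2 (n_1 + n_2)}{(n_1 + n_2)^2} (\bar{x}_1 - \bar{x}_2)^2 \right]$$
$$= \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 n_2 (\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2}$$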
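The result of Exercise 14 can be sanity-checked numerically. The sketch below uses two small made-up groups and compares the variance of the pooled data with the value given by the formula; as in the derivation, each variance divides by the number of observations (`statistics.pvariance`), not by one less.

```python
from statistics import mean, pvariance

group1 = [1.0, 2.0, 3.0, 4.0]   # hypothetical observations, n1 = 4
group2 = [10.0, 12.0, 14.0]     # hypothetical observations, n2 = 3

n1, n2 = len(group1), len(group2)
mean1, mean2 = mean(group1), mean(group2)
var1, var2 = pvariance(group1), pvariance(group2)

# Left-hand side: variance of all n1 + n2 observations taken together.
combined_direct = pvariance(group1 + group2)

# Right-hand side: the combined-variance formula proved above.
combined_formula = (
    (n1 * var1 + n2 * var2) / (n1 + n2)
    + n1 * n2 * (mean1 - mean2) ** 2 / (n1 + n2) ** 2
)

print(f"direct:  {combined_direct:.6f}")
print(f"formula: {combined_formula:.6f}")
```

The second term grows with the gap between the two group means, so pooling groups with very different means inflates the combined variance even when each group is tight.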
