Constructing histograms and frequency distributions

Handout 1 (Complete) (1)
Semester: Fall 2020 (6th Sem. Civil Engineering, 2018)
Subject: MA 356 Probability and Statistics in Engineering
Subject Teacher: MUHAMMAD NAEEM SANDHU
Cell# 03224577602
Email: naeemuetlahore@hotmail.com
Why Statistics in Civil Engineering?
In civil engineering, uncertainties in hydrology and water resources engineering arise from incompleteness of historical data,
limitations in adequate representation of sample data, variability of hydrologic data, uncertain predictions, etc. Assessment of uncertainty
is carried out through different probability methods, e.g., distribution fitting to data, probability and quantile estimation, and interval
estimation of parameters.
In structural engineering, failure can cause excessive monetary loss and injury or death. Thus, an extremely low rate of failure
is assured in design. Safety factors are determined by considering risk or probability of failure. Different standards of acceptance are
developed based on the probability concepts.
The common probabilistic theories and tools like histogram, estimation of mean, variance, standard deviation, and probability
distribution functions are useful for all different disciplines in civil engineering.
What is Statistics?
(1) Collection and Organization of Data:
(i) Graphically: through the use of charts and graphs
(ii) Numerically: through the use of tables of data
(2) Analysis of Data:
Once the data are organized, we can compute various quantities (called statistics or parameters) associated with the data.
(3) Interpretation of Data:
Once we have performed the analysis, we can use the information to make decisions about the aggregate.
Reference Book:
1. “Theory and Problems of Statistics”, 4th Edition, Schaum’s Outline Series.
2. “Applied Statistics for Civil and Environmental Engineers”, 2nd Ed., by Nathabandu T. Kottegoda and Renzo Rosso,
Blackwell Publishers, 2008.
Descriptive Statistics: Descriptive statistics includes the statistical procedures that we use to describe the data collected. The data could be
collected from either a sample or a population, but the results help us organize and describe the data. Descriptive statistics can only be
used to describe the group that is being studied. Frequency distributions, measures of central tendency (mean, median, and mode),
and graphs like pie charts and bar charts that describe the data are all examples of descriptive statistics.
Inferential Statistics: Inferential statistics is concerned with making predictions or inferences about a population from observations
and analysis of a sample. Regression analysis, test of hypothesis, significance, analysis of variance are the examples of inferential
statistics.
Example 1: Thirty batteries were tested to determine how long they would last. The results, to the nearest minute, were recorded as:
423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392, 408, 431, 401, 363, 391, 405, 382, 400, 381, 399, 415, 428, 422, 396, 372, 410,
419, 386, 390
Construct a frequency distribution table. Also construct a histogram and an ogive.
ANS. Range = xmax - xmin = 431 - 363 = 68
C.L       C.B (see note)   Tally      f    c.f   C.B            Classes starting at 363
360-369   360-370          ||         2    2     359.5-369.5    363-372
370-379   370-380          |||        3    5     369.5-379.5    373-382
380-389   380-390          |||||      5    10    379.5-389.5    383-392
390-399   390-400          ||||| ||   7    17    389.5-399.5    393-402
400-409   400-410          |||||      5    22    399.5-409.5    403-412
410-419   410-420          ||||       4    26    409.5-419.5    413-422
420-429   420-430          |||        3    29    419.5-429.5    423-432
430-439   430-440          |          1    30    429.5-439.5
Total                                 30
Note: in class boundaries, all upper class limits are considered in their next class interval
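The tallying in Example 1 can be checked with a short script. This is only a sketch: the function name frequency_table and its parameters are illustrative, and it assumes integer data with equal-width classes starting at 360.

```python
from itertools import accumulate

# Battery lifetimes (minutes) from Example 1.
lifetimes = [423, 369, 387, 411, 393, 394, 371, 377, 389, 409,
             392, 408, 431, 401, 363, 391, 405, 382, 400, 381,
             399, 415, 428, 422, 396, 372, 410, 419, 386, 390]


def frequency_table(data, start, width, n_classes):
    """Return rows (lower limit, upper limit, f, c.f) for integer data."""
    freqs = [0] * n_classes
    for x in data:
        index = (x - start) // width  # which class the value falls in
        if not 0 <= index < n_classes:
            raise ValueError(f"{x} lies outside the chosen classes")
        freqs[index] += 1
    cumulative = list(accumulate(freqs))
    return [(start + i * width, start + (i + 1) * width - 1, f, cf)
            for i, (f, cf) in enumerate(zip(freqs, cumulative))]


table = frequency_table(lifetimes, start=360, width=10, n_classes=8)
for low, high, f, cf in table:
    print(f"{low}-{high}  f = {f:2d}  c.f = {cf:2d}")
```

The cumulative column is the one plotted against the upper class boundaries when drawing the ogive.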
Exercise 1 : The bureau of labor statistics has sampled 30 communities nationwide and compiled prices in each community at the beginning
and end of August in order to find out approximately how the Consumer Price Index has changed during August. The percentage changes in
prices for the 30 communities are as follows: Ref. Ex. 2.19 “Statistics for Management” 7th by Levin Rubin
0.7 0.4 0.3 0.2 0.1 0.1 0.3 0.7 0.0 0.4
0.1 0.5 0.2 0.3 1.0 0.3 0.0 0.2 0.5 0.1
0.5 0.3 0.1 0.5 0.4 0.0 0.2 0.3 0.5 0.4
Construct a frequency distribution using four equal-sized classes, starting from the minimum value as the lower class limit.
Example 2: The weights of 40 male students at a university are recorded to the nearest pound. Construct a frequency distribution.
119 135 138 144 146 150 156 164
125 135 140 144 147 150 157 165
126 135 140 145 147 152 158 168
128 136 142 145 148 153 161 173
132 138 142 146 149 154 163 176
Classes    Midpoint   Tally            Frequency   c.f
119-127    122.5      |||              3           3
127-135    130.5      ||               2           5
135-143    138.5      ||||| |||||      10          15
143-151    146.5      ||||| ||||| ||   12          27
151-159    154.5      ||||| |          6           33
159-167    162.5      ||||             4           37
167-175    170.5      ||               2           39
175-183    178.5      |                1           40
(As in Example 1, a value equal to an upper class limit is counted in the next class.)
Examples (1) (a line diagram or a bar chart)

The occurrences of a discrete variable can be classified on a line diagram or
bar chart.
Consider the annual number of floods of the Magra River at Calamazza,
situated between Pisa and Genoa in northwestern Italy, over a 34-year
period, as shown in Table 1.1.1.
A flood in the river at the point of measurement means the river has risen
above a specified level, beyond which the river poses a threat to lives and
property. The data are plotted in Fig. 1.1.1 as a line diagram.
No. of floods in a year   0   1   2   3   4   5   6   7   8   9   Total
No. of occurrences        0   2   6   7   9   4   1   4   1   0   34
(Ref. Book 2)
Example 2 (Histogram with unequal classes)
A Company manufactures metal rods in different lengths. The table given below shows information of a day’s production of the company.
Length (cm) 10-19 20-29 30-39 40-49 50-69 70-99 100-139
No. of metal rods (Frequency) 6 7 8 10 10 9 8

The first four intervals have equal size, but the 5th, 6th and 7th intervals have different sizes.
In such cases we find a proportional height (frequency ÷ class width) for each rectangular bar, so we construct the table as follows:
C.L       C.B          Freq.   Class Width   Proportional Height
10-19     9.5-19.5     6       10            6 ÷ 10 = 0.6
20-29     19.5-29.5    7       10            7 ÷ 10 = 0.7
30-39     29.5-39.5    8       10            8 ÷ 10 = 0.8
40-49     39.5-49.5    10      10            10 ÷ 10 = 1.0
50-69     49.5-69.5    10      20            10 ÷ 20 = 0.5
70-99     69.5-99.5    9       30            9 ÷ 30 = 0.3
100-139   99.5-139.5   8       40            8 ÷ 40 = 0.2
Now construct the histogram by taking class boundaries along the x-axis and proportional height along the y-axis.
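The proportional heights for unequal classes can be computed as follows. This is a minimal sketch; histogram_bars is an illustrative name, and it assumes class boundaries lie half a unit outside the class limits, as in the table.

```python
# (lower limit, upper limit, frequency) for each class of rod lengths, in cm.
classes = [(10, 19, 6), (20, 29, 7), (30, 39, 8), (40, 49, 10),
           (50, 69, 10), (70, 99, 9), (100, 139, 8)]


def histogram_bars(classes):
    """Return (lower boundary, upper boundary, width, height) per class."""
    bars = []
    for low, high, freq in classes:
        lower_b, upper_b = low - 0.5, high + 0.5  # class boundaries
        width = upper_b - lower_b
        height = freq / width                     # frequency per unit length
        bars.append((lower_b, upper_b, width, height))
    return bars


for lower_b, upper_b, width, height in histogram_bars(classes):
    print(f"{lower_b}-{upper_b}  width = {width:g}  height = {height:.1f}")
```

With these heights the area of each bar (height × width) equals the class frequency, which is what makes bars of different widths comparable.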
Book: “Theory and Problems of Statistics” 4th Edition by Schaum’s Series; Practice Problems: 2.27, 2.28, 2.29
Measures of Location
Median
- If n is odd, then the median is the middle value.
- If n is even, the median is the average of the two middle values.
Mode
The mode is the value that is repeated most often in the data set.
e.g. The ages in years of the cars worked on by the Village Auto Haus last week:
5 6 3 6 11 7 9 10 2 4 10 6 2 1 5. The mode in this case is 6.
Examples (2)
A computing student received the following grades in the subjects of his first semester, 2007:
Y = [6; 7; 6; 8; 5; 7; 6; 9; 10; 6]   Mode = 6 (unimodal)
1, 2, 3, 4, 5, 6, 6, 7, 7             The modes are 6 and 7 (bimodal)
2, 3, 4, 2, 3, 4, 7, 8                2, 3 and 4 are the modes (multimodal)
2, 3, 4, 5, 6, 7, 8                   no mode
2, 2, 3, 3, 4, 4, 5, 5                no mode
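The cases above can be reproduced with a small helper. It is a sketch only: the name modes is illustrative, and it adopts the convention used in these examples that a data set in which every distinct value occurs equally often has no mode.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all share the same count, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([6, 7, 6, 8, 5, 7, 6, 9, 10, 6]))  # unimodal
print(modes([1, 2, 3, 4, 5, 6, 6, 7, 7]))      # bimodal
print(modes([2, 3, 4, 2, 3, 4, 7, 8]))         # multimodal
print(modes([2, 2, 3, 3, 4, 4, 5, 5]))         # no mode
```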
Exercise 1.
A semi-commercial test plant produced the following daily outputs in tonnes/day:
1.3 2.5 1.8 1.4 3.2 1.9 1.3 2.8 1.1 1.7
1.4 3.0 1.6 1.2 2.3 2.9 1.1 1.7 2.0 1.4
Find the mode.
(Ref. McCoursey, Chap. 4)
Other Measures of Location
We discuss here quartiles, deciles and percentiles.
Quartiles
Quartiles divide the distribution into four groups, separated by Q1, Q2, Q3. Note that Q1 is the same as the 25th percentile; Q2 is the
same as the 50th percentile, or the median; Q3 corresponds to the 75th percentile.
For Q1 we check whether $\frac{n}{4}$ is an integer or a non-integer.
If $\frac{n}{4}$ is not an integer, then $Q_1 = \left(\left[\frac{n}{4}\right] + 1\right)$th item in the data, where $[x]$ denotes the integer part of $x$.
If $\frac{n}{4}$ is an integer, then $Q_1$ = average of the $\left(\frac{n}{4}\right)$th and $\left(\frac{n}{4} + 1\right)$th items.
Similarly for Q2 and Q3 we check whether $\frac{2n}{4}$ and $\frac{3n}{4}$ respectively are integers or non-integers, and then find Q2 and Q3 in the same way as
we did for Q1.
Deciles
Deciles divide the distribution into 10 groups. They are denoted by D1, D2, etc.
For D7 we check whether $\frac{7n}{10}$ is an integer or a non-integer.
If $\frac{7n}{10}$ is not an integer, then $D_7 = \left(\left[\frac{7n}{10}\right] + 1\right)$th item in the data.
If $\frac{7n}{10}$ is an integer, then $D_7$ = average of the $\left(\frac{7n}{10}\right)$th and $\left(\frac{7n}{10} + 1\right)$th items.
Similarly for D2 and D3 we check whether $\frac{2n}{10}$ and $\frac{3n}{10}$ respectively are integers or non-integers, and then find D2 and D3 in the same way as
we did for D7.
Percentiles
Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group.
Percentiles divide the data set into 100 equal groups and are symbolized by
P1, P2, P3, . . . , P99
For instance, for P27 we check whether $\frac{27n}{100}$ is an integer or a non-integer.
If $\frac{27n}{100}$ is not an integer, then $P_{27} = \left(\left[\frac{27n}{100}\right] + 1\right)$th item in the data.
If $\frac{27n}{100}$ is an integer, then $P_{27}$ = average of the $\left(\frac{27n}{100}\right)$th and $\left(\frac{27n}{100} + 1\right)$th items.
Similarly for P25 and P30 we check whether $\frac{25n}{100}$ and $\frac{30n}{100}$ respectively are integers or non-integers, and then find P25 and
P30 in the same way as we did for P27.
Note that
Median is a point that cumulates 50% of the data below this point (i.e 2n/4 or n/2)
Q1 is a point that cumulates 25% of the data below this point ( i.e. n/4)
Q3 is a point that cumulates 75% of the data below this point ( i.e. 3n/4)
D7 is a point that cumulates 70% of the data below this point ( i.e. 7n/10)
P45 is a point that cumulates 45% of the data below this point ( i.e. 45n/100)
Note:
For all of these measures, first we arrange our data in ascending order.
Examples (3)
The breaking strength of test pieces of a certain alloy is given as under
X: 95 103 97 130 96 73 78 95 89 68
82 79 69 67 83 108 94 87 93 117
Find quartiles, deciles and percentiles.
Solution
Arranged form of the data:
67, 68, 69, 73, 78, 79, 82, 83, 87, 89, 93, 94, 95, 95, 96, 97, 103, 108, 117, 130
Mode = the most frequent value in the data = 95

Median is either the middle value or the average of the two middle values in the data, which here is (89 + 93)/2 = 91
For Q2, 2n/4 = n/2 = 10 (integer), Q2 = Ave (10th and 11th values in the data) = (89 + 93)/2 = 91
For Q3, 3n/4 = 15 (integer), then Q3 = Ave (15th and 16th items in the data) = (96 + 97)/2 = 96.5
For 7th decile , 7n/10 = 14 (integer), D7 = Ave (14th and 15th values in the data) = (95 + 96)/2 = 95.5
For P70 ; 70n/100 = 14 (integer) therefore P70 = Ave (14th and 15th values in the data) = (95 + 96)/2 = 95.5
Data II (n = 15): 67, 68, 69, 73, 78, 79, 82, 83, 87, 89, 93, 94, 95, 95, 96
For Q3 , 3n/4 = 11.25 (non-integer), Q3 = ( [11.25] + 1 ) th = (11 + 1) th item in the data = 12 th item in the data = 94
P35 , 35n/100 = 5.25 (non-integer), P35 = ( [5.25] +1 )th item in the data =( 5 + 1)th item in the data = 6 th value in the data = 79
Note that bracket [ x ] provides an integral value which is equal to x or an integral value just below x
This means [3] = 3 , [7] = 7, [37] = 37
And [3.1] = 3, [3.9] = 3, [ 3.999999] = 3
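The integer / non-integer rule used in Examples (3) can be written as one function for quartiles, deciles and percentiles. This is a sketch under the handout's own rule (software packages often use other interpolation rules); the name positional_quantile and its arguments are illustrative.

```python
def positional_quantile(data, k, parts):
    """k-th of `parts` divisions: parts=4 quartiles, 10 deciles, 100 percentiles."""
    ordered = sorted(data)                 # the data must be in ascending order
    n = len(ordered)
    whole, remainder = divmod(k * n, parts)  # whole = [k*n/parts]
    if remainder:
        # Non-integer position: take the ([k*n/parts] + 1)th item
        # (index `whole` when counting from zero).
        return ordered[whole]
    # Integer position: average that item and the next one.
    return (ordered[whole - 1] + ordered[whole]) / 2


strengths = [95, 103, 97, 130, 96, 73, 78, 95, 89, 68,
             82, 79, 69, 67, 83, 108, 94, 87, 93, 117]
data_2 = sorted(strengths)[:15]            # "Data II" of the example

print(positional_quantile(strengths, 2, 4))    # Q2
print(positional_quantile(strengths, 3, 4))    # Q3
print(positional_quantile(strengths, 7, 10))   # D7
print(positional_quantile(data_2, 3, 4))       # Q3 of Data II
print(positional_quantile(data_2, 35, 100))    # P35 of Data II
```

Integer arithmetic (divmod) is used for the position so that a value such as 7n/10 is never misclassified because of floating-point rounding.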
Examples (4)
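Consider grouped data (a frequency distribution):
Classes     10-19  20-29  30-39  40-49  50-59  60-69  70-79  80-89
Frequency   6      7      8      10     12     9      7      4
Calculate the median, first quartile, 7th decile and 45th percentile; also calculate the mode.
Classes   f          C.B         c.f
10-19     6          9.5-19.5    6
20-29     7          19.5-29.5   13
30-39     8          29.5-39.5   21
40-49     10 = f1    39.5-49.5   31
50-59     12 = fm    49.5-59.5   43
60-69     9 = f2     59.5-69.5   52
70-79     7          69.5-79.5   59
80-89     4          79.5-89.5   63
n = 63
In each formula, $l_1$ is the lower class boundary of the class containing the required value, $h$ is the class width, $f$ is the frequency of that class and $C$ is the cumulative frequency of the preceding class.
$\text{Median} = l_1 + \frac{h}{f}\left(\frac{n}{2} - C\right) = 49.5 + \frac{10}{12}(31.5 - 31) = 49.9$
$Q_1 = l_1 + \frac{h}{f}\left(\frac{n}{4} - C\right) = 29.5 + \frac{10}{8}(15.75 - 13) = ?$
$Q_3 = l_1 + \frac{h}{f}\left(\frac{3n}{4} - C\right)$
$D_7 = l_1 + \frac{h}{f}\left(\frac{7n}{10} - C\right)$
$P_{45} = l_1 + \frac{h}{f}\left(\frac{45n}{100} - C\right)$
For the mode, $l_1$ is the lower boundary of the modal class, $f_m$ its frequency, and $f_1$, $f_2$ the frequencies of the classes before and after it:
$\text{Mode} = l_1 + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h = 49.5 + \frac{12 - 10}{(12 - 10) + (12 - 9)} \times 10 = 53.5$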
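The grouped-data formulas of Examples (4) share one pattern, l1 + (h/f)(position - C), so a single routine covers the median, quartiles, deciles and percentiles. The sketch below assumes equal class widths and non-zero frequencies; the names grouped_quantile and grouped_mode are illustrative.

```python
LOWER_BOUNDARIES = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5]
FREQS = [6, 7, 8, 10, 12, 9, 7, 4]
H = 10  # common class width


def grouped_quantile(fraction, lower_boundaries, freqs, h):
    """Value below which `fraction` of the data lie, e.g. 0.5 for the median."""
    position = fraction * sum(freqs)       # n/2, n/4, 7n/10, ...
    cum = 0                                # C: c.f of the preceding class
    for l1, f in zip(lower_boundaries, freqs):
        if cum + f >= position:
            return l1 + h / f * (position - cum)
        cum += f
    raise ValueError("fraction must lie between 0 and 1")


def grouped_mode(lower_boundaries, freqs, h):
    i = freqs.index(max(freqs))            # modal class
    fm = freqs[i]
    f1 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lower_boundaries[i] + (fm - f1) / ((fm - f1) + (fm - f2)) * h


median = grouped_quantile(0.5, LOWER_BOUNDARIES, FREQS, H)
mode = grouped_mode(LOWER_BOUNDARIES, FREQS, H)
print(round(median, 1), round(mode, 1))
```

Calling grouped_quantile with 0.25, 0.75, 0.7 and 0.45 gives Q1, Q3, D7 and P45 in the same way.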
Quartiles, Deciles and Percentiles with the help of Ogive. Graphically, all three quartiles can be read from the ogive, and deciles and percentiles can be found in the same way.
[Figure: ogive for an example with n = 83]
Practice Problems, Chap 3 Solved problems: 3.19, 3.26, 3.35, 3.37, 3.40, 3.44, 3.45
The Weighted Mean

The weighted mean enables us to calculate an average that takes into account the importance of each value to the overall total.
The weighted mean of a variable X is obtained by multiplying each value by its corresponding weight and dividing the sum of the products by
the sum of the weights.
The formula for calculating the weighted average is
$\bar{x}_w = \dfrac{\sum w x_i}{\sum w}$
Exercise
(1) An instructor gives four 1-hour exams and one final exam, which counts as two 1-hour exams. Find a student’s grade if she
received 62, 83, 97, and 90 on the 1-hour exams and 82 on the final exam. (Bluman Problem 3.31)
(2) An instructor grades exams, 20%; term paper, 30%; final exam, 50%. A student had grades of 83, 72, and 90, respectively,
for exams, term paper, and final exam. Find the student’s final average. Use the weighted mean.
(3) A salesperson drives 300 miles round trip at 30 miles per hour going to Chicago and 45 miles per hour returning home. Find
the average miles per hour. (Harmonic Mean)
Average speed = (30 + 45)/2 = 37.5 miles per hour; this is wrong.
$H.M = \dfrac{2}{\frac{1}{30} + \frac{1}{45}} = 36$ miles per hour
Verify (taking 300 miles each way; the result is the same for any distance): time taken on the way going, t1 = 300/30 = 10
Time taken on the way back, t2 = 300/45 = 6.67
Now speed = total distance / total time = 600/16.67 = 35.99 ≈ 36 miles per hour
(4) A bus driver drives 50 miles to West Chester at 40 miles per hour and returned driving 25 miles per hour. Find the average
miles per hour. (Harmonic Mean)
(5) A carpenter buys $500 worth of nails at $50 per pound and $500 worth of nails at $10 per pound. Find the average cost of 1
pound of nails.
(6) Bennett Distribution Company, a subsidiary of major appliance manufacturer, is forecasting regional sales for the next year.
The Atlantic branch, with current yearly sales of $193.8 million, is expected to achieve a sales growth of 7.25 percent; the
Midwest branch, with current sales of $79.3 million, is expected to grow by 8.20 percent; and the Pacific branch, with sales
of $57.5 million, is expected to increase sales by 7.15 percent. What is the average rate of sales growth forecasted for next
year? (“Statistics for Management”, 7th Ed, by Richard Levin and David Rubin Chap 3)
(7) A recent survey of a new diet cola reported the following percentages of people who liked the taste. Find the weighted
mean of the percentages.
Area % favored Number Surveyed

1 40 1000
2 30 3000
3 50 800
(8) The costs of three models of helicopters are shown below. Find the weighted mean of the costs of the models
Model Number sold Cost

Sunscraper 9 $ 427,000
Skycoaster 6 $ 365,000
High-flyer 12 $ 725,000
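As a check on the weighted mean formula, the sketch below applies it to exercise (1), where the final exam counts as two 1-hour exams and so receives weight 2. The function name weighted_mean is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


grades = [62, 83, 97, 90, 82]   # four 1-hour exams, then the final exam
weights = [1, 1, 1, 1, 2]       # the final counts twice
print(round(weighted_mean(grades, weights), 2))
```

The same function handles percentage weights (exercise 2) or counts used as weights (exercises 7 and 8), since only the ratio of the weights matters.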
The Harmonic Mean

This mean is useful for finding the average speed. Suppose a person drove 100 miles at 40 miles per hour and returned driving at 50
miles per hour. The average miles per hour is not (40 + 50)/2 = 45 miles per hour. The correct average is found as shown:
Since Time = distance / rate, then
Time 1 = 100/40 = 2.5 hours to make the trip and
Time 2 = 100/50 = 2 hours to return
Hence the total time is 4.5 hours, and the total miles driven are 200. Now the average speed is
Rate = distance / time = 200/4.5 = 44.44 miles per hour
This value can also be found by using the harmonic mean as
$H.M = \dfrac{2}{\frac{1}{40} + \frac{1}{50}} = 44.44$, since $\dfrac{1}{H.M} = \dfrac{\frac{1}{40} + \frac{1}{50}}{2}$
Definition: the harmonic mean of two values a and b is $H.M = \dfrac{2}{\frac{1}{a} + \frac{1}{b}}$.
The harmonic mean is the reciprocal of the mean of the reciprocals, i.e. $H.M = \left(\dfrac{\sum (1/x_i)}{n}\right)^{-1} = \dfrac{n}{\sum (1/x_i)}$
Exercise 2.
Using the harmonic mean, find each of these.
(a). A salesperson drives 300 miles round trip at 30 miles per hour going to Chicago and 45 miles per hour returning home. Find the average
miles per hour.
(b). A bus driver drives the 50 miles to West Chester at 40 miles per hour and returns driving 25 miles per hour. Find the average miles per
hour.
(c). A carpenter buys $500 worth of nails at $50 per pound and $500 worth of nails at $10 per pound. Find the average cost of 1 pound of
nails.
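The harmonic mean and the distance / time check from the worked speed example can be compared directly. This is a minimal sketch; the function names are illustrative, and Python's standard library also offers statistics.harmonic_mean for the same calculation.

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals."""
    return len(values) / sum(1 / x for x in values)


def average_speed(distance_each_way, speeds):
    """Total distance over total time for equal-length legs."""
    total_time = sum(distance_each_way / s for s in speeds)
    return distance_each_way * len(speeds) / total_time


print(round(harmonic_mean([40, 50]), 2))        # 100 miles each way
print(round(average_speed(100, [40, 50]), 2))   # same value from distance / time
print(round(harmonic_mean([30, 45]), 2))
```

The two routes agree for any leg length, which is why the distance cancels out of the harmonic mean formula.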
Dispersion Measures
The main measures are
(1) Range = $x_{max} - x_{min} = x_n - x_0$
Coefficient of dispersion = $\dfrac{x_n - x_0}{x_n + x_0}$
(2) Inter-quartile Range
IQR = $Q_3 - Q_1$
Q.D = $\dfrac{Q_3 - Q_1}{2}$, called the quartile deviation
Coefficient of quartile deviation = $\dfrac{Q_3 - Q_1}{Q_3 + Q_1}$,
which is used for comparing the variation in two or more sets of data.
(3) Mean Deviation = M.D = $\dfrac{\sum |x_i - \bar{x}|}{n}$, also called an absolute measure.
Note that $\dfrac{\sum (x_i - \bar{x})}{n} = 0$ because $\sum (x_i - \bar{x}) = 0$: the sum of the deviations from the mean is always zero.
(4) Population Variance and Standard Deviation
The variance is the mean of the squared deviations from the mean.
$\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$ (for ungrouped data)
Alternative formula:
$\sigma^2 = \dfrac{\sum X_i^2}{N} - \left(\dfrac{\sum X_i}{N}\right)^2$ (for ungrouped data)
The positive square root of the variance is called the standard deviation. Symbolically,
$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$ (for ungrouped data)
Note: S.D is always non-negative. For example, for the data 3, 3, 3, 3, 3, 3, 3, 3, 3, 3 every deviation from the mean is zero, so S.D = 0.
The standard deviation is a very important concept that serves as a basic measure of variability. A smaller value of the
standard deviation indicates that most of the observations in the data are close to the mean, while a larger value implies that the
observations are scattered widely about the mean.
It is an absolute measure of dispersion. Its relative measure, called the coefficient of standard deviation, is defined as
Coefficient of S.D. = $\dfrac{\text{Standard Deviation}}{\text{Mean}}$
Coefficient of variation = $\dfrac{\text{Standard Deviation}}{\text{Mean}} \times 100$
These measures are used to compare consistency, reliability and stability.
(5) Sample Variance and Standard Deviation

In most cases, the expression $\sum (x - \bar{x})^2 / n$ does not give the best estimate of the population variance. Therefore, instead of
dividing by n, find the variance of the sample by dividing by n - 1, giving a slightly larger value and an unbiased estimate of the
population variance. The formula for the sample variance, denoted by $s^2$, is
$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
and the standard deviation of a sample (denoted by s) is
$s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
Note: (i) $\bar{x}$ is an unbiased estimate of the population mean $\mu$.
(ii) The variance computed with divisor n is a biased estimate of the population variance $\sigma^2$.
(iii) The division by n - 1 makes $s^2$ an unbiased estimator of the population variance.
Examples (5)
The breaking strength of test pieces of a certain alloy is given as under
X: 95 103 97 130 96 73 78 95 89 68
82 79 69 67 83 108 94 87 93 117
Calculate the average breaking strength of the alloy and the standard deviation.
X     X²      |X - X̄|   (X - X̄)²      X     X²      |X - X̄|   (X - X̄)²
67    4489    23.15     535.92        93    8649    2.85      8.1225
68    4624    22.15     490.62        94    8836    3.85      14.823
69    4761    21.15     447.32        95    9025    4.85      23.522
73    5329    17.15     294.12        95    9025    4.85      23.522
78    6084    12.15     147.62        96    9216    5.85      34.222
79    6241    11.15     124.32        97    9409    6.85      46.922
82    6724    8.15      66.423        103   10609   12.85     165.12
83    6889    7.15      51.123        108   11664   17.85     318.62
87    7569    3.15      9.9225        117   13689   26.85     720.92
89    7921    1.15      1.3225        130   16900   39.85     1588
Totals: ΣX = 1803, ΣX² = 167653, Σ|X - X̄| = 253, Σ(X - X̄)² = 5112.6
Mean = $\dfrac{\sum X}{n} = \dfrac{1803}{20} = 90.15$ (remember $(\sum X)^2 \ne \sum X^2$)
$\sigma = \sqrt{\dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2} = \sqrt{\dfrac{167653}{20} - \left(\dfrac{1803}{20}\right)^2} = \sqrt{8382.65 - 8127.0225} = \sqrt{255.6275} = 15.99$
Mean Deviation = M.D = $\dfrac{\sum |x_i - \bar{x}|}{n} = \dfrac{253}{20} = ?$ (remember $\dfrac{\sum (x_i - \bar{x})}{n} = 0$,
because the sum of the deviations from the mean is always zero)
Variance = $\sigma^2 = \dfrac{\sum (x_i - \bar{x})^2}{n} = \dfrac{5112.6}{20} = ?$
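The column totals and the results of Examples (5) can be verified with Python's standard statistics module. This is a sketch for checking the arithmetic; pvariance and pstdev divide by n (population formulas), while variance divides by n - 1 (sample formula).

```python
import statistics

strengths = [95, 103, 97, 130, 96, 73, 78, 95, 89, 68,
             82, 79, 69, 67, 83, 108, 94, 87, 93, 117]

n = len(strengths)
mean = statistics.mean(strengths)
mean_deviation = sum(abs(x - mean) for x in strengths) / n
pop_variance = statistics.pvariance(strengths)    # divisor n
pop_sd = statistics.pstdev(strengths)
sample_variance = statistics.variance(strengths)  # divisor n - 1

# The deviations themselves always sum to (numerically) zero.
sum_of_deviations = sum(x - mean for x in strengths)

print(mean, round(pop_variance, 4), round(pop_sd, 2))
print(round(mean_deviation, 2), round(sample_variance, 2))
```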
Examples (6)
The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is
$5225, and the standard deviation is $773. Compare the variations of the two.
Solution
The coefficients of variation are
C.V = $\dfrac{\sigma}{\bar{x}} \times 100 = \dfrac{5}{87} \times 100 = 5.7\%$ (sales)
C.V = $\dfrac{\sigma}{\bar{x}} \times 100 = \dfrac{773}{5225} \times 100 = 14.8\%$ (commissions)
Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.
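The comparison in Examples (6) amounts to one division per data set. A minimal sketch, with coefficient_of_variation as an illustrative name:

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for mean 0")
    return std_dev / mean * 100


cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)
print(round(cv_sales, 1), round(cv_commissions, 1))
```

Because the result is a percentage with no units, it allows a count of cars to be compared with an amount in dollars.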
Empirical Rule and Chebyshev’s Theorem

(1) Empirical Rule
When a distribution is bell-shaped (or what is called normal), the following statements, which make up the empirical rule, are true.
Approximately 68% of the data values will fall within 1 standard deviation of the mean.
Approximately 95% of the data values will fall within 2 standard deviations of the mean.
Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.
Examples (7)
Heights of 18-year-old males have a bell-shaped distribution with mean 69.6 inches and standard deviation 1.4 inches.
(a) About what proportion of all such men are between 68.2 and 71 inches tall?
(b) What interval centered on the mean should contain about 95% of all such men?
Solution (a)
$\bar{x} - ks = 68.2 \Rightarrow 69.6 - k(1.4) = 68.2 \Rightarrow k = 1$
$\bar{x} + ks = 71 \Rightarrow 69.6 + k(1.4) = 71 \Rightarrow k = 1$
Hence this is the 1-S.D interval about the mean $\bar{x}$, and by the empirical rule it contains approximately 68% of the data.
Solution (b)
By the Empirical Rule the shortest such interval containing 95% of the data is $\bar{x} \pm 2s$, that is, the interval from
$\bar{x} - 2s = 69.6 - 2(1.4) = 66.8$ to
$\bar{x} + 2s = 69.6 + 2(1.4) = 72.4$
So this interval, (66.8, 72.4), contains approximately 95% of the data values.
Difficulty:
Alternatively, in part (a), let us say we have the limits 68.2 and 72.4:
$\bar{x} - ks = 68.2 \Rightarrow 69.6 - k(1.4) = 68.2 \Rightarrow k = 1$ (corresponding percentage 68%)
$\bar{x} + ks = 72.4 \Rightarrow 69.6 + k(1.4) = 72.4 \Rightarrow k = 2$ (corresponding percentage 95%)
Required %age = average of 68% and 95% = (68/2 + 95/2)% = 81.5% approx., because by symmetry half of each percentage lies on each side of the mean.
Note: If the value of k is different from 1, 2 or 3, then the percentage is bounded using Chebyshev’s theorem: at least $(1 - 1/k^2) \times 100\%$, for k > 1.
(2) Chebyshev’s Theorem

The Empirical Rule does not apply to all data sets, only to those that are bell-shaped, and even then is stated in terms of
approximations. A result that applies to every data set is known as Chebyshev’s Theorem.
For any numerical data set, at least $(1 - 1/k^2) \times 100\%$ of the data lie within k standard deviations of the mean,
that is, the interval $\bar{x} \pm ks$, or $(\bar{x} - ks, \bar{x} + ks)$, contains at least $(1 - 1/k^2) \times 100\%$ of the data,
where k is any number greater than 1 (k > 1).
Examples (8)

A sample of size n = 50 has mean $\bar{x}$ = 28 and standard deviation s = 3. Without knowing anything else about the sample, what
can be said about the number of observations that lie in the interval (22, 34)? What can be said about the number of observations that
lie outside that interval?
This means we are given
$\bar{x} - ks = 22 \Rightarrow 28 - 3k = 22 \Rightarrow k = 2$
$\bar{x} + ks = 34 \Rightarrow 28 + 3k = 34 \Rightarrow k = 2$
Then at least $(1 - 1/k^2) \times 100\% = (1 - 1/4) \times 100\% = 75\%$ of the 50 values, i.e. at least 37.5 and therefore at least 38 values, are contained in the given interval.
Therefore at most 12 observations fall outside the interval.
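The Chebyshev calculation of Examples (8) can be sketched as below. The function names are illustrative; the code assumes an interval centred on the mean, and it rounds the guaranteed count up because a count of observations must be a whole number.

```python
import math


def chebyshev_fraction(k):
    """Minimum fraction of data within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return 1 - 1 / k ** 2


def chebyshev_counts(n, mean, sd, low, high):
    """Return (minimum count inside, maximum count outside) the interval."""
    k = min(mean - low, high - mean) / sd
    inside_min = math.ceil(chebyshev_fraction(k) * n)
    return inside_min, n - inside_min


print(chebyshev_fraction(2))
print(chebyshev_counts(50, 28, 3, 22, 34))
```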
Exercise
The mean of a distribution is 20 and the standard deviation is 2. Use Chebyshev’s theorem.
a. At least what percentage of the values will fall between 10 and 30?
b. At least what percentage of the values will fall between 12 and 28? (Bluman ch. 3)
Exercise
The Energy Information Administration reported that the mean retail price per gallon of regular grade gasoline was $2.30
(Energy Information Administration, February 27, 2006). Suppose that the standard deviation was $.10 and that the retail price per
gallon has a bell shaped distribution.
a. What percentage of regular grade gasoline sold between $2.20 and $2.40 per gallon?
b. What percentage of regular grade gasoline sold between $2.20 and $2.50 per gallon?
c. What percentage of regular grade gasoline sold for more than $2.50 per gallon?
(prob. 3.30, Sweeny Chap 3 )
Moments and Moment Ratios
Moments
Moments are the arithmetic means of the powers to which the deviations are raised. The mean of the first power of the
deviation from mean is the first moment about mean. The mean of the second power of the deviation from mean is the second
moment about mean and so on….
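(i) First four moments about the mean are:
For ungrouped data:
$m_1 = \dfrac{\sum (x_i - \bar{x})}{n} = 0$
$m_2 = \dfrac{\sum (x_i - \bar{x})^2}{n}$
$m_3 = \dfrac{\sum (x_i - \bar{x})^3}{n}$
$m_4 = \dfrac{\sum (x_i - \bar{x})^4}{n}$
For grouped data:
$m_1 = \dfrac{\sum f_i (x_i - \bar{x})}{\sum f_i}$
$m_2 = \dfrac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}$
$m_3 = \dfrac{\sum f_i (x_i - \bar{x})^3}{\sum f_i}$
$m_4 = \dfrac{\sum f_i (x_i - \bar{x})^4}{\sum f_i}$
(ii) First four moments about the origin, or about x = 0, are (moments about a point other than the mean are written with a prime, $m'_r$):
For ungrouped data:
$m'_1 = \dfrac{\sum (x_i - 0)}{n} = \dfrac{\sum x_i}{n}$
$m'_2 = \dfrac{\sum (x_i - 0)^2}{n} = \dfrac{\sum x_i^2}{n}$ (recall that Variance = $\dfrac{\sum x_i^2}{n} - \left(\dfrac{\sum x_i}{n}\right)^2$)
$m'_3 = \dfrac{\sum (x_i - 0)^3}{n} = \dfrac{\sum x_i^3}{n}$
$m'_4 = \dfrac{\sum (x_i - 0)^4}{n} = \dfrac{\sum x_i^4}{n}$
For grouped data (with $n = \sum f_i$):
$m'_1 = \dfrac{\sum f_i (x_i - 0)}{\sum f_i} = \dfrac{\sum f_i x_i}{n}$
$m'_2 = \dfrac{\sum f_i (x_i - 0)^2}{\sum f_i} = \dfrac{\sum f_i x_i^2}{n}$
$m'_3 = \dfrac{\sum f_i (x_i - 0)^3}{\sum f_i} = \dfrac{\sum f_i x_i^3}{n}$
$m'_4 = \dfrac{\sum f_i (x_i - 0)^4}{\sum f_i} = \dfrac{\sum f_i x_i^4}{n}$
(iii) First four moments about any arbitrary value ‘a’, or about x = a, are:
For ungrouped data:
$m'_1 = \dfrac{\sum (x_i - a)}{n}$
$m'_2 = \dfrac{\sum (x_i - a)^2}{n}$
$m'_3 = \dfrac{\sum (x_i - a)^3}{n}$
$m'_4 = \dfrac{\sum (x_i - a)^4}{n}$
For grouped data:
$m'_1 = \dfrac{\sum f_i (x_i - a)}{\sum f_i}$
$m'_2 = \dfrac{\sum f_i (x_i - a)^2}{\sum f_i}$
$m'_3 = \dfrac{\sum f_i (x_i - a)^3}{\sum f_i}$
$m'_4 = \dfrac{\sum f_i (x_i - a)^4}{\sum f_i}$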
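First Four Moments about Mean in Terms of Moments about Origin:
With $m'_r$ denoting the moments about the origin (the same relations hold for moments about any arbitrary value a):
$m_1 = 0$
$m_2 = m'_2 - (m'_1)^2 = \dfrac{\sum x_i^2}{n} - \left(\dfrac{\sum x_i}{n}\right)^2$
$m_3 = m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3$
$m_4 = m'_4 - 4 m'_3 m'_1 + 6 m'_2 (m'_1)^2 - 3 (m'_1)^4$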
Verification: (optional)
Examples (9)
The first three moments of a distribution about the value 2 of the variable are 1, 16, -40. Show that the mean is 3, the
variance is 15 and m3 = -86. Also show that first three moments about x = 0 are 3, 24, 76.
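Solution
The moments about x = 2 are
$m'_1 = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - 2) = 1$
$m'_2 = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - 2)^2 = 16$
$m'_3 = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - 2)^3 = -40$
The moments about the mean are
$m_1 = m'_1 - m'_1 = 0$
$m_2 = m'_2 - (m'_1)^2 = 16 - 1 = 15$
$m_3 = m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3 = -40 - 3(16)(1) + 2(1)^3 = -86$
For the mean,
$m'_1 = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - 2) = \dfrac{1}{n} \sum_{i=1}^{n} x_i - 2 = 1 \Rightarrow \dfrac{1}{n} \sum_{i=1}^{n} x_i = 2 + 1 = 3$
When a = 0, the first three moments are $\dfrac{1}{n} \sum x_i$, $\dfrac{1}{n} \sum x_i^2$ and $\dfrac{1}{n} \sum x_i^3$.
First moment about the origin: $\dfrac{1}{n} \sum_{i=1}^{n} x_i = 3$
Second moment about the origin:
$\dfrac{1}{n} \sum (x_i - 2)^2 = \dfrac{1}{n} \sum (x_i^2 + 4 - 4x_i) = \dfrac{1}{n} \sum x_i^2 - \dfrac{4}{n} \sum x_i + 4 = 16$
$\Rightarrow \dfrac{1}{n} \sum x_i^2 = 16 + \dfrac{4}{n} \sum x_i - 4 = 16 + 4(3) - 4 = 24$
Third moment about the origin:
$\dfrac{1}{n} \sum (x_i - 2)^3 = \dfrac{1}{n} \sum x_i^3 - \dfrac{6}{n} \sum x_i^2 + \dfrac{12}{n} \sum x_i - 8 = -40$
$\Rightarrow \dfrac{1}{n} \sum x_i^3 - 6 \times 24 + 12 \times 3 - 8 = -40$
$\Rightarrow \dfrac{1}{n} \sum x_i^3 = (6 \times 24) - (12 \times 3) + 8 - 40 = 76$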
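The conversions in Examples (9) follow from expanding (x - b)^r = ((x - a) + (a - b))^r with the binomial theorem, so a single routine can move moments from one reference point to another. This is a sketch; shift_moments is an illustrative name, and d = a - b is the old reference point minus the new one.

```python
from math import comb


def shift_moments(moments_about_a, d):
    """Moments about b = a - d, given [m1, m2, ...] about a."""
    m = [1] + list(moments_about_a)        # zeroth moment is always 1
    return [sum(comb(r, j) * d ** (r - j) * m[j] for j in range(r + 1))
            for r in range(1, len(m))]


about_2 = [1, 16, -40]                     # moments about x = 2
mean = 2 + about_2[0]                      # 2 + 1 = 3

# About the origin: new point 0, so d = 2 - 0.
print(shift_moments(about_2, 2))
# About the mean: new point 3, so d = 2 - 3.
print(shift_moments(about_2, 2 - mean))
```

For r = 2, 3 and 4 with d equal to minus the first moment, the sum reduces to the three relations between moments about the mean and moments about an arbitrary value.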
Examples (10)
First four moments of a distribution about the value 1.5 of a variable are 1, 17, 10 and 40. Calculate its coefficient of variation
and first four moments about origin.
Examples (11)
The first four moments of a distribution about x = 2 are 1, 2.5, 5.5 and 16. Calculate the first four moments about the mean
and about the origin.
Solution
The moments about x = 2 are $m'_1 = 1$, $m'_2 = 2.5$, $m'_3 = 5.5$ and $m'_4 = 16$.
The moments about the mean are
$m_1 = 0$
$m_2 = m'_2 - (m'_1)^2 = 2.5 - (1)^2 = 1.5$
$m_3 = m'_3 - 3 m'_2 m'_1 + 2 (m'_1)^3 = 5.5 - 3(2.5)(1) + 2(1)^3 = 0$
$m_4 = m'_4 - 4 m'_3 m'_1 + 6 m'_2 (m'_1)^2 - 3 (m'_1)^4 = 16 - 4(5.5)(1) + 6(2.5)(1)^2 - 3(1)^4 = 6$
Moments about the origin are defined as
$\dfrac{\sum f_i x_i}{n}$, $\dfrac{\sum f_i x_i^2}{n}$, $\dfrac{\sum f_i x_i^3}{n}$, $\dfrac{\sum f_i x_i^4}{n}$
We are given the moments about x = 2:
$m'_1 = \dfrac{1}{n} \sum_{i=1}^{n} (x_i - 2) = \dfrac{1}{n} \sum_{i=1}^{n} x_i - 2 = 1 \Rightarrow \dfrac{1}{n} \sum_{i=1}^{n} x_i = 2 + 1 = 3$
$m'_2 = \dfrac{1}{n} \sum (x_i - 2)^2 = \dfrac{1}{n} \sum (x_i^2 + 4 - 4x_i) = \dfrac{1}{n} \sum x_i^2 - \dfrac{4}{n} \sum x_i + 4 = 2.5$
$\Rightarrow \dfrac{1}{n} \sum x_i^2 - 4(3) + 4 = 2.5 \Rightarrow \dfrac{1}{n} \sum x_i^2 = 2.5 + 12 - 4 = 10.5$
$m'_3 = \dfrac{1}{n} \sum (x_i - 2)^3 = \dfrac{1}{n} \sum x_i^3 - \dfrac{6}{n} \sum x_i^2 + \dfrac{12}{n} \sum x_i - 8 = 5.5$
$\Rightarrow \dfrac{1}{n} \sum x_i^3 - 6(10.5) + 12(3) - 8 = 5.5 \Rightarrow \dfrac{1}{n} \sum x_i^3 = 40.5$
$m'_4 = \dfrac{1}{n} \sum (x_i - 2)^4 = 16$
$\Rightarrow \dfrac{1}{n} \sum x_i^4 - \dfrac{8}{n} \sum x_i^3 + \dfrac{24}{n} \sum x_i^2 - \dfrac{32}{n} \sum x_i + 16 = 16$
$\Rightarrow \dfrac{1}{n} \sum x_i^4 - 8(40.5) + 24(10.5) - 32(3) + 16 = 16$
$\Rightarrow \dfrac{1}{n} \sum x_i^4 = 168$
Hence the first four moments about the origin are 3, 10.5, 40.5 and 168.
(2) Skewness
If a curve is symmetrical, then the number of values deviating from mean values below the mean and above the mean are
the same. This is called the symmetry.
Skewness is the degree of asymmetry (departure from symmetry of a distribution)
In a symmetrical distribution, the mean, median and mode coincide.

If the frequency curve of a distribution has a longer tail to the right of the central maximum than to the left, the distribution
is said to be skewed to the right or to have positive skewness.
In positive skewed distribution, the mean exceeds the mode.
If the frequency curve of a distribution has a longer tail to the left of the central maximum than to the right, the distribution
is said to be skewed to the left or to have negative skewness.
In negative skewed distribution, the mean is smaller than the mode.

Karl Pearson introduced the following formula to measure skewness:
Skewness = $\dfrac{\text{mean} - \text{mode}}{\text{standard deviation}}$
Bowley introduced the following measure of skewness:
Quartile coefficient of skewness = $\dfrac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$
This measure is equal to zero when the quartiles are equidistant from the median; then the distribution is symmetrical. It is positive
when the upper quartile is farther from the median than the lower quartile; then the distribution is positively skewed. This measure is
negative when the lower quartile is farther from the median than the upper quartile.
Another measure of skewness is
Moment coefficient of skewness = $b_1 = \dfrac{m_3}{m_2^{3/2}}$
For a perfectly symmetrical curve, this measure is zero.
Kurtosis
Kurtosis is the degree of peakedness of a distribution. A distribution having a relatively high peak is called Lepto-Kurtic, whereas a
flat-topped distribution is called Platy-Kurtic. A frequency curve which is neither very high peaked nor very flat topped is called
Meso-Kurtic; the normal curve of a normal distribution is Meso-Kurtic.
Kurtosis is measured by the following formula:
Moment coefficient of Kurtosis = $b_2 = \dfrac{m_4}{m_2^2}$
For instance, with $m_4 = 6$ and $m_2 = 1.5$ as in Examples (11), $b_2 = 6/(1.5)^2 = 2.67$, so the curve is Platy-Kurtic, i.e. a flat curve.
For a normal (Meso-Kurtic) distribution, $b_2 = 3$; for Lepto-Kurtic, $b_2 > 3$; and for Platy-Kurtic, $b_2 < 3$.
Another measure of Kurtosis is:
Percentile coefficient of Kurtosis = $k = \dfrac{Q.D}{P_{90} - P_{10}}$
where Q.D = quartile deviation = $\dfrac{Q_3 - Q_1}{2}$
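The moment coefficients can be evaluated from the moments about the mean. The sketch below uses the values m2 = 1.5, m3 = 0 and m4 = 6 found in Examples (11); the function names and the tolerance used to decide when b2 counts as equal to 3 are assumptions made for illustration.

```python
def moment_skewness(m2, m3):
    """Moment coefficient of skewness: m3 / m2^(3/2)."""
    return m3 / m2 ** 1.5


def moment_kurtosis(m2, m4):
    """Moment coefficient of kurtosis: m4 / m2^2."""
    return m4 / m2 ** 2


def classify_kurtosis(b2, tolerance=1e-9):
    if abs(b2 - 3) <= tolerance:
        return "meso-kurtic"
    return "lepto-kurtic" if b2 > 3 else "platy-kurtic"


b1 = moment_skewness(1.5, 0)
b2 = moment_kurtosis(1.5, 6)
print(b1, round(b2, 2), classify_kurtosis(b2))
```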
Examples (12)
The second moments about the mean of two distributions are 9 and 16, while the fourth moments about mean are 230 and
780 resp. Which of the distribution is (i) Lepto-Kurtic (ii) Meso-Kurtic and (iii) Platy-Kurtic.
Examples (13)
The 4th moment about the mean of symmetrical distribution is 243. what would be the value of standard deviation in order
that the distribution may be normal.
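Solution
$m_4 = 243$; for the distribution to be normal, $b_2 = 3$.
Now $b_2 = \dfrac{m_4}{m_2^2}$, so $3 = \dfrac{243}{m_2^2}$
$m_2^2 = \dfrac{243}{3} = 81 \Rightarrow \sigma^2 = m_2 = 9 \Rightarrow \sigma = 3$.
Hence for the symmetrical distribution to be normal, the standard deviation = 3.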

Examples (14)
The fourth mean moment of a symmetrical distribution is 243. What would be the value of the standard deviation in order
that the distribution may be meso-kurtic.
Solution
We know that for a distribution to be meso-kurtic, $b_2$ equals 3,
and we are given that $m_4 = 243$. Therefore
$\dfrac{m_4}{m_2^2} = b_2 = 3$, i.e. $\dfrac{243}{m_2^2} = 3$
$\Rightarrow m_2^2 = \dfrac{243}{3} = 81$
$\Rightarrow m_2 = 9$
$\Rightarrow$ Variance = 9 (because variance = $m_2$)
Hence the desired value of the standard deviation = $\sqrt{9} = 3$
Problems (Skewness and Kurtosis)

1) What can you say of skewness in each case of the following cases;
(i) The median is 26.01, while the two quartiles are 13.73 and 38.29.
(ii) Mean = 140 and mode = 148.7
(iii) Mean = 129.5 and median = 128.7
(iv) The first three moments about the value 16 are respectively -0.35, 2.9 and 1.93
2) The second moments about the mean of two distributions are 8.5 and 22.5, while the third moments about the mean are
5.1 and -2.8. Which of the distributions is symmetrical, positively skewed or negatively skewed?
3) Which of the following is correct in a positively skewed and negatively skewed distribution
(i) The arithmetic mean is greater than the mode.
(ii) The arithmetic mean is less than the mode.
(iii) The arithmetic mean is greater than the median.
(iv) The median is greater than the mode.
4) The length of stay on the cancer floor of Apolo Hospital were organized into a frequency distribution. The mean length of
stay was 28 days, the median 25 days and the modal length 23 days. The standard deviation was computed to be 4.2
days.
Practice Problems: 4.24, 4.62, 4.63, 5.10, 5.11
Assignment (1) (Use of Microsoft Excel)

The following table shows a sample of 100 values of the splitting tensile strength (lb/in²) of concrete cylinders.
320 380 340 410 380 340 360 350 320 370 420 400
350 370 330 320 390 380 350 340 350 360 370 350
380 370 300 420 390 330 360 380 370 330 360 300
370 390 390 440 330 390 330 360 400 370 360 390
350 370 350 350 390 370 320 350 360 340 340 350
350 390 380 340 370 400 360 350 370 380 360 340
400 360 350 390 400 350 360 340 370 420 340 360
390 400 380 370 360 400 400 370 360 360 370 340
330 370 340 360
Construct a frequency table
Using commands, also find out minimum, maximum values, mean, mode, median, quartiles, deciles, percentiles, G.M, H.M,
variance, standard deviation, skewness and kurtosis.
Instructions:
First of all, generate 100 data values using the formula =RANDBETWEEN(100, 500)
You will get a value in the first cell; drag this cell downward to fill 100 cells (the commands below assume the data are in A2:A101)
Enter the data in a column,
construct class intervals like 300-310, 310-320, …
construct frequency against each class with command:
=COUNTIF(A2:A101,">=300")-COUNTIF(A2:A101,">=310")
and total frequency: =SUM(G3:G17)
mean =AVERAGE(A2:A101)
mode =MODE(A2:A101)
Q1 =QUARTILE(A2:A101,1)
Median = Q2 =MEDIAN(A2:A101)
D7 =PERCENTILE(A2:A101,0.7)
P75 =PERCENTILE(A2:A101,0.75)
Min. value =MIN(A2:A101)
Max. Value =MAX(A2:A101)
G.M =GEOMEAN(A2:A101)
H.M =HARMEAN(A2:A101)
Skewness =SKEW(A2:A101)
Kurtosis =KURT(A2:A101)
Variance =VAR(A2:A101)
S.D =STDEV(A2:A101)
