1 Descriptive Analysis Complete

Fall 2020 ME 2018 (1)
Semester: Fall 2020
Class: 5th Sem. Mechanical Engineering, 2018
Subject: MA 242 Engineering Statistics
Subject Teacher: MUHAMMAD NAEEM SANDHU
Cell#: 03224577602
Email: naeemuetlahore@hotmail.com
What is Statistics?
(1) Collection and Organization of Data:
(i) Graphically: through the use of charts and graphs
(ii) Numerically: through the use of tables of data
(2) Analysis of Data:
Once the data are organized, we can compute various quantities (called statistics for a sample, or parameters for a population) associated with the data.
(3) Interpretation of Data:
Once we have performed the analysis, we can use the information to make decisions about the aggregate (the population).
Statistical techniques are used in
Marketing: -
Before a product is launched, various statistical techniques are used to analyze data on population, purchasing power, habits of the
customers, competitors, pricing etc.
Production: -
Statistical methods are used to carry out R&D programs for the improvement in the quality of the existing products and setting
quality control standards for new ones
Finance: -
A statistical study through correlation analysis of profits and dividends helps to predict and decide probable dividends for future
years.
Personnel: -
In the process of manpower planning, a personnel department makes statistical studies of wage rates, incentive plans, cost of living,
labor turnover rates, employment trends, accidents rates, performance appraisal and training and development programs.
The common probabilistic theories and tools like histogram, estimation of mean, variance, standard deviation, and probability
distribution functions are useful for all different disciplines of engineering.
Descriptive Statistics: Descriptive statistics includes the statistical procedures that we use to organize, summarize and describe data. The data could be
collected from either a sample or a population, but the results can only be used to describe the group that is being studied, not to
generalize beyond it. Frequency distributions, measures of central tendency (mean, median, and mode),
and graphs like pie charts and bar charts that describe the data are all examples of descriptive statistics.
Inferential Statistics: Inferential statistics is concerned with making predictions or inferences about a population from observations
and analysis of a sample. Regression analysis, tests of hypotheses (tests of significance) and analysis of variance are examples of inferential
statistics.
Example 1: Thirty batteries were tested to determine how long they would last. The results, to the nearest minute, were recorded as:
423, 369, 387, 411, 393, 394, 371, 377, 389, 409, 392, 408, 431, 401, 363, 391, 405, 382, 400, 381, 399, 415, 428, 422, 396, 372, 410,
419, 386, 390
Construct a frequency distribution table. Also construct a histogram and an ogive.
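The frequency table asked for in Example 1 can be sketched in Python as follows. The class width of 10 and the starting limit of 360 are assumptions made for illustration; the notes do not prescribe them, and other choices are equally valid. The function name `frequency_table` is likewise hypothetical.

```python
from collections import Counter

# Battery lifetimes in minutes, copied from Example 1
battery_minutes = [
    423, 369, 387, 411, 393, 394, 371, 377, 389, 409,
    392, 408, 431, 401, 363, 391, 405, 382, 400, 381,
    399, 415, 428, 422, 396, 372, 410, 419, 386, 390,
]

def frequency_table(data, start, width):
    """Return (lower limit, upper limit, frequency, cumulative frequency) rows.

    Assumes integer data and classes of equal width beginning at `start`.
    """
    counts = Counter((x - start) // width for x in data)
    rows, cumulative = [], 0
    for k in range(max(counts) + 1):
        frequency = counts.get(k, 0)
        cumulative += frequency
        lower = start + k * width
        rows.append((lower, lower + width - 1, frequency, cumulative))
    return rows

# Assumed classes: 360-369, 370-379, ..., 430-439
table = frequency_table(battery_minutes, start=360, width=10)
for lower, upper, frequency, cumulative in table:
    print(f"{lower}-{upper}  {frequency:2d}  {cumulative:2d}")
```

The frequency column gives the bar heights of the histogram, and the cumulative column plotted against the upper class boundaries gives the ogive.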
Exercise 1: The Bureau of Labor Statistics has sampled 30 communities nationwide and compiled prices in each community at the
beginning and end of August in order to find out approximately how the Consumer Price Index has changed during August. The
percentage changes in prices for the 30 communities are as follows (Ref. Ex. 2.19, “Statistics for Management”, 7th Ed., by Levin and Rubin):
0.7 0.4 0.3 0.2 0.1 0.1 0.3 0.7 0.0 0.4
0.1 0.5 0.2 0.3 1.0 0.3 0.0 0.2 0.5 0.1
0.5 0.3 0.1 0.5 0.4 0.0 0.2 0.3 0.5 0.4
Arrange the data using four equal-sized classes, starting from the minimum value as the lower class limit.
Week 2 Descriptive Statistics: Central Measures
Example 2: The weights of 40 male students at a university are recorded to the nearest pound. Construct a frequency distribution.

119 135 138 144 146 150 156 164
125 135 140 144 147 150 157 165
126 135 140 145 147 152 158 168
128 136 142 145 148 153 161 173
132 138 142 146 149 154 163 176

Classes    Frequency   Cf
119-127    3           3
127-135    2           5
135-143    10          15
143-151    12          27
151-159    6           33
160-168    4           37
169-177    3           40
Example 2 (Histogram with unequal classes)

A company manufactures metal rods in different lengths. The table below shows information on one day's production of the company.

Length (cm)                     10-19   20-29   30-39   40-49   50-69   70-99   100-139
No. of metal rods (Frequency)   6       7       8       10      10      9       8

The first four intervals are of equal size, but the 5th, 6th and 7th are wider.
In such cases we find a proportional height for each rectangular bar by dividing the frequency by the class width, where the width is
measured in units of the smallest class width (10 cm). So we construct the table as follows,
and then construct the histogram by taking class boundaries along the x-axis and proportional height along the y-axis.

C.L       C.B          Freq.   Class Width (units of 10 cm)   Proportional Height
10-19     9.5-19.5     6       1                              6
20-29     19.5-29.5    7       1                              7
30-39     29.5-39.5    8       1                              8
40-49     39.5-49.5    10      1                              10
50-69     49.5-69.5    10      2                              5
70-99     69.5-99.5    9       3                              3
100-139   99.5-139.5   8       4                              2
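The proportional heights in the table can be reproduced with a few lines of Python. This is a minimal sketch; the function name `proportional_heights` is hypothetical, and widths are expressed relative to the narrowest class, as in the table.

```python
# (lower class limit, upper class limit, frequency) from the metal rod table
classes = [
    (10, 19, 6), (20, 29, 7), (30, 39, 8), (40, 49, 10),
    (50, 69, 10), (70, 99, 9), (100, 139, 8),
]

def proportional_heights(classes):
    """Divide each frequency by its class width relative to the narrowest class."""
    widths = [upper - lower + 1 for lower, upper, _ in classes]
    base = min(widths)  # 10 cm for this table
    return [freq / (width / base) for (_, _, freq), width in zip(classes, widths)]

heights = proportional_heights(classes)
print(heights)
```

With these heights the area of each bar, not its height, is proportional to the class frequency.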
Book: “Theory and Problems of Statistics” 4th Edition by Schaum’s Series; Practice Problems: 2.27, 2.28, 2.29
Measures of Location
Median
- If n is odd, then the median is the middle value.
- If n is even, the median is the average of the two middle values.
Mode
The mode is the value that is repeated most often in the data set.
e.g. The ages in years of the cars worked on by the Village Autohaus last week:
5 6 3 6 11 7 9 10 2 4 10 6 2 1 5. The mode in this case is 6.
Examples (1)
A computing student received the following grades in the subjects of his first semester, 2007:
Y = [6; 7; 6; 8; 5; 7; 6; 9; 10; 6] Mode = 6
1,2,3,4,5,6,6,7,7: the modes are 6 and 7 (bimodal)
2,3,4,2,3,4,7,8: the modes are 2, 3 and 4 (multimodal)
2,3,4,5,6,7,8: no mode
2,2,3,3,4,4,5,5: no mode (every value occurs equally often)
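The cases listed in Examples (1) can be checked with a short Python sketch. The function name `modes` is hypothetical, and the "no mode" convention (all distinct values equally frequent) is taken from the examples above rather than from any library.

```python
from collections import Counter

def modes(data):
    """Return the sorted list of modes of a non-empty data set.

    An empty list means "no mode": two or more distinct values,
    all occurring equally often.
    """
    counts = Counter(data)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)

print(modes([6, 7, 6, 8, 5, 7, 6, 9, 10, 6]))   # grades of the computing student
print(modes([1, 2, 3, 4, 5, 6, 6, 7, 7]))       # bimodal case
print(modes([2, 2, 3, 3, 4, 4, 5, 5]))          # no mode
```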
Exercise 1.
A semi-commercial test plant produced the following daily outputs in tonnes/day:
1.3 2.5 1.8 1.4 3.2 1.9 1.3 2.8 1.1 1.7
1.4 3.0 1.6 1.2 2.3 2.9 1.1 1.7 2.0 1.4
Find the mode.
(Ref. McCoursey, Chap. 4)
Other Measures of Location
The other measures of location discussed here are quartiles, deciles and percentiles. In each case the data are first arranged in
ascending order, and [ ] denotes the integer part.
Quartiles
Quartiles divide the distribution into four groups, separated by Q1, Q2, Q3. Note that Q1 is the same as the 25th percentile; Q2
is the same as the 50th percentile, or the median; Q3 corresponds to the 75th percentile.
For Q1 we check whether $\frac{n}{4}$ is an integer or a non-integer.
If $\frac{n}{4}$ is not an integer, then $Q_1 = \left(\left[\frac{n}{4}\right] + 1\right)$th item in the data.
If $\frac{n}{4}$ is an integer, then $Q_1$ = average of the $\left(\frac{n}{4}\right)$th and $\left(\frac{n}{4} + 1\right)$th items.
Similarly, for Q2 and Q3 we check whether $\frac{2n}{4}$ and $\frac{3n}{4}$ are integers or non-integers respectively, and then find Q2 and Q3
in the same way as Q1.
Deciles
Deciles divide the distribution into 10 groups. They are denoted by D1, D2, etc.
For D7 we check whether $\frac{7n}{10}$ is an integer or a non-integer.
If $\frac{7n}{10}$ is not an integer, then $D_7 = \left(\left[\frac{7n}{10}\right] + 1\right)$th item in the data.
If $\frac{7n}{10}$ is an integer, then $D_7$ = average of the $\left(\frac{7n}{10}\right)$th and $\left(\frac{7n}{10} + 1\right)$th items.
Similarly, for D2 and D3 we check whether $\frac{2n}{10}$ and $\frac{3n}{10}$ are integers or non-integers respectively, and then find D2 and D3
in the same way as D7.
Percentiles
Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group.
Percentiles are symbolized by
P1, P2, P3, . . . , P99
and divide the data set into 100 equal groups.
For instance, for P27 we check whether $\frac{27n}{100}$ is an integer or a non-integer.
If $\frac{27n}{100}$ is not an integer, then $P_{27} = \left(\left[\frac{27n}{100}\right] + 1\right)$th item in the data.
If $\frac{27n}{100}$ is an integer, then $P_{27}$ = average of the $\left(\frac{27n}{100}\right)$th and $\left(\frac{27n}{100} + 1\right)$th items.
Similarly, for P25 and P30 we check whether $\frac{25n}{100}$ and $\frac{30n}{100}$ are integers or non-integers respectively, and then find
P25 and P30 in the same way as P27.
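The position rule used for quartiles, deciles and percentiles is the same in each case, so it can be sketched as one Python function. The name `positional_quantile` and its parameters `j` and `m` are hypothetical; the data are the 40 student weights from Example 2. Note that statistical software often uses other interpolation rules, so library results may differ slightly from this rule.

```python
def positional_quantile(data, j, m):
    """Return the j-th of the m-quantiles using the position rule of these notes.

    m = 4 gives quartiles, m = 10 deciles, m = 100 percentiles; 1 <= j < m.
    """
    ordered = sorted(data)
    n = len(ordered)
    position, remainder = divmod(j * n, m)
    if remainder == 0:
        # j*n/m is an integer: average the position-th and (position+1)-th items
        return (ordered[position - 1] + ordered[position]) / 2
    # j*n/m is not an integer: take the ([j*n/m] + 1)-th item (index `position`)
    return ordered[position]

# Weights of the 40 male students from Example 2
weights = [
    119, 135, 138, 144, 146, 150, 156, 164,
    125, 135, 140, 144, 147, 150, 157, 165,
    126, 135, 140, 145, 147, 152, 158, 168,
    128, 136, 142, 145, 148, 153, 161, 173,
    132, 138, 142, 146, 149, 154, 163, 176,
]

q1 = positional_quantile(weights, 1, 4)      # n/4 = 10 is an integer
q3 = positional_quantile(weights, 3, 4)      # 3n/4 = 30 is an integer
d7 = positional_quantile(weights, 7, 10)     # 7n/10 = 28 is an integer
p27 = positional_quantile(weights, 27, 100)  # 27n/100 = 10.8, so the 11th item
print(q1, q3, d7, p27)
```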
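Quartiles, deciles and percentiles can also be found graphically with the help of an ogive. For the three quartiles, mark n/4, 2n/4 and
3n/4 on the cumulative frequency axis, move across to the curve, and read Q1, Q2 and Q3 off the horizontal axis.
Deciles and percentiles are found from the ogive in the same way.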

Practice Problems, Chap 3 Solved problems: 3.19, 3.26, 3.35, 3.37, 3.40, 3.44, 3.45
The Weighted Mean
The weighted mean enables us to calculate an average that takes into account the importance of each value to the overall total.
The weighted mean of a variable X is obtained by multiplying each value by its corresponding weight and dividing the sum of the
products by the sum of the weights.
The formula for calculating the weighted mean is
$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}$
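As a quick illustration of the formula, here is a minimal Python sketch. The function name `weighted_mean` and the scores and weights are hypothetical values chosen for illustration; they are not taken from the exercises.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical data: the middle score counts twice as much as the others
scores = [70, 80, 90]
weights = [1, 2, 1]
result = weighted_mean(scores, weights)   # (70 + 160 + 90) / 4
print(result)
```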
Exercise
(1) An instructor gives four 1-hour exams and one final exam, which counts as two 1-hour exams. Find a student’s
grade if she received 62, 83, 97, and 90 on the 1-hour exams and 82 on the final exam. (Bluman Problem 3.31)
(2) An instructor grades exams, 20%; term paper, 30%; final exam, 50%. A student had grades of 83, 72, and 90,
respectively, for exams, term paper, and final exam. Find the student’s final average. Use the weighted mean.
(3) A salesperson drives 300 miles round trip at 30 miles per hour going to Chicago and 45 miles per hour returning
home. Find the average miles per hour. (Harmonic Mean)
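The simple average (30 + 45)/2 = 37.5 miles per hour is wrong. The correct value is
$H.M = \frac{2}{\frac{1}{30} + \frac{1}{45}} = 36$ miles per hour
Verify: each leg of the 300-mile round trip is 150 miles, so the time taken on the way going is $t_1 = 150/30 = 5$ hours
and the time taken on the way back is $t_2 = 150/45 = 3.33$ hours.
Now speed = $\frac{\text{total distance}}{\text{total time}} = \frac{300}{8.33} \approx 36$ miles per hour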
(4) A bus driver drives 50 miles to West Chester at 40 miles per hour and returned driving 25 miles per hour. Find the
average miles per hour. (Harmonic Mean)
(5) A carpenter buys $500 worth of nails at $50 per pound and $500 worth of nails at $10 per pound. Find the average
cost of 1 pound of nails.
(6) Bennett Distribution Company, a subsidiary of major appliance manufacturer, is forecasting regional sales for the
next year. The Atlantic branch, with current yearly sales of $193.8 million, is expected to achieve a sales growth of
7.25 percent; the Midwest branch, with current sales of $79.3 million, is expected to grow by 8.20 percent; and the
Pacific branch, with sales of $57.5 million, is expected to increase sales by 7.15 percent. What is the average rate of
sales growth forecasted for next year? (“Statistics for Management”, 7th Ed, by Richard Levin and David Rubin
Chap 3)
(7) A recent survey of a new diet cola reported the following percentages of people who liked the taste. Find the
weighted mean of the percentages.
Area   % favored   Number surveyed
1      40          1000
2      30          3000
3      50          800
(8) The costs of three models of helicopters are shown below. Find the weighted mean of the costs of the models.
Model        Number sold   Cost
Sunscraper   9             $427,000
Skycoaster   6             $365,000
High-flyer   12            $725,000
The Harmonic Mean

This mean is useful for finding the average speed. Suppose a person drove 100 miles at 40 miles per hour and returned
driving 50 miles per hour. The average miles per hour is not $\frac{40 + 50}{2} = 45$ miles per hour. The correct average is found as shown:
Since Time = distance / rate, then
Time 1 = $\frac{100}{40}$ = 2.5 hours to make the trip and
Time 2 = $\frac{100}{50}$ = 2 hours to return.
Hence the total time is 4.5 hours, and the total miles driven are 200. Now the average speed is
Rate = $\frac{\text{distance}}{\text{time}} = \frac{200}{4.5}$ = 44.44 miles per hour
This value can also be found by using the harmonic mean as
$HM = \frac{2}{1/40 + 1/50} = 44.44$
Definition: the harmonic mean of two values a and b is $H.M = \frac{2}{\frac{1}{a} + \frac{1}{b}}$.
In general, the harmonic mean is the reciprocal of the mean of the reciprocals, i.e. $H.M = \text{reciprocal of } \frac{\sum (1/x_i)}{n} = \frac{n}{\sum (1/x_i)}$
Examples (2)
Suppose a person drove 100 miles at 40 miles per hour and returned driving 50 miles per hour. The average miles per hour is
not 45 miles per hour, which is found by adding 40 and 50 and dividing by 2. The average is found as shown.
Since Time = distance / rate, then
Time 1 = $\frac{100}{40}$ = 2.5 hours to make the trip
Time 2 = $\frac{100}{50}$ = 2 hours to return
Hence, the total time is 4.5 hours, and the total miles driven are 200. Now, the average speed is
Rate = $\frac{\text{distance}}{\text{time}} = \frac{200}{4.5}$ = 44.44 miles per hour
This value can also be found by using the harmonic mean formula
$HM = \frac{2}{1/40 + 1/50} = 44.44$
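The worked example can be checked numerically. This is a minimal sketch; the function name `harmonic_mean` is hypothetical (Python's standard library also offers `statistics.harmonic_mean`, which is not used here so that the formula stays visible).

```python
def harmonic_mean(values):
    """n divided by the sum of the reciprocals of the values."""
    return len(values) / sum(1 / v for v in values)

# 100 miles out at 40 mph, 100 miles back at 50 mph
hm = harmonic_mean([40, 50])

# Cross-check with total distance / total time
total_time = 100 / 40 + 100 / 50      # 2.5 h + 2 h
average_speed = 200 / total_time

print(round(hm, 2), round(average_speed, 2))
```

The two results agree because equal distances are travelled at each speed; for equal times at each speed the arithmetic mean would be the right average instead.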
Exercise 2.
Using the harmonic mean, find each of these.
(a). A salesperson drives 300 miles round trip at 30 miles per hour going to Chicago and 45 miles per hour returning home. Find the
average miles per hour.
(b). A bus driver drives the 50 miles to West Chester at 40 miles per hour and returns driving 25 miles per hour. Find the average
miles per hour.
(c). A carpenter buys $500 worth of nails at $50 per pound and $500 worth of nails at $10 per pound. Find the average cost of 1
pound of nails.
Dispersion Measures
The main measures are
(1) Range = $x_{max} - x_{min} = x_n - x_0$
co-efficient of dispersion = $\frac{x_n - x_0}{x_n + x_0}$
(2) Inter-quartile Range
$IQR = Q_3 - Q_1$
$Q.D = \frac{Q_3 - Q_1}{2}$, called the quartile deviation
co-efficient of quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
which is used for comparing the variation in two or more sets of data.
(3) Mean Deviation = $M.D = \frac{\sum |x_i - \bar{x}|}{n}$, also called an absolute measure.
The absolute values are needed because $\frac{\sum (x_i - \bar{x})}{n} = 0$: the sum of the deviations from the mean, $\sum (x_i - \bar{x})$, is zero.
(4) Population Variance and Standard Deviation
The variance is the mean of the squared deviations from the mean.
$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ (for ungrouped data)
Alternative formula:
$\sigma^2 = \frac{\sum X_i^2}{N} - \left(\frac{\sum X_i}{N}\right)^2$ (for ungrouped data)
The positive square root of the variance is called the standard deviation. Symbolically,
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$ (for ungrouped data)
Note: S.D is always non-negative.
The standard deviation is a very important concept that serves as a basic measure of variability. A smaller value of the
standard deviation indicates that most of the observations in the data are close to the mean, while a larger value implies that the
observations are scattered widely about the mean.
It is an absolute measure of dispersion. Its relative measure, called the coefficient of standard deviation, is defined as
Coefficient of S.D. = $\frac{\text{Standard Deviation}}{\text{Mean}}$
Coefficient of variation = $\frac{\text{Standard Deviation}}{\text{Mean}} \times 100$
These measures are used to compare the consistency, reliability and stability of data sets.
(5) Sample Variance and Standard Deviation

In most cases, the expression $\sum (x - \bar{x})^2 / n$ does not give the best estimate of the population variance. Therefore, instead of
dividing by n, find the variance of the sample by dividing by n - 1, giving a slightly larger value and an unbiased estimate of the
population variance. The formula for the sample variance, denoted by $s^2$, is
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
and the standard deviation of a sample (denoted by s) is
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Note: (i) $\bar{x}$ is an unbiased estimate of the population mean $\mu$.
(ii) $\sum (x_i - \bar{x})^2 / n$ is a biased estimate of the population variance $\sigma^2$.
(iii) The division by n - 1 makes $s^2$ an unbiased estimator of $\sigma^2$.
Examples (3)
The breaking strength of test pieces of a certain alloy is given as under
X: 95 103 97 130 96 73 78 95 89 68
82 79 69 67 83 108 94 87 93 117
Calculate the average breaking strength of the alloy and the standard deviation.
X     X^2     |X - X̄|   (X - X̄)^2      X     X^2     |X - X̄|   (X - X̄)^2
67    4489    23.15     535.92         93    8649    2.85      8.1225
68    4624    22.15     490.62         94    8836    3.85      14.823
69    4761    21.15     447.32         95    9025    4.85      23.522
73    5329    17.15     294.12         95    9025    4.85      23.522
78    6084    12.15     147.62         96    9216    5.85      34.222
79    6241    11.15     124.32         97    9409    6.85      46.922
82    6724    8.15      66.423         103   10609   12.85     165.12
83    6889    7.15      51.123         108   11664   17.85     318.62
87    7569    3.15      9.9225         117   13689   26.85     720.92
89    7921    1.15      1.3225         130   16900   39.85     1588
Total: $\sum X = 1803$, $\sum X^2 = 167653$, $\sum |X - \bar{X}| = 253$, $\sum (X - \bar{X})^2 = 5112.6$
Mean = $\bar{X} = \frac{\sum X}{n} = \frac{1803}{20} = 90.15$ (remember $(\sum X)^2 \neq \sum X^2$)
$\sigma^2 = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{167653}{20} - \left(\frac{1803}{20}\right)^2 = 8382.65 - 8127.0225 = 255.6275$, so $\sigma = \sqrt{255.6275} = 15.99$
Mean Deviation = $M.D = \frac{\sum |x_i - \bar{x}|}{n} = \frac{253}{20} = 12.65$ (remember $\frac{\sum (x_i - \bar{x})}{n} = 0$,
because the sum of the deviations from the mean is always zero)
Variance = $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{5112.6}{19} = 269.08$
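The hand calculation for the breaking strength data can be cross-checked with Python's standard `statistics` module. This is a sketch for verification only; the variable names are illustrative.

```python
import statistics

# Breaking strengths of the alloy test pieces from Examples (3)
strengths = [95, 103, 97, 130, 96, 73, 78, 95, 89, 68,
             82, 79, 69, 67, 83, 108, 94, 87, 93, 117]

mean = statistics.mean(strengths)
pop_variance = statistics.pvariance(strengths)   # divides by n
pop_sd = statistics.pstdev(strengths)            # square root of the above
sample_variance = statistics.variance(strengths) # divides by n - 1
mean_deviation = sum(abs(x - mean) for x in strengths) / len(strengths)

print(f"mean            = {mean:.2f}")
print(f"sigma^2 (n)     = {pop_variance:.4f}")
print(f"sigma           = {pop_sd:.2f}")
print(f"s^2 (n - 1)     = {sample_variance:.2f}")
print(f"mean deviation  = {mean_deviation:.2f}")
```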
Examples (4)
The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is
$5225, and the standard deviation is $773. Compare the variations of the two.
Solution
The coefficients of variation are
$C.V = \frac{s}{\bar{x}} \times 100 = \frac{5}{87} \times 100 = 5.7\%$ (sales)
$C.V = \frac{s}{\bar{x}} \times 100 = \frac{773}{5225} \times 100 = 14.8\%$ (commissions)
Since the coefficient of variation is larger for commissions, the commissions are more variable than the sales.
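The comparison in Examples (4) amounts to two divisions, sketched here in Python. The function name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return sd / mean * 100

cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)

print(f"sales: {cv_sales:.1f}%  commissions: {cv_commissions:.1f}%")
more_variable = "commissions" if cv_commissions > cv_sales else "sales"
print(more_variable)
```

Because the coefficient of variation is unit-free, it allows a count of cars to be compared with an amount in dollars.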
Empirical Rule and Chebyshev’s Theorem

(1) Empirical Rule
When a distribution is bell-shaped (or what is called normal), the following statements, which make up the empirical rule, are true.
Approximately 68% of the data values will fall within 1 standard deviation of the mean.
Approximately 95% of the data values will fall within 2 standard deviations of the mean.
Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.
Examples (5)
Heights of 18-year-old males have a bell-shaped distribution with mean 69.6 inches and standard deviation 1.4 inches.
(a) About what proportion of all such men are between 68.2 and 71 inches tall?
(b) What interval centered on the mean should contain about 95% of all such men?
Solution (a)
$\bar{x} - ks = 68.2 \Rightarrow 69.6 - k(1.4) = 68.2 \Rightarrow k = 1$
$\bar{x} + ks = 71 \Rightarrow 69.6 + k(1.4) = 71 \Rightarrow k = 1$
Since the interval from 68.2 to 71.0 has endpoints $\bar{x} - s$ and $\bar{x} + s$, it is the 1-S.D interval about the mean, and
by the Empirical Rule about 68% of all 18-year-old males should have heights in this range.
Solution (b)
By the Empirical Rule the shortest such interval containing 95% of the data is $\bar{x} \pm 2s$. So the interval from
$\bar{x} - 2s = 69.6 - 2(1.4) = 66.8$ to $\bar{x} + 2s = 69.6 + 2(1.4) = 72.4$ contains about 95% of the data values.
Alternatively, suppose that in part (a) the limits are 68.2 and 72.4:
$\bar{x} - ks = 68.2 \Rightarrow 69.6 - k(1.4) = 68.2 \Rightarrow k = 1$, corresponding percentage 68%
$\bar{x} + ks = 72.4 \Rightarrow 69.6 + k(1.4) = 72.4 \Rightarrow k = 2$, corresponding percentage 95%
Because a bell-shaped distribution is symmetric about the mean, half of each percentage lies on each side of the mean, so
Required percentage = 68%/2 + 95%/2 = average of 68% and 95% = 81.5%
Note: If the value of k is different from 1, 2 or 3, the Empirical Rule gives no percentage. For any k > 1, Chebyshev's theorem guarantees
that at least $(1 - 1/k^2) \times 100\%$ of the values lie within k standard deviations of the mean.
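The reasoning in Examples (5), including the asymmetric interval from 68.2 to 72.4, can be sketched in Python. The function name `empirical_percentage` and the lookup table are hypothetical helpers; the sketch only handles endpoints that lie 1, 2 or 3 standard deviations from the mean, one on each side.

```python
# Approximate percentage within k standard deviations of the mean
EMPIRICAL_PERCENT = {1: 68.0, 2: 95.0, 3: 99.7}

def empirical_percentage(mean, sd, low, high):
    """Approximate % of a bell-shaped distribution between low and high.

    Requires low < mean < high, with each endpoint 1, 2 or 3
    standard deviations away from the mean.
    """
    total = 0.0
    for distance in (mean - low, high - mean):
        k = round(distance / sd)
        if k not in EMPIRICAL_PERCENT or abs(distance / sd - k) > 1e-9:
            raise ValueError("endpoint is not 1, 2 or 3 standard deviations from the mean")
        total += EMPIRICAL_PERCENT[k] / 2   # by symmetry, half lies on each side
    return total

print(empirical_percentage(69.6, 1.4, 68.2, 71.0))   # part (a)
print(empirical_percentage(69.6, 1.4, 66.8, 72.4))   # part (b)
print(empirical_percentage(69.6, 1.4, 68.2, 72.4))   # the alternative limits
```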
(2) Chebyshev’s Theorem

The Empirical Rule does not apply to all data sets, only to those that are bell-shaped, and even then is stated in terms of
approximations. A result that applies to every data set is known as Chebyshev’s Theorem.
For any numerical data set, at least $(1 - 1/k^2) \times 100\%$ of the data lie within k standard deviations of the mean, that is, in the
interval with endpoints $\bar{x} \pm ks$, i.e. $(\bar{x} - ks, \bar{x} + ks)$, for samples and with endpoints $\mu \pm k\sigma$ for populations,
where k is any number greater than 1 (k > 1); k need not be a whole number.
Examples (6)

A sample of size n = 50 has mean $\bar{x} = 28$ and standard deviation s = 3. Without knowing anything else about the sample, what
can be said about the number of observations that lie in the interval (22, 34)? What can be said about the number of observations that
lie outside that interval?
The endpoints of the interval give
$\bar{x} - ks = 22 \Rightarrow 28 - 3k = 22 \Rightarrow k = 2$
$\bar{x} + ks = 34 \Rightarrow 28 + 3k = 34 \Rightarrow k = 2$
Then at least $(1 - 1/k^2) \times 100\% = (1 - 1/4) \times 100\% = 75\%$ of the 50 values, i.e. at least 37.5 and hence at least 38 values, are contained in the given interval.
Therefore at most 12 observations fall outside the interval.
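The calculation in Examples (6) can be sketched in Python as follows. The function name `chebyshev_fraction` is hypothetical; the numbers are those of the example.

```python
import math

def chebyshev_fraction(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2

n, mean, s = 50, 28, 3
k = (34 - mean) / s                       # same k from either endpoint of (22, 34)
fraction = chebyshev_fraction(k)
at_least_inside = math.ceil(fraction * n) # counts are whole numbers, so round up
at_most_outside = n - at_least_inside

print(k, fraction, at_least_inside, at_most_outside)
```

The bound is a guarantee, not an estimate: the actual number of observations inside the interval may be considerably larger.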
Exercise
The mean of a distribution is 20 and the standard deviation is 2. Use Chebyshev’s theorem.
a. At least what percentage of the values will fall between 10 and 30?
b. At least what percentage of the values will fall between 12 and 28? (Bluman ch. 3)
Exercise
The Energy Information Administration reported that the mean retail price per gallon of regular grade gasoline was $2.30
(Energy Information Administration, February 27, 2006). Suppose that the standard deviation was $.10 and that the retail price per
gallon has a bell shaped distribution.
a. What percentage of regular grade gasoline sold for between $2.20 and $2.40 per gallon?
b. What percentage of regular grade gasoline sold for between $2.20 and $2.50 per gallon?
c. What percentage of regular grade gasoline sold for more than $2.50 per gallon?
(Prob. 3.30, Sweeney, Chap. 3)
Measures of Skewness and Kurtosis

(1) Skewness
Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it
looks the same to the left and right of the center point.
If a curve is symmetrical, the values below the mean and the values above the mean deviate from it in the same way.
This is called symmetry.
Skewness is the degree of asymmetry (departure from symmetry) of a distribution.
In a symmetrical distribution, the mean, median and mode coincide.

If the frequency curve of a distribution has a longer tail to the right of the central maximum than to the left, the distribution
is said to be skewed to the right or to have positive skewness.
For moderately skewed distributions the following empirical relation holds approximately:
Mean - mode = 3 (mean - median)

In a positively skewed distribution, the mean exceeds the mode.
If the frequency curve of a distribution has a longer tail to the left of the central maximum than to the right, the distribution
is said to be skewed to the left or to have negative skewness.
In a negatively skewed distribution, the mean is smaller than the mode.

Karl Pearson introduced the following formula to measure skewness:
$b_1 = \text{Skewness} = \frac{\text{mean} - \text{mode}}{\text{standard deviation}}$
b1 = 0: distribution is symmetrical
b1 > 0: distribution is positively skewed
b1 < 0: distribution is negatively skewed
Bowley introduced the following measure of skewness:
$b_1 = \text{Quartile coefficient of skewness} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$
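Both skewness coefficients are simple ratios, sketched here in Python. The function names and the input values are hypothetical, chosen only to illustrate the sign convention; they are not taken from the problems that follow.

```python
def pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient: (mean - mode) / standard deviation."""
    return (mean - mode) / sd

def bowley_skewness(q1, q2, q3):
    """Quartile coefficient: (Q3 + Q1 - 2*Q2) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * q2) / (q3 - q1)

def describe(coefficient):
    """Interpret the sign of a skewness coefficient."""
    if coefficient > 0:
        return "positively skewed"
    if coefficient < 0:
        return "negatively skewed"
    return "symmetrical"

# Hypothetical summary statistics for illustration
b1_pearson = pearson_skewness(mean=50, mode=44, sd=12)
b1_bowley = bowley_skewness(q1=10, q2=14, q3=22)

print(b1_pearson, describe(b1_pearson))
print(round(b1_bowley, 3), describe(b1_bowley))
```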
Problems (Skewness)
1) What can you say about the skewness in each of the following cases?
(i) The median is 26.01, while the two quartiles are 13.73 and 38.29.
(ii) Mean = 140 and mode = 148.7
(iii) Mean = 129.5 and median = 128.7
2) Which of the following are correct in a positively skewed distribution, and which in a negatively skewed distribution?
(i) The arithmetic mean is greater than the mode.
(ii) The arithmetic mean is less than the mode.
(iii) The arithmetic mean is greater than the median.
(iv) The median is greater than the mode.
3) The lengths of stay on the cancer floor of Apolo Hospital were organized into a frequency distribution. The mean length of
stay was 28 days, the median 25 days and the modal length 23 days. The standard deviation was computed to be 4.2 days.
(2) Kurtosis
Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution.
Kurtosis is the degree of peakedness of a distribution. A distribution having a relatively high peak is called Lepto-Kurtic, whereas a
distribution that is flat-topped is called Platy-Kurtic. A frequency curve which is neither very high peaked nor very flat topped is called
Meso-Kurtic; the normal curve is Meso-Kurtic.
The kurtosis b2 of a normal distribution is 3; for a Lepto-Kurtic distribution b2 > 3, and for a Platy-Kurtic distribution b2 < 3.
Another measure of kurtosis is the percentile coefficient of kurtosis:
$k = \frac{Q.D}{P_{90} - P_{10}}$
where $Q.D = \text{quartile deviation} = \frac{Q_3 - Q_1}{2}$
Practice Problems: 4.24, 4.62, 4.63, 5.10, 5.11
