
8/27/2011

Chapter 3. STATISTICAL DESCRIPTION OF DATA


Frequency Distributions
Grouped Data
Percentiles, Deciles & Quartiles
Graphical Representations
Symmetry and Skewness

Objectives
Set up a frequency distribution for a mass of data.
Calculate the mean, median and mode for grouped data.
Calculate and interpret other measures of location like the deciles, quartiles & percentiles.
Calculate the standard deviation, variance, mean deviation and quartile deviation for grouped data.
Construct histograms, bar charts, frequency polygons, pie charts and ogives.
Describe a given set of data in terms of skewness and kurtosis.

Statistical data collected should be arranged in such a manner that will allow a reader to distinguish their essential features. Depending on the type and the objectives of the person presenting the information, data may be presented using one or a combination of three forms.

Three Forms of Presenting Data

Textual Form - data is presented in paragraph form, especially when they are purely qualitative or when very few numbers are involved.

Tabular Form - data is presented in rows and columns.

Graphical Form - data is presented in visual form.

[Figure: chart with the months January to December on the horizontal axis, values from 0 to 4000 on the vertical axis, and series for the years 1991 to 1995]

Frequency Distribution
When the data include a large number of observations, it is convenient to group the values into mutually exclusive classes and show the number of observations occurring in each class in tabular form. A frequency distribution is an arrangement of data that shows the frequency of occurrence of values falling within arbitrarily defined ranges of the variable known as class intervals. The smallest and largest values that fall in a given interval are called class limits.

Steps in Making a Frequency Distribution

1. Find the range.
2. Determine the interval size by dividing the range by the desired number of classes, which is normally not less than 10 and not more than 20.
3. Determine the class limits of the class intervals. Tabulation is facilitated if the lower class limits of the class intervals are multiples of the class size. The bottom interval must include the lowest score.
4. List the intervals, beginning at the bottom.
5. Tally the frequencies. Summarize these under a column labeled f. Total this column and record the number at the bottom.

Class Frequency and Class Mark

Class frequency refers to the number of observations falling in a particular class, while the midpoint between the upper and lower class limits is called the class mark or midpoint.

Problem:

Construct a frequency distribution of the given scores on a test.

56 28 42 56 47 39 62 60 54 47
78 82 55 56 41 44 54 42 62 48
62 38 57 55 50 47 42 56 68 53
37 72 65 66 52 52 48 48 42 68
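The steps listed above can be sketched in Python for the test scores in this problem. This is only a minimal sketch and not part of the original slides: the function name `frequency_distribution` is hypothetical, the scores are assumed to be whole numbers, and the lowest class limit is taken as the largest multiple of the interval size that does not exceed the lowest score.

```python
# Scores from the problem above.
scores = [56, 28, 42, 56, 47, 39, 62, 60, 54, 47,
          78, 82, 55, 56, 41, 44, 54, 42, 62, 48,
          62, 38, 57, 55, 50, 47, 42, 56, 68, 53,
          37, 72, 65, 66, 52, 52, 48, 48, 42, 68]


def frequency_distribution(data, i):
    """Return (lower limit, upper limit, f) rows, highest class first.

    Hypothetical helper: assumes integer data and interval size i.
    """
    # Lowest class limit: largest multiple of i not exceeding the lowest score.
    lower = (min(data) // i) * i
    rows = []
    while lower <= max(data):
        upper = lower + i - 1
        f = sum(1 for x in data if lower <= x <= upper)  # tally the class
        rows.append((lower, upper, f))
        lower += i
    return rows[::-1]  # list the highest class at the top, as in the slides


score_range = max(scores) - min(scores)      # step 1: the range
table = frequency_distribution(scores, 5)    # interval size chosen as 5

for lower, upper, f in table:
    print(f"{upper} - {lower}  {'/' * f:<8} {f}")
print("N =", sum(f for _, _, f in table))
```

Changing the second argument to 6 shows how the other candidate interval size would regroup the same scores.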

Solution:
Computing for the range:

$$R = 82 - 28 = 54$$

Computing for the class interval:

$$i = \frac{54}{10} = 5.4$$

Therefore, the class interval may be 5 or 6.

We choose 5 because it is the odd number. If i = 5, the lowest limit should be 25. We choose 25 because it is the largest multiple of the chosen interval which is smaller than the smallest value in the set. If the lowest limit is 25, the bottom interval should be 29 - 25. The interval 29 - 25 contains the lowest score (28).

Classes    Tally      f
84 - 80    /          1
79 - 75    /          1
74 - 70    /          1
69 - 65    ////       4
64 - 60    ////       4
59 - 55    ///////    7
54 - 50    //////     6
49 - 45    //////     6
44 - 40    //////     6
39 - 35    ///        3
34 - 30               0
29 - 25    /          1

$$N = \sum f = 40$$

MEASURES OF CENTRAL TENDENCY
For Grouped Data ( > 30 values)

MEAN

Methods:
1. Midpoint Method
2. Short Method

Midpoint Method
1. After the f column, make another column and enter the midpoint (Xm) of each class.
2. Multiply the frequency by the midpoint and enter the product in the next column. Label the column fXm.
3. Get the sum of the fXm column.
4. Use the formula:

$$\bar{x} = \frac{\sum f X_m}{N}$$

Short Method
1. Choose a class at or near the middle of the distribution to be designated as the origin.
2. After the f column, construct the deviation column (d). Mark the chosen class zero. In succession, write -1, -2 and so on for classes lower in value than the origin. In like manner, write 1, 2, 3 and so on for classes greater in value than the origin.
3. Construct the f x d column and get the algebraic sum.
4. Use the formula:

$$\bar{x} = z + \frac{\left(\sum f d\right)_{alg}}{N}\, i$$

where z = midpoint of the class chosen as origin and i = interval size

Problem:
For the given frequency distribution, compute for the mean using:
1. Midpoint Method
2. Short Method

Classes    f
54-50      4
49-45      7
44-40     12
39-35     10
34-30      9
29-25      6
24-20      2

Solution: Using Midpoint Method

Classes    f     Xm    fXm
54-50      4     52    208
49-45      7     47    329
44-40     12     42    504
39-35     10     37    370
34-30      9     32    288
29-25      6     27    162
24-20      2     22     44

N = 50

$$\sum f X_m = 1905$$

$$\bar{x} = \frac{\sum f X_m}{N} = \frac{1905}{50} = 38.1$$

Using Short Method

Classes    f     d     fd
54-50      4     3     12
49-45      7     2     14
44-40     12     1     12
39-35     10     0      0
34-30      9    -1     -9
29-25      6    -2    -12
24-20      2    -3     -6

N = 50

$$\sum f d = 11$$

$$\bar{x} = z + \frac{\left(\sum f d\right)_{alg}}{N}\, i = 37 + \frac{11}{50}(5) = 38.1$$
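Both methods can be checked with a short Python sketch. The function names `mean_midpoint` and `mean_short` are hypothetical, and the origin is assumed to be the class 39-35 as in the solution; any other class should give the same mean.

```python
# (lower limit, upper limit, f), highest class first as in the table.
classes = [(50, 54, 4), (45, 49, 7), (40, 44, 12), (35, 39, 10),
           (30, 34, 9), (25, 29, 6), (20, 24, 2)]


def mean_midpoint(rows):
    """Midpoint method: sum of f * Xm divided by N."""
    n = sum(f for _, _, f in rows)
    return sum(f * (lo + hi) / 2 for lo, hi, f in rows) / n


def mean_short(rows, origin_index):
    """Short method: z + (sum of f*d / N) * i, with d counted from the origin."""
    lo0, hi0, _ = rows[origin_index]
    z = (lo0 + hi0) / 2          # midpoint of the class chosen as origin
    i = hi0 - lo0 + 1            # interval size
    n = sum(f for _, _, f in rows)
    # Rows are listed highest first, so classes above the origin get d > 0.
    sum_fd = sum(f * (origin_index - k) for k, (_, _, f) in enumerate(rows))
    return z + sum_fd / n * i


print(mean_midpoint(classes))
print(mean_short(classes, origin_index=3))   # origin at class 39-35
```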

MEDIAN
Steps:
1. Find N/2.
2. Find the accumulated sum of the frequencies up to the sum that contains N/2.
3. Use the formula:

$$Md = L + \frac{\frac{N}{2} - cf}{f}\, i$$

where L = lower limit of the class which contains N/2
f = frequency of the class containing N/2
cf = cumulative frequency of the classes below the class containing N/2 (the cumulative sum that approaches or is equal to N/2)
i = interval size

MODE
Rough Mode (R. Mo) - obtained by inspection and is equal to the Xm of the class having the highest frequency.

Theoretical Mode (T. Mo)

$$T.\,Mo = 3Md - 2\bar{x}$$

Problem:

For the given frequency distribution in the previous problem, compute for the:
1. Median
2. R. Mode
3. T. Mode

Solution: Computing for the Median

Classes    f     cf
54-50      4
49-45      7
44-40     12
39-35     10     27
34-30      9     17
29-25      6      8
24-20      2      2

N = 50, i = 5

$$\frac{N}{2} = \frac{50}{2} = 25$$

$$Md = L + \frac{\frac{N}{2} - cf}{f}\, i = 35 + \frac{25 - 17}{10}(5) = 39$$

Computing for the Mode

Classes    f
54-50      4
49-45      7
44-40     12
39-35     10
34-30      9
29-25      6
24-20      2

N = 50

R. Mode = 42

$$T.\,Mode = 3Md - 2\bar{x}$$

since Md = 39 and $\bar{x} = 38.1$,

$$T.\,Mode = 3(39) - 2(38.1) = 40.8$$
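The median and mode computations above can be reproduced with the following sketch. The function names are hypothetical, and L is taken as the lower class limit, following the convention used in this solution.

```python
# (lower limit, upper limit, f), highest class first as in the table.
classes = [(50, 54, 4), (45, 49, 7), (40, 44, 12), (35, 39, 10),
           (30, 34, 9), (25, 29, 6), (20, 24, 2)]


def grouped_median(rows):
    """Md = L + ((N/2 - cf) / f) * i, with L the lower class limit."""
    n = sum(f for _, _, f in rows)
    i = rows[0][1] - rows[0][0] + 1
    cf = 0                               # cumulative frequency below the class
    for lo, hi, f in reversed(rows):     # accumulate from the bottom class up
        if f > 0 and cf + f >= n / 2:    # this class contains N/2
            return lo + (n / 2 - cf) / f * i
        cf += f


def rough_mode(rows):
    """Class mark of the class with the highest frequency."""
    lo, hi, _ = max(rows, key=lambda r: r[2])
    return (lo + hi) / 2


def theoretical_mode(md, mean):
    """T. Mode = 3 Md - 2 x-bar."""
    return 3 * md - 2 * mean


md = grouped_median(classes)
print(md, rough_mode(classes), theoretical_mode(md, 38.1))
```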

Other Measures of Position
Quartiles
Deciles
Percentiles

Quartiles - those which divide the distribution into 4 parts

$$Q_k = L + \frac{\frac{kN}{4} - cf}{f}\, i$$

Deciles - those which divide the distribution into 10 parts

$$D_k = L + \frac{\frac{kN}{10} - cf}{f}\, i$$

Percentiles - those which divide the distribution into 100 parts

$$P_k = L + \frac{\frac{kN}{100} - cf}{f}\, i$$

Problem:
For the given frequency distribution in the previous problem, compute for:
1. Q1
2. D3
3. P88

Solution: Computing for Q1

Classes    f     cf
54-50      4
49-45      7
44-40     12
39-35     10
34-30      9     17
29-25      6      8
24-20      2      2

N = 50, i = 5

$$\frac{kN}{4} = \frac{(1)50}{4} = 12.5$$

$$Q_1 = L + \frac{\frac{kN}{4} - cf}{f}\, i = 30 + \frac{12.5 - 8}{9}(5) = 32.5$$

Computing for D3

Classes    f     cf
54-50      4
49-45      7
44-40     12
39-35     10
34-30      9     17
29-25      6      8
24-20      2      2

N = 50, i = 5

$$\frac{kN}{10} = \frac{(3)50}{10} = 15$$

$$D_3 = L + \frac{\frac{kN}{10} - cf}{f}\, i = 30 + \frac{15 - 8}{9}(5) = 33.89$$

Computing for P88

Classes    f     cf
54-50      4
49-45      7     46
44-40     12     39
39-35     10     27
34-30      9     17
29-25      6      8
24-20      2      2

N = 50, i = 5

$$\frac{kN}{100} = \frac{(88)50}{100} = 44$$

$$P_{88} = L + \frac{\frac{kN}{100} - cf}{f}\, i = 45 + \frac{44 - 39}{7}(5) = 48.57$$
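Because the three formulas differ only in the divisor (4, 10 or 100), they can be sketched as one hypothetical function, `grouped_quantile`. As in the solution above, L is assumed to be the lower class limit of the class that contains kN/parts.

```python
# (lower limit, upper limit, f), highest class first as in the table.
classes = [(50, 54, 4), (45, 49, 7), (40, 44, 12), (35, 39, 10),
           (30, 34, 9), (25, 29, 6), (20, 24, 2)]


def grouped_quantile(rows, k, parts):
    """k-th quantile when the distribution is divided into `parts` parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(f for _, _, f in rows)
    i = rows[0][1] - rows[0][0] + 1      # interval size
    target = k * n / parts               # kN/4, kN/10 or kN/100
    cf = 0                               # cumulative frequency below the class
    for lo, hi, f in reversed(rows):     # accumulate from the bottom class up
        if f > 0 and cf + f >= target:   # this class contains the target
            return lo + (target - cf) / f * i
        cf += f


q1 = grouped_quantile(classes, 1, 4)
d3 = grouped_quantile(classes, 3, 10)
p88 = grouped_quantile(classes, 88, 100)
print(round(q1, 2), round(d3, 2), round(p88, 2))
```

Calling `grouped_quantile(classes, 2, 4)` gives the median, since the second quartile uses the same target N/2.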

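MEASURES OF VARIATION
For Grouped Data ( > 30 values)

RANGE
The range is computed as the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval.

VARIANCE

$$\sigma^2 = \frac{\sum f (x_m - \bar{x})^2}{N}$$

STANDARD DEVIATION

$$\sigma = \sqrt{\frac{\sum f (x_m - \bar{x})^2}{N}}$$

MEAN DEVIATION

$$D = \frac{\sum f\,|x_m - \bar{x}|}{N}$$

QUARTILE DEVIATION

$$Q = \frac{Q_3 - Q_1}{2}$$

Problem:

For the given frequency distribution, determine:
1. variance
2. standard deviation
3. mean deviation
4. quartile deviation

Classes    f
89-85      1
84-80      1
79-75      2
74-70      3
69-65      4
64-60      4
59-55      7
54-50      6
49-45      6
44-40      6
39-35      3
34-30      1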

Solution: Computing for the Mean

Classes    f     Xm    fXm
89-85      1     87     87
84-80      1     82     82
79-75      2     77    154
74-70      3     72    216
69-65      4     67    268
64-60      4     62    248
59-55      7     57    399
54-50      6     52    312
49-45      6     47    282
44-40      6     42    252
39-35      3     37    111
34-30      1     32     32

N = 44

$$\sum f X_m = 2443$$

$$\bar{x} = \frac{\sum f X_m}{N} = \frac{2443}{44} \approx 55.5$$

Computing for the Variance

Classes    f     xm - x̄     (xm - x̄)^2    f(xm - x̄)^2
89-85      1      31.5        992.25         992.25
84-80      1      26.5        702.25         702.25
79-75      2      21.5        462.25         924.50
74-70      3      16.5        272.25         816.75
69-65      4      11.5        132.25         529.00
64-60      4       6.5         42.25         169.00
59-55      7       1.5          2.25          15.75
54-50      6      -3.5         12.25          73.50
49-45      6      -8.5         72.25         433.50
44-40      6     -13.5        182.25        1093.50
39-35      3     -18.5        342.25        1026.75
34-30      1     -23.5        552.25         552.25

N = 44

$$\sum f (x_m - \bar{x})^2 = 7329$$

$$\sigma^2 = \frac{\sum f (x_m - \bar{x})^2}{N} = \frac{7329}{44} = 166.57$$

Computing for the Standard Deviation

$$\sigma = \sqrt{\sigma^2}$$

Since $\sigma^2 = 166.57$,

$$\sigma = \sqrt{166.57} = 12.906$$
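The variance and standard deviation above can be verified with a short sketch. The function name `grouped_variance` is hypothetical. The mean is passed in as 55.5, the rounded value used in the solution; leaving it out makes the function use the unrounded mean 2443/44, which changes the result only in the third decimal place.

```python
import math

# (lower limit, upper limit, f), highest class first as in the table.
classes = [(85, 89, 1), (80, 84, 1), (75, 79, 2), (70, 74, 3),
           (65, 69, 4), (60, 64, 4), (55, 59, 7), (50, 54, 6),
           (45, 49, 6), (40, 44, 6), (35, 39, 3), (30, 34, 1)]


def grouped_variance(rows, mean=None):
    """Sum of f * (xm - mean)^2 divided by N, using class marks xm."""
    n = sum(f for _, _, f in rows)
    marks = [((lo + hi) / 2, f) for lo, hi, f in rows]
    if mean is None:
        mean = sum(f * xm for xm, f in marks) / n   # unrounded grouped mean
    return sum(f * (xm - mean) ** 2 for xm, f in marks) / n


variance = grouped_variance(classes, mean=55.5)   # rounded mean from the slides
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 3))
```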

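Computing for the Mean Deviation

Classes    f     |xm - x̄|    f|xm - x̄|
89-85      1      31.5         31.5
84-80      1      26.5         26.5
79-75      2      21.5         43.0
74-70      3      16.5         49.5
69-65      4      11.5         46.0
64-60      4       6.5         26.0
59-55      7       1.5         10.5
54-50      6       3.5         21.0
49-45      6       8.5         51.0
44-40      6      13.5         81.0
39-35      3      18.5         55.5
34-30      1      23.5         23.5

N = 44

$$\sum f\,|x_m - \bar{x}| = 465$$

$$D = \frac{\sum f\,|x_m - \bar{x}|}{N} = \frac{465}{44} \approx 10.6$$

Computing for the Quartile Deviation

Classes    f     cf
89-85      1     44
84-80      1     43
79-75      2     42
74-70      3     40
69-65      4     37
64-60      4     33
59-55      7     29
54-50      6     22
49-45      6     16
44-40      6     10
39-35      3      4
34-30      1      1

N = 44, i = 5

$$Q_k = L + \frac{\frac{kN}{4} - cf}{f}\, i$$

For Q1:

$$\frac{kN}{4} = \frac{1(44)}{4} = 11$$

$$Q_1 = 45 + \frac{11 - 10}{6}(5) = 45.83$$

For Q3:

$$\frac{kN}{4} = \frac{3(44)}{4} = 33$$

The value kN/4 = 33 is reached in the class 64-60, so L = 60, f = 4 and the cumulative frequency below that class is cf = 29.

$$Q_3 = 60 + \frac{33 - 29}{4}(5) = 65$$

$$Q = \frac{Q_3 - Q_1}{2} = \frac{65 - 45.83}{2} = 9.585$$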
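The mean deviation can be checked with a sketch along the same lines. The function name `mean_deviation` is hypothetical, and the mean is passed in as 55.5, the rounded value used in the solution.

```python
# (lower limit, upper limit, f), highest class first as in the table.
classes = [(85, 89, 1), (80, 84, 1), (75, 79, 2), (70, 74, 3),
           (65, 69, 4), (60, 64, 4), (55, 59, 7), (50, 54, 6),
           (45, 49, 6), (40, 44, 6), (35, 39, 3), (30, 34, 1)]


def mean_deviation(rows, mean):
    """Sum of f * |xm - mean| divided by N, using class marks xm."""
    n = sum(f for _, _, f in rows)
    total = sum(f * abs((lo + hi) / 2 - mean) for lo, hi, f in rows)
    return total / n


D = mean_deviation(classes, 55.5)   # 55.5 is the rounded mean from the slides
print(round(D, 1))
```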

Types of Graphs

BAR GRAPH
The bar graph is particularly useful in presenting data gathered from discrete variables on a nominal scale. It uses rectangles or bars to represent discrete classes of data. The base of each bar corresponds to a class interval of the frequency distribution and the heights of the bars represent the frequencies associated with each class.

HISTOGRAM
The histogram is similar to a bar chart, but the bases of the bars are the class boundaries rather than the class limits.

FREQUENCY POLYGON
A frequency polygon is a line graph of class frequencies plotted against class marks.

Problem:
For the following frequency distribution, construct:
1. bar graph
2. histogram
3. frequency polygon

Classes    f
54-50      4
49-45      7
44-40     12
39-35     10
34-30      9
29-25      6
24-20      2

BAR GRAPH
[Figure: bar graph of Frequency (0 to 15) against the classes 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54; horizontal axis labeled Class Marks]

HISTOGRAM
[Figure: histogram of Frequency (0 to 15) against Class Boundaries; bars labeled with the frequencies 2, 6, 9, 10, 12, 7, 4]

FREQUENCY POLYGON
[Figure: frequency polygon of Frequency (0 to 15) against the classes 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54]
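The three graphs differ mainly in what is placed on the horizontal axis: class intervals, class boundaries, or class marks. The sketch below derives those positions for the frequency distribution in this problem and prints a rough text bar graph. It is only an illustration: the function name `plot_positions` is hypothetical, and the boundaries assume scores recorded to the nearest whole number, so that each boundary lies half a unit beyond the class limit.

```python
# (lower limit, upper limit, f), lowest class first as on the graph axis.
classes = [(20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
           (40, 44, 12), (45, 49, 7), (50, 54, 4)]


def plot_positions(rows, unit=1):
    """Class mark and class boundaries for each class.

    Assumption: limits are recorded to the nearest `unit`, so boundaries
    lie half a unit below the lower limit and above the upper limit.
    """
    half = unit / 2
    return [{"limits": (lo, hi),
             "mark": (lo + hi) / 2,                  # x for frequency polygon
             "boundaries": (lo - half, hi + half),   # bar base for histogram
             "f": f}
            for lo, hi, f in rows]


positions = plot_positions(classes)
for p in positions:
    lo, hi = p["limits"]
    print(f"{lo}-{hi}  mark={p['mark']}  "
          f"boundaries={p['boundaries']}  {'#' * p['f']}")
```

The `mark` and `f` values are the points a frequency polygon would join, while consecutive `boundaries` share an edge, which is why histogram bars touch.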

PIE CHART

A pie chart is used to represent quantities that make up a whole.

Problem:
The following table classifies enrolment in a certain university. Construct a pie chart to show the enrolment distribution.

Engineering        5280
Commerce           3000
Education          1800
Arts & Sciences    1320
Law                 600

[Figure: pie chart with sectors for Engineering, Commerce, Education, Arts & Sciences and Law]
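To draw the pie chart, each enrolment figure is converted to a share of the whole and then to a central angle out of 360 degrees. The slides do not show this step, so the following is a sketch under that assumption; the variable names are illustrative.

```python
# Enrolment figures from the problem.
enrolment = {
    "Engineering": 5280,
    "Commerce": 3000,
    "Education": 1800,
    "Arts & Sciences": 1320,
    "Law": 600,
}

total = sum(enrolment.values())

# For each sector: (percentage of the whole, central angle in degrees).
sectors = {name: (count / total * 100, count / total * 360)
           for name, count in enrolment.items()}

for name, (percent, angle) in sectors.items():
    print(f"{name:<16} {percent:5.1f}%  {angle:6.1f} degrees")
```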

CUMULATIVE FREQUENCY CURVE
(Ogive Curve)

An ogive curve is a line graph obtained by plotting values from the tabular arrangement by class intervals whose frequencies are cumulated. From this curve, the centile rank of a certain score can be determined. A centile rank denotes the percentage of scores that fall below a specified score in a distribution.

Problem:

Construct the ogive curve for the given frequency distribution. What scores correspond to C50 and C88? What is the centile rank of a score of 50?

Classes    f     cf     CP (cf/N x 100)
64-60      2    376    100.0
59-55     12    374     99.5
54-50     20    362     96.3
49-45     32    342     91.0
44-40     46    310     82.4
39-35     58    264     70.2
34-30     64    206     54.8
29-25     58    142     37.8
24-20     42     84     22.3
19-15     23     42     11.2
14-10     15     19      5.1
9-5        4      4      1.1

N = 376

Ogive Curve
[Figure: ogive curve of CP (0 to 120) against the upper limits (UL) of the classes, from 9 to 64]

C50 = 33
C88 = 48
Score 50 = C91
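The cf and CP columns that the ogive is plotted from can be generated with a short sketch. The function name `ogive_points` is hypothetical; each point pairs the upper limit of a class with its cumulative frequency and cumulative percentage (cf/N x 100).

```python
# (lower limit, upper limit, f), highest class first as in the table.
classes = [(60, 64, 2), (55, 59, 12), (50, 54, 20), (45, 49, 32),
           (40, 44, 46), (35, 39, 58), (30, 34, 64), (25, 29, 58),
           (20, 24, 42), (15, 19, 23), (10, 14, 15), (5, 9, 4)]


def ogive_points(rows):
    """Return (upper limit, cf, CP) for each class, lowest class first."""
    n = sum(f for _, _, f in rows)
    cf = 0
    points = []
    for lo, hi, f in reversed(rows):    # cumulate from the lowest class up
        cf += f
        points.append((hi, cf, round(cf / n * 100, 1)))
    return points


points = ogive_points(classes)
for upper, cf, cp in points:
    print(f"UL={upper:<3} cf={cf:<4} CP={cp}")
```

Plotting CP against UL and reading across from a CP value gives the centile points; reading up from a score gives its centile rank.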

Kurtosis and Skewness

The measures of skewness and kurtosis indicate the extent of departure of a distribution from normal and permit comparison of two or more distributions.

KURTOSIS (ku)
Kurtosis refers to the flatness or peakedness of a frequency distribution. It describes the shape of the curve of one distribution in relation to that of another distribution. The coefficient of kurtosis is given by:

$$ku = \frac{Q}{P_{90} - P_{10}}$$

where Q = quartile deviation

Types of Kurtosis
leptokurtic (ku < 0.263)
mesokurtic (ku = 0.263)
platykurtic (ku > 0.263)

SKEWNESS (sk)
Skewness refers to the symmetry or asymmetry of a frequency distribution. The coefficient of skewness is given by:

$$sk = \frac{3(\bar{x} - md)}{s}$$

where md = median and s = standard deviation

If sk = 0, the distribution is normal.

$$\bar{X} = Md = Mo$$

If sk < 0, the distribution is negatively skewed.

$$Mo > Md > \bar{X}$$

If sk > 0, the distribution is positively skewed.

$$\bar{X} > Md > Mo$$

Problem:
For a certain frequency distribution, the following data are given:

$\bar{x} = 147$
md = 147.25
s = 13.7
Q1 = 138
Q3 = 155.8
D1 = 128.8
P90 = 167.5

Determine the kurtosis and skewness of the distribution. Is it a normal distribution?

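Solution:

$$ku = \frac{Q}{P_{90} - P_{10}} = \frac{\frac{Q_3 - Q_1}{2}}{P_{90} - D_1}$$

$$ku = \frac{\frac{155.8 - 138}{2}}{167.5 - 128.8} = 0.23$$

Distribution is leptokurtic.

$$sk = \frac{3(\bar{x} - md)}{s}$$

$$sk = \frac{3(147 - 147.25)}{13.7} = -0.05$$

Distribution is negatively skewed.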
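The two coefficients and their interpretation can be checked with a small sketch. The function names are hypothetical; the threshold 0.263 and the sign rules are the ones stated in this chapter, and P10 is taken as D1 as in the solution.

```python
def kurtosis_coefficient(q1, q3, p10, p90):
    """ku = Q / (P90 - P10), where Q = (Q3 - Q1) / 2."""
    return ((q3 - q1) / 2) / (p90 - p10)


def skewness_coefficient(mean, median, s):
    """sk = 3 (mean - median) / s."""
    return 3 * (mean - median) / s


def kurtosis_type(ku, normal=0.263):
    if ku < normal:
        return "leptokurtic"
    if ku > normal:
        return "platykurtic"
    return "mesokurtic"


def skewness_type(sk):
    if sk < 0:
        return "negatively skewed"
    if sk > 0:
        return "positively skewed"
    return "normal"


# Values from the problem; D1 is used as P10.
ku = kurtosis_coefficient(q1=138, q3=155.8, p10=128.8, p90=167.5)
sk = skewness_coefficient(mean=147, median=147.25, s=13.7)
print(round(ku, 2), kurtosis_type(ku))
print(round(sk, 2), skewness_type(sk))
```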

Student Activity

Part I. Answer the following:

1. Define each of the following:
   a. class mark
   b. ogive
   c. histogram
   d. frequency polygon
2. What advantages does each of the following forms of presenting data offer?
   a. textual
   b. tabular
   c. graphical
3. Distinguish between:
   a. class limits and class boundaries
   b. skewness and kurtosis
4. Give the class mark, the class boundaries and the interval size for each of the following:
   a. 10 - 19
   b. 1.5 - 5.0
   c. 12.85 - 13.43

Part II. Solve the following using Microsoft Excel Applications.

The list below gives the weekly food budget and weekly incomes for 39 households.

1. Construct a frequency distribution table for food budget using i = 25 and determine:
   a. mean
   b. median
   c. rough and theoretical mode
   d. skewness

Food Budget    Weekly Income    Food Budget    Weekly Income
1598           1553             1639           1636
1680           1740             1655           1677
1660           1652             1736           1761
1583           1581             1587           1603
1476           1481             1622           1605
1633           1634             1689           1631
1717           1692             1700           1765
1596           1561             1613           1688
1613           1566             1615           1667
1607           1626             1458           1479
1728           1699             1750           1747
1672           1685             1700           1673
1572           1589             1654           1641
1634           1571             1625           1613
1461           1443             1565           1521
1726           1712             1563           1583
1732           1724             1566           1542
1620           1628             1587           1567
1616           1564             1584           1610
1579           1526

2. Construct a frequency distribution table for weekly income using i = 25 and determine:
   a. standard deviation
   b. mean deviation
   c. quartile deviation
   d. kurtosis

3. Plot a bar chart for food budget and superimpose on it the frequency polygon for weekly income.

4. Take the difference between weekly income and food budget for each household and construct a frequency distribution and cumulative frequency distribution.
5. Plot the ogive curve for the data in (4). What score corresponds to a centile rank of 71?

Proceed to Topic 4