
Descriptive Statistics

Introduction
Descriptive statistics are used to describe the basic features of the data in a
study. They provide simple summaries about the sample and the measures. Together
with simple graphical analysis, they form the basis of virtually every quantitative analysis
of data.
Descriptive statistics are used to present quantitative descriptions in a
manageable form. In a research study we may have many measures, or we may
measure a large number of people on any one measure. Descriptive statistics help us to
simplify large amounts of data in a sensible way. Each descriptive statistic reduces a large
amount of data to a simpler summary.
The most common descriptive statistics measure either the central tendency
or the variability of the dataset.

Objectives:
1. Identify the different types of descriptive statistics
2. Describe the importance of descriptive statistics and its application in education
3. Utilize various data management tools to process and manage quantitative
data
4. Interpret data based on the result of computation.

Discussion:
A. Frequency Measurement
The easiest method of organizing data is a frequency distribution, which converts
raw data into a meaningful pattern for statistical analysis.
The following are the steps of constructing a frequency distribution:
1. Specify the number of class intervals. A class is a group (category) of interest.
No totally accepted rule tells us how many intervals are to be used. Between 5
and 15 class intervals are generally recommended. Note that the classes must
be both mutually exclusive and all-inclusive. Mutually exclusive means the
classes must be selected such that an item can’t fall into two classes, and all-
inclusive classes are classes that together contain all the data.
2. When all classes are to be the same width, the following rule may be used to find the
required class interval width:

W = (L - S) / N
Where:
W = class width
L = the largest data value
S = the smallest data value
N = number of classes

Example:
Suppose the ages of a sample of 10 students are:
20.9, 18.1, 18.5, 21.3, 19.4, 25.3, 22.0, 23.1, 23.9 and 22.5
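We select N = 4, so W = (25.3 – 18.1) / 4 = 1.8, which is rounded up to 2. Each class includes
its lower endpoint and excludes its upper endpoint, so an age of 22.0 falls in 22 – 24. The frequency
table is as follows:

Age (years) ………………… Frequency ………………… Relative frequency
18 – 20 ………………………….. 3 …………………………………. 30%
20 – 22 ………………………….. 2 …………………………………. 20%
22 – 24 ………………………….. 4 …………………………………. 40%
24 – 26 ………………………….. 1 …………………………………. 10%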

Note: The sum of all the relative frequencies must always be equal to 1.00 or 100%. In
the above example, we see that 40% of all students are at least 22 but younger than 24
years old. Relative frequency may be determined for both quantitative and
qualitative data and is a convenient basis for the comparison of similar groups of
different sizes.
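The class-width rule and the frequency table above can be reproduced with a short Python sketch. The function name frequency_table and the starting point of 18 (the smallest age rounded down) are assumptions made for illustration; each class is treated as including its lower endpoint and excluding its upper endpoint.

```python
import math

ages = [20.9, 18.1, 18.5, 21.3, 19.4, 25.3, 22.0, 23.1, 23.9, 22.5]
num_classes = 4

# W = (L - S) / N, rounded up to a whole number
width = math.ceil((max(ages) - min(ages)) / num_classes)
start = math.floor(min(ages))  # assumed starting point: 18


def frequency_table(data, start, width, num_classes):
    """Return (lower, upper, frequency, relative frequency) for each class."""
    rows = []
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width
        # lower endpoint included, upper endpoint excluded
        freq = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, freq, freq / len(data)))
    return rows


table = frequency_table(ages, start, width, num_classes)
for lower, upper, freq, rel in table:
    print(f"{lower} - {upper}: {freq} ({rel:.0%})")
```

The relative frequencies are simply each count divided by the number of observations, so they always add up to 1.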
What a Frequency Distribution Tells Us
1. It shows how the observations cluster around a central value; and
2. It shows the degree of difference between observations.

For example, in the above problem we know that no student is younger than 18 and
that most students are younger than 24. The most common age is between 22 and 24, which
from general information we know to be higher than usual for the students who enter
college right after high school and graduate about age 22. The students in the sample
are generally older. It is possible that the population is made up of night students who
work on their degrees on a part-time basis while holding full-time jobs.
This descriptive analysis provides us with an image of the student sample, which is
not available from raw data.
Illustration:
Consider the following set of data, which are the scores recorded for 30 participants.
We wish to summarize this data by creating a frequency distribution of the scores.
Data Set – Scores Recorded for 30 Participants
50 45 49 50 43
49 50 49 45 49
47 47 44 51 51
44 47 46 50 44
51 49 43 43 49
45 46 45 51 46

To create a frequency distribution from this data we proceed as follows:


1. Identify the highest and lowest values in the data set. For our scores the highest
score is 51 and the lowest is 43.
2. Create a column with the title of the variable we are using, in this case score.
Enter the highest score at the top, and include all values within the range from
the highest score to the lowest score.
3. Create a tally column to keep track of the scores as you enter them into the
frequency distribution. Once the frequency distribution is completed you can omit
this column. Most printed frequency distributions do not retain the tally column in
their final form.
4. Create a frequency column, with the frequency of each value, as shown in the
tally column, recorded.
5. At the bottom of the frequency column record the total frequency for the
distribution, preceded by N=.
6. Enter the name of the frequency distribution at the top of the table.
If we applied these steps to the data, we would have the following frequency
distribution.

Frequency Distribution Scores Recorded for 30 Participants


Scores Recorded Tally Frequency
51 //// 4
50 //// 4
49 ////// 6
48 0
47 /// 3
46 /// 3
45 //// 4
44 /// 3
43 /// 3
N= 30
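The tallying steps can be sketched in Python with collections.Counter. The variable names are illustrative; the scores are the 30 values listed above.

```python
from collections import Counter

scores = [
    50, 45, 49, 50, 43,
    49, 50, 49, 45, 49,
    47, 47, 44, 51, 51,
    44, 47, 46, 50, 44,
    51, 49, 43, 43, 49,
    45, 46, 45, 51, 46,
]

counts = Counter(scores)

# List every value from the highest score down to the lowest,
# including values such as 48 that never occur (frequency 0).
distribution = [(value, counts[value])
                for value in range(max(scores), min(scores) - 1, -1)]
total = sum(freq for _, freq in distribution)

for value, freq in distribution:
    print(value, "/" * freq, freq)
print("N=", total)
```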

Cumulative Frequency Distribution


A cumulative frequency distribution can be created from a frequency distribution by
adding an additional column called "Cumulative Frequency". For each score value the
cumulative frequency for that score value is the frequency up to and including the
frequency for that value.
In the cumulative frequency distribution for the high scores data below, notice that the
cumulative frequency for the lowest score (43) is 3, and the cumulative frequency for the
score 44 is 3+3 or 6. The cumulative frequency for a given value can also be obtained
by adding the frequency for the value to the cumulative value for the value below the
given value.
Example: The cumulative frequency for 45 is 10 which is the cumulative frequency for
44 (6) plus the frequency for 45 (4). Finally, notice that the cumulative frequency for the
highest value (51 in the current case) should be the same as the total of the frequency
column (30 in the case of the score data).

Cumulative Frequency Distribution Scores Recorded for 30 Participants


Scores Recorded Tally Frequency Cumulative Frequency
51 //// 4 30
50 //// 4 26
49 ////// 6 22
48 0 16
47 /// 3 16
46 /// 3 13
45 //// 4 10
44 /// 3 6
43 /// 3 3
N= 30

In summary then, to create a cumulative frequency distribution:


1. Create a frequency distribution
2. Add a column entitled cumulative frequency
3. The cumulative frequency for each score is the frequency up to and including the
frequency for that score
4. The highest cumulative frequency should equal N (the total of the frequency
column)
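A cumulative frequency column is a running total of the frequency column, starting from the lowest score. A minimal Python sketch using itertools.accumulate (the list names are illustrative):

```python
from itertools import accumulate

# Score values from lowest to highest, with their frequencies
values = [43, 44, 45, 46, 47, 48, 49, 50, 51]
frequencies = [3, 3, 4, 3, 3, 0, 6, 4, 4]

# Running total of the frequencies
cumulative = list(accumulate(frequencies))

# Print highest score first, as in the table
for value, freq, cum in reversed(list(zip(values, frequencies, cumulative))):
    print(value, freq, cum)

# The highest cumulative frequency must equal N
assert cumulative[-1] == sum(frequencies)
```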

Grouped Frequency Distribution


In some cases, it is necessary to group the values of the data to summarize the data
properly.
Guidelines for Classes
1. There should be between 5 and 20 classes.
2. The class width should be an odd number.
3. The classes must be mutually exclusive. This means that no data value can fall
into two different classes.
4. The classes must be all inclusive or exhaustive. This means that all data values
must be included.
5. The classes must be continuous.
6. The classes must be equal in width.
Creating a Grouped Frequency Distribution
1. Find the largest and smallest values.
2. Compute the Range = Highest Score – Lowest Score
3. Select the number of classes desired. This is usually between 5 and 20.
4. Find the class width by dividing the range by the number of classes and rounding
up.
5. Pick a suitable starting point less than or equal to the minimum value. You will be
able to cover: "the class width times the number of classes" values. You need to
cover one more value than the range. Follow this rule and you’ll be okay: The
starting point plus the number of classes times the class width must be greater
than the maximum value. Your starting point is the lower limit of the first class.
Continue to add the class width to this lower limit to get the rest of the lower
limits.
6. To find the upper limit of the first class, subtract one from the lower limit of the
second class. Then continue to add the class width to this upper limit to find the
rest of the upper limits.
7. Find the boundaries by subtracting 0.5 units from the lower limits and adding 0.5
units to the upper limits. The boundaries are also halfway between the upper
limit of one class and the lower limit of the next class.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies. Depending on what you’re trying to accomplish,
it may not be necessary to find the cumulative frequencies.
11. If necessary, find the relative frequencies and/or relative cumulative frequencies.
Illustrations:
Look at the following data set. The highest score is 59 and the lowest is 39. If
we were to create a simple frequency distribution of this data we would have 21 values.
This is greater than 20 values so we should create a grouped frequency distribution.
Data Set – Scores in the Major Examination
(Statistics)
57 39 52 52 43
50 53 42 58 55
58 50 53 50 49
45 49 51 44 54
49 57 55 59 45
50 45 51 54 58
53 49 52 51 41
52 40 44 49 45
43 47 47 43 51
55 55 46 54 41

If we use this data and follow the suggestions for creation of a grouped frequency
distribution, we would create the following grouped frequency distribution.
Grouped Frequency Distribution for Scores in the Major Examination
(Statistics)
Class Interval Tally Interval Midpoint Frequency
57 - 59 ////// 58 6
54 - 56 /////// 55 7
51 - 53 /////////// 52 11
48 - 50 ///////// 49 9
45 - 47 /////// 46 7
42 – 44 ////// 43 6
39 – 41 //// 40 4
N= 50

Cumulative Grouped Frequency Distribution


It is a simple matter to create a cumulative grouped frequency distribution. We just add
a cumulative frequency column to the grouped frequency distribution and we have a
cumulative grouped frequency distribution. Adding a cumulative frequency column
created the cumulative grouped frequency distribution below.
Cumulative Grouped Frequency Distribution for Scores in the Major
Examination
(Statistics)
Class Interval Tally Interval Midpoint Frequency Cumulative Frequency
57 - 59 ////// 58 6 50
54 - 56 /////// 55 7 44
51 - 53 /////////// 52 11 37
48 - 50 ///////// 49 9 26
45 - 47 /////// 46 7 17
42 - 44 ////// 43 6 10
39 - 41 //// 40 4 4
N= 50
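The steps for a grouped frequency distribution can be sketched in Python as follows. The choice of 7 classes and a starting point equal to the lowest score are assumptions taken from the table above, not the only valid choices.

```python
import math
from itertools import accumulate

scores = [
    57, 39, 52, 52, 43, 50, 53, 42, 58, 55,
    58, 50, 53, 50, 49, 45, 49, 51, 44, 54,
    49, 57, 55, 59, 45, 50, 45, 51, 54, 58,
    53, 49, 52, 51, 41, 52, 40, 44, 49, 45,
    43, 47, 47, 43, 51, 55, 55, 46, 54, 41,
]

num_classes = 7                               # assumed, as in the table
data_range = max(scores) - min(scores)        # 59 - 39 = 20
width = math.ceil(data_range / num_classes)   # 20 / 7 rounded up = 3
start = min(scores)                           # lower limit of the first class

# The classes must reach past the maximum value.
assert start + num_classes * width > max(scores)

rows = []
for k in range(num_classes):
    lower = start + k * width
    upper = lower + width - 1                 # upper limit of the class
    midpoint = (lower + upper) / 2
    freq = sum(1 for s in scores if lower <= s <= upper)
    rows.append((lower, upper, midpoint, freq))

cumulative = list(accumulate(row[3] for row in rows))

# Print highest class first, as in the table
for (lower, upper, midpoint, freq), cum in reversed(list(zip(rows, cumulative))):
    print(f"{lower} - {upper}  {midpoint:g}  {freq}  {cum}")
```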

B. Measure of Central Tendency

CENTRAL TENDENCY

• Measures of central tendency help you find the middle, or the average, of a data set. The 3 most
common measures of central tendency are the mode, median, and mean.
• In statistics, central tendency is also called the center or location of the distribution. Colloquially,
measures of central tendency are often called averages.
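As an illustration, the three measures can be computed with Python's standard statistics module. The data used here are the 30 participant scores from the frequency distribution section; applying them in this section is a choice made for this sketch.

```python
import statistics

scores = [
    50, 45, 49, 50, 43,
    49, 50, 49, 45, 49,
    47, 47, 44, 51, 51,
    44, 47, 46, 50, 44,
    51, 49, 43, 43, 49,
    45, 46, 45, 51, 46,
]

mean = statistics.mean(scores)        # sum of the scores divided by 30
median = statistics.median(scores)    # middle of the rank-ordered scores
modes = statistics.multimode(scores)  # most frequent value(s)

print(f"mean = {mean:.2f}, median = {median}, mode(s) = {modes}")
```

multimode returns a list because a data set can have more than one mode.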
C. Measurement of Variability

Variability indicates how spread out the scores are. When there are large differences
among scores, the data are said to contain a lot of variability.

Common measures of variability:
The Range
The Interquartile Range (IQR)
The Variance
The Standard Deviation

The Range
The range is the difference between the largest and smallest values in a
set of values

EXAMPLE 1:

Consider the following numbers:


1, 3, 4, 5, 5, 6, 7, 11

For this set of numbers, the range would be


11 - 1 or 10

EXAMPLE 2: RANGE IN GROUPED DATA

SCORES FREQUENCY (f)
22-24 5
19-21 6
16-18 7
13-15 8
10-12 4

Range = upper boundary of the highest interval - lower boundary of the lowest interval
Range = (24 + .5) - (10 - .5) = 24.5 - 9.5 = 15
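Both range calculations can be checked with a few lines of Python. This is a minimal sketch; the 0.5 adjustment assumes whole-number class limits, as in the example.

```python
# Ungrouped data: largest value minus smallest value
values = [1, 3, 4, 5, 5, 6, 7, 11]
simple_range = max(values) - min(values)

# Grouped data: (lower limit, upper limit) of each class
classes = [(10, 12), (13, 15), (16, 18), (19, 21), (22, 24)]
upper_boundary = max(upper for _, upper in classes) + 0.5   # 24.5
lower_boundary = min(lower for lower, _ in classes) - 0.5   # 9.5
grouped_range = upper_boundary - lower_boundary

print(simple_range, grouped_range)
```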

The Interquartile Range (IQR)

The interquartile range is a measure of variability based on dividing a data set into quartiles.

Quartiles divide a rank-ordered data set into four equal parts. The values that divide
each part are called the first, second, and third quartiles; and they are denoted by Q1,
Q2, and Q3, respectively.
Q1 is the "middle" value in the first half of the rank-ordered data set.
Q2 is the median value in the set.
Q3 is the "middle" value in the second half of the rank-ordered data set.

The interquartile range is the difference between Q3 and Q1; it spans the middle half of the data.

Interquartile Range

[Diagram: Lowest data | Q1, first quartile (25%) | Q2, median (50%) | Q3, third quartile (75%) | Highest data]

Example 1:
2 3 1 4 6 8 9 10 12 3 4
Number of values: n = 11
Find Q2, Q1, Q3 and the IQR.

First: Arrange the data in ascending order

1 2 3 3 4 4 6 8 9 10 12

Second: Find the position of Q2

Position of Q2 = (n+1) x 50% = (11+1) x .50 = 6
The 6th value is 4, so Q2 = 4.

Third: Find the positions of Q1 and Q3

Position of Q1 = (n+1) x 25% = (11+1) x .25 = 3. The 3rd value is 3, so Q1 = 3.
Position of Q3 = (n+1) x 75% = (11+1) x .75 = 9. The 9th value is 9, so Q3 = 9.

Fourth: Find the IQR
IQR = Q3 - Q1
IQR = 9 - 3 = 6
The IQR covers the central half of the data.

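EXAMPLE 2:

1 2 3 3 4 5 6 6 7 8 8 9

Number of values: n = 12
Find Q1, Q2, Q3 and the IQR.

Position of Q1 = (n+1) x 25% = (12+1) x .25 = 13 x .25 = 3.25
The position falls between the 3rd and 4th values, so Q1 = (3+3)/2 = 3

Position of Q2 = (n+1) x 50% = (12+1) x .50 = 13 x .50 = 6.5
The position falls between the 6th and 7th values, so Q2 = (5+6)/2 = 5.5

Position of Q3 = (n+1) x 75% = (12+1) x .75 = 13 x .75 = 9.75
The position falls between the 9th and 10th values, so Q3 = (7+8)/2 = 7.5

IQR = Q3 - Q1
IQR = 7.5 - 3 = 4.5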
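The two examples above can be checked with a short Python sketch. It assumes the convention used in the worked solutions: a whole-number position picks that value, and a fractional position takes the mean of the two neighbouring values. The function name quartile_by_position is illustrative.

```python
def quartile_by_position(sorted_data, proportion):
    """Value at position (n + 1) * proportion, counting positions from 1."""
    n = len(sorted_data)
    position = (n + 1) * proportion
    whole = int(position)
    if position == whole:
        # Whole-number position: take that value directly
        return sorted_data[whole - 1]
    # Fractional position: mean of the two neighbouring values
    return (sorted_data[whole - 1] + sorted_data[whole]) / 2


example_1 = sorted([2, 3, 1, 4, 6, 8, 9, 10, 12, 3, 4])
example_2 = sorted([1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 8, 9])

for data in (example_1, example_2):
    q1 = quartile_by_position(data, 0.25)
    q2 = quartile_by_position(data, 0.50)
    q3 = quartile_by_position(data, 0.75)
    print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={q3 - q1}")
```

Other textbooks and software interpolate differently at fractional positions, so quartiles from other tools may not match exactly.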

HOW TO FIND IQR IN GROUPED DATA

Score Frequency (f) Cumulative Frequency (F)
10-12 4 4
13-15 8 12
16-18 7 19
19-21 6 25
22-24 5 30

Formula:
Q1 = LQ1 + ((∑f/4 - FQ1-1) / fQ1) x CQ1

Where:
LQ1 = lower bound of the first quartile class
FQ1-1 = cumulative frequency of the class before the first quartile class
fQ1 = actual frequency of the first quartile class
CQ1 = width of the first quartile class

Step 1: Calculate ∑f/4 = 30/4 = 7.5
Step 2: Look for the first cumulative frequency that exceeds ∑f/4. The first cumulative
frequency greater than 7.5 is 12, therefore the first quartile class is 13-15.
Step 3: Substitute LQ1 = 13, FQ1-1 = 4, fQ1 = 8 and CQ1 = 3.
Q1 = 13 + ((7.5 - 4) / 8) x 3
Q1 = 13 + (.4375) x 3
Q1 = 13 + 1.3125
Q1 = 14.3125

For the third quartile:
Q3 = LQ3 + ((3∑f/4 - FQ3-1) / fQ3) x CQ3

Step 1: Calculate 3∑f/4 = 90/4 = 22.5
Step 2: The first cumulative frequency greater than 22.5 is 25, therefore the third
quartile class is 19-21.
Step 3: Substitute LQ3 = 19, FQ3-1 = 19, fQ3 = 6 and CQ3 = 3.
Q3 = 19 + ((22.5 - 19) / 6) x 3
Q3 = 19 + (.5833) x 3
Q3 = 19 + 1.75
Q3 = 20.75

IQR = Q3 - Q1
IQR = 20.75 - 14.31 = 6.44
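The grouped-data quartile formula can be sketched in Python as below. The function name grouped_quartile is illustrative, and the lower bound of each class is taken to be its lower class limit (13 and 19), following the worked solution above; that reading of "lower bound" is an assumption.

```python
def grouped_quartile(classes, k):
    """k-th quartile of grouped data.

    classes: list of (lower_bound, width, frequency), lowest class first.
    """
    total = sum(freq for _, _, freq in classes)
    target = k * total / 4
    cumulative_before = 0
    for lower, width, freq in classes:
        # The quartile class is the first one whose cumulative
        # frequency reaches the target.
        if cumulative_before + freq >= target:
            return lower + (target - cumulative_before) / freq * width
        cumulative_before += freq
    raise ValueError("k must be between 1 and 4")


# (lower bound, class width, frequency) for classes 10-12 ... 22-24
classes = [(10, 3, 4), (13, 3, 8), (16, 3, 7), (19, 3, 6), (22, 3, 5)]

q1 = grouped_quartile(classes, 1)   # 13 + ((7.5 - 4) / 8) * 3 = 14.3125
q3 = grouped_quartile(classes, 3)   # 19 + ((22.5 - 19) / 6) * 3
iqr = q3 - q1
print(q1, q3, iqr)
```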

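VARIANCE
- The variance is the mean of the squared deviations from the mean of a frequency distribution.
- For large data sets grouped into intervals, the variance is computed using the frequency and
midpoint value for each interval, the deviation of the midpoint from the mean and its square,
and the product of the frequency and the squared deviation.

Variance:
σ² = Σf(X - x̄)² / (Σf - 1)
Where:
f = class frequency
X = class mark (midpoint)
x̄ = mean
Σf = total frequency

EXAMPLE:
For the grouped score table used in the IQR example (classes 10-12 to 22-24), the class marks
are 11, 14, 17, 20 and 23, the mean is 510/30 = 17, and Σf(X - x̄)² = 144 + 72 + 0 + 54 + 180 = 450.
Variance = 450 / 29 = 15.52
SD = √variance = √15.52 = 3.94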
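The grouped variance formula can be sketched in Python. This sketch assumes the example's figures come from the grouped score table used for the IQR (class marks 11 to 23, total frequency 30); the sum of squared deviations it produces is 450, which agrees with the 450/29 in the example.

```python
import math

# (class mark, frequency) for the classes 10-12, 13-15, 16-18, 19-21, 22-24
classes = [(11, 4), (14, 8), (17, 7), (20, 6), (23, 5)]

total = sum(freq for _, freq in classes)                     # sum of f = 30
mean = sum(mark * freq for mark, freq in classes) / total    # 510 / 30 = 17

# Sum of f * (X - mean)^2 over all classes
sum_squares = sum(freq * (mark - mean) ** 2 for mark, freq in classes)

variance = sum_squares / (total - 1)    # divide by (sum of f) - 1
std_dev = math.sqrt(variance)

print(f"sum of squares = {sum_squares}, variance = {variance:.2f}, SD = {std_dev:.2f}")
```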
D. Measures of Position (Quartile, Decile and Percentile)
Quartiles (Qk) – are the score points which divide the distribution into four equal parts.
Each distribution has 3 quartiles, denoted by Q1, Q2 and Q3.
Deciles (Dk) – are the score points which divide the distribution into 10 equal parts.
Each distribution has 9 deciles, denoted by D1, D2, … D9.
Percentiles (Pk) – are the score points which divide the distribution into 100 equal
parts. Each distribution has 99 percentiles, denoted by P1, P2, … P99.
Where Q4, D10 or P100 appear, they refer to the highest value in the distribution.

Relationship among Percentile, Decile, and Quartile

• P10 = D1
• P20 = D2
• P25 = Q1
• P50 = D5 = Q2
• P75 = Q3
• P90 = D9

Quartiles, Deciles, and Percentiles of Ungrouped Data


Table 1: Tabular representation of decile, percentile, and quartile.
DECILE PERCENTILE QUARTILE
D1 P10
D2 P20
 P25 Q1
D3 P30
D4 P40
D5 P50 Q2
D6 P60
D7 P70
 P75 Q3
D8 P80
D9 P90
D10 P100 Q4

FORMULA:
For quartile:
Qk (n + 1) = a.b
V = Xa + .b(Xa+1 – Xa)

For decile:
Dk (n + 1) = a.b
V = Xa + .b(Xa+1 – Xa)

For percentile:
Pk (n + 1) = a.b
V = Xa + .b(Xa+1 – Xa)

Where:
Qk = the proportion for the kth quartile (k/4)
Dk = the proportion for the kth decile (k/10)
Pk = the proportion for the kth percentile (k/100)
n = total number of terms
a = the integer part of the result from the first equation
.b = the decimal part of the same result
V = the value at the Qk, Dk, or Pk position
Xa = the value at the ath position

Example:

Find Q1, D7, and P78 from the data below.


23, 15, 16, 30, 15, 12, 19, 27, 11, 20,
24, 16, 25, 17, 23, 26, 18, 13, 28, 21

1st step: arrange the data from lowest to highest.


11, 12, 13, 15, 15, 16, 16, 17, 18, 19,
20, 21, 23, 23, 24, 25, 26, 27, 28, 30

For Q1:

Position = Qk (n + 1)
= Q1 (20 + 1)
= .25(20 + 1)
= 5.25th

Thus, a = 5 and .b = .25

Solving for V;

V = Xa + .b(Xa+1 – Xa)

= X5 + .25(X6 – X5)
= 15 + .25(16 – 15)
= 15 + .25
= 15.25
Hence, 25% of the distribution is below 15.25 (interpolated from the 5th term, since
X5 = 15).

For D7:

Position = Dk (n + 1)
= D7 (20 + 1)
= .70(20 + 1)
= 14.7th

Thus, a = 14 and .b = .7

Solving for V;

V = Xa + .b(Xa+1 – Xa)

= X14 + .7(X15 – X14)


= 23 + .7(24 – 23)
= 23 + .7
= 23.7
Hence, 70% of the distribution is below 23.7 (interpolated from the 14th term, since
X14 = 23).

For P78:

Position = Pk (n + 1)
= P78 (20 + 1)
= .78(20 + 1)
= 16.38th

Thus, a = 16 and .b = .38

Solving for V;

V = Xa + .b(Xa+1 – Xa)

= X16 + .38(X17 – X16)


= 25 + .38(26 – 25)
= 25 + .38
= 25.38
Hence, 78% of the distribution is below 25.38 (interpolated from the 16th term, since
X16 = 25).
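The position formula and the interpolation step V = Xa + .b(Xa+1 – Xa) can be sketched as one Python function. The name value_at_proportion and the handling of positions that fall outside the data are assumptions added for this sketch.

```python
def value_at_proportion(data, proportion):
    """Interpolated value at position proportion * (n + 1), counting from 1."""
    ordered = sorted(data)
    n = len(ordered)
    position = proportion * (n + 1)
    a = int(position)        # integer part of the position
    b = position - a         # decimal part of the position
    if a < 1:                # assumed: clamp to the lowest value
        return ordered[0]
    if a >= n:               # assumed: clamp to the highest value
        return ordered[-1]
    # V = Xa + .b(Xa+1 - Xa); list indexes start at 0, positions at 1
    return ordered[a - 1] + b * (ordered[a] - ordered[a - 1])


data = [23, 15, 16, 30, 15, 12, 19, 27, 11, 20,
        24, 16, 25, 17, 23, 26, 18, 13, 28, 21]

q1 = value_at_proportion(data, 0.25)    # position 5.25
d7 = value_at_proportion(data, 0.70)    # position 14.7
p78 = value_at_proportion(data, 0.78)   # position 16.38
print(round(q1, 2), round(d7, 2), round(p78, 2))
```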

Measure of Position for Grouped Data

Example:
The test scores of 27 Grade 10 students in Mathematics are shown below. Find Q2, D8 and P35.
Scores Frequency (f) Cumulative Frequency (cf) Lower Boundary (L)
30-34 3
25-29 9
20-24 8
15-19 5
10-14 2
n=27
Solving for Q2
1. Determine the cumulative frequencies (copy the frequency of the lowest class
interval, then add the frequency of each succeeding class interval).
Scores Frequency (f) Cumulative Frequency (cf) Lower Boundary (L)
30-34 3 27
25-29 9 24
20-24 8 15
15-19 5 7
10-14 2 2
n=27
2. Determine the lower boundaries (subtract 0.5 from the smallest number in each class
interval).
Scores Frequency (f) Cumulative Frequency (cf) Lower Boundary (L)
30-34 3 27 29.5
25-29 9 24 24.5
20-24 8 15 19.5
15-19 5 7 14.5
10-14 2 2 9.5
n=27

3. Determine the location of the quartile class by dividing the total frequency (n)
by 2 for the 2nd quartile: n/2 = 27/2 = 13.5
4. Locate 13.5 in the Cumulative Frequency (cf) column. If 13.5 cannot be found in the
cumulative frequency, look for the next higher value, which is 15. The Q2 class is
therefore 20-24.
5. Look for the value of the lower boundary: L = 19.5
6. Look for the value of the cumulative frequency of the class below (cfb): cfb = 7
7. Look for the value of the class interval (i): i = 5
8. Look for the value of the frequency of the Q2 class: f = 8
9. Then solve for Q2 using the formula
Q2 = L + ((n/2 - cfb) / f) x i
Q2 = 19.5 + ((13.5 - 7) / 8) x 5
Q2 = 19.5 + 4.06
Q2 = 23.56

Therefore: 50% of the class got a score lower than or equal to 23.56 and 50% got a score
higher than 23.56. (Note that Q2=P50=D5=Median)

Solving for D8
1. Determine the location of the decile class by multiplying the total frequency (n)
by 8 and dividing by 10: 8n/10 = (8*27)/10 = 21.6
2. Locate 21.6 in the Cumulative Frequency (cf) column. If 21.6 cannot be found in the
cumulative frequency, look for the next higher value, which is 24. The D8 class is
therefore 25-29.
3. Look for the value of the lower boundary: L = 24.5
4. Look for the value of the cumulative frequency of the class below (cfb): cfb = 15
5. Look for the value of the class interval (i): i = 5
6. Look for the value of the frequency of the D8 class: f = 9
7. Then solve for D8 using the formula
D8 = L + ((8n/10 - cfb) / f) x i
D8 = 24.5 + ((21.6 - 15) / 9) x 5
D8 = 24.5 + 3.67
D8 = 28.17

Therefore: 80% of the class got a score lower than or equal to 28.17 and 20% got a score
higher than 28.17.

Solving for P35


1. Determine the location of the percentile class by multiplying the total frequency (n)
by 35 and dividing by 100: 35n/100 = (35*27)/100 = 9.45
2. Locate 9.45 in the Cumulative Frequency (cf) column. If 9.45 cannot be found in the
cumulative frequency, look for the next higher value, which is 15. The P35 class is
therefore 20-24.
3. Look for the value of the lower boundary: L = 19.5
4. Look for the value of the cumulative frequency of the class below (cfb): cfb = 7
5. Look for the value of the class interval (i): i = 5
6. Look for the value of the frequency of the P35 class: f = 8
7. Then solve for P35 using the formula
P35 = L + ((35n/100 - cfb) / f) x i
P35 = 19.5 + ((9.45 - 7) / 8) x 5
P35 = 19.5 + 1.53
P35 = 21.03

Therefore: 35% of the class got a score lower than or equal to 21.03 and 65% got a score
higher than 21.03.
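The quantities named in the steps (lower boundary L, cumulative frequency below cfb, class interval i and class frequency f) fit the usual grouped-data interpolation, L + ((target - cfb) / f) x i. Assuming that is the formula intended, a Python sketch reproduces the three results 23.56, 28.17 and 21.03. The function name grouped_position is illustrative.

```python
def grouped_position(classes, proportion):
    """Quartile, decile or percentile of grouped data.

    classes: list of (lower_limit, upper_limit, frequency), lowest class first.
    proportion: 0.5 for Q2, 0.8 for D8, 0.35 for P35, and so on.
    """
    n = sum(freq for _, _, freq in classes)
    target = proportion * n
    cfb = 0  # cumulative frequency of the classes below
    for lower, upper, freq in classes:
        if cfb + freq >= target:
            boundary = lower - 0.5        # lower boundary L
            interval = upper - lower + 1  # class interval i
            return boundary + (target - cfb) / freq * interval
        cfb += freq
    raise ValueError("proportion must be between 0 and 1")


classes = [(10, 14, 2), (15, 19, 5), (20, 24, 8), (25, 29, 9), (30, 34, 3)]

q2 = grouped_position(classes, 0.50)
d8 = grouped_position(classes, 0.80)
p35 = grouped_position(classes, 0.35)
print(f"Q2 = {q2:.2f}, D8 = {d8:.2f}, P35 = {p35:.2f}")
```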

Summary
Descriptive statistics are very important because they facilitate data visualization.
If we simply presented our raw data, it would be hard to see what the data were
showing, especially if there were a lot of it. Descriptive statistics allow data to be presented in a
meaningful and understandable way, which in turn allows for a simplified interpretation
of the data set in question. Descriptive statistics are used to describe the basic features
of the data in a study. They provide simple summaries about the sample and the
measures. Together with simple graphical analysis, they form the basis of virtually
every quantitative analysis of data.
There are four major types of descriptive statistics: Frequency Distribution,
Measure of Central Tendency, Measure of Dispersion or Variation, and Measure of
Position.
A frequency distribution is normally presented in a table or graph and is accompanied
by the count of how often values occur in an interval, range or specific
group. Common charts and graphs used in frequency distribution presentation and
visualization include bar charts, histograms, pie charts, and line charts.
The Mean, Median, and Mode are the Measures of Central Tendency. The Mean, considered the most
popular measure of central tendency, is the arithmetic average of the values in a data
set. The Median refers to the middle score of a data set arranged in ascending order. The Mode
refers to the score or value that is most frequent in a data set.
A Measure of Dispersion or Variation is a summary statistic reflecting the degree of
dispersion in a sample. Range: the difference between the highest and lowest values.
Interquartile Range: the range of the middle half of a distribution. Standard Deviation:
the square root of the variance, describing the typical distance from the mean.
Variance: the average of the squared distances from the mean.
Measures of Position give a range
where a certain percentage of the data fall. The measures we consider here are
percentiles and quartiles. Assume that the elements in a data set are rank ordered from
the smallest to the largest.
The values that divide a rank-ordered set of elements into 100 equal parts are called
percentiles. Quartiles divide a rank-ordered data set into four equal parts. The values
that divide each part are called the first, second, and third quartiles; and they are
denoted by Q1, Q2 and Q3 respectively.

Assessment:
1. Which of the following provides a measure of central location for data?
a. Standard deviation
b. Mean
c. Variance
d. Range

2. A numerical value used as a summary measure for a sample, such as the sample
mean, is known as a _____.
a. Population parameter
b. Sample parameter
c. Sample statistic
d. Population mean

3. Which of the following is a measure of dispersion?
a. Percentiles
b. Quartiles
c. Interquartile range
d. All of the above are measures of dispersion

4. If two groups of numbers have the same mean, then ____.
a. Their standard deviations must be equal
b. Their medians must also be equal
c. Their modes must also be equal
d. None of these alternatives is correct

5. The difference between the largest and the smallest data values is the ____.
a. Variance
b. Interquartile range
c. Range
d. Coefficient of variation


Prepared by:
Group 2
Abinales, Susan
Abobo, Ma. Joy R.
Agda, Benjie
Dacalos, Rochelle
Gesite, Annabelle C.
Projimo, Charitess
Turla, Arlene N.
Uy, Hector B.
