
Descriptive Statistics

Statistics- is the mathematical science that involves the collection, presentation, analysis and interpretation of
data and the methods utilized therein.
Statistical Methods- are the procedures used in collecting, summarizing, and analyzing data.
Two Major Areas of Statistics
1. Descriptive Statistics- comprises statistical methods dealing with collection, tabulation and
summarization of data, so as to present meaningful information. This method can either be graphical
or computational.
2. Inferential Statistics- the technique by which decisions about a statistical population are made based only
on a sample having been observed or a judgment having been obtained. This kind of statistics is
concerned more with generalizing information or making inferences about a population.
Population- is the totality of the observations with which a statistician is concerned. The observations could
refer to anything of interest, such as persons, animals or objects.
Size of the population – is defined as the number of observations in the population.
Sample- is a subset of a population.

Two Types of Samples


1. Biased Samples- data collected from the sample that is not representative of a population.
2. Random Samples or Unbiased Samples- the selection of sample in such a way that each sample of a
given size has precisely the same probability of being selected.
Data- the statistical facts, historical facts, principles, opinions and items of various sources like scores, ages,
I.Q., income, etc.
Ungrouped Data- are data which have not been organized or classified and usually exhibit no pattern.
Grouped Data- are data which have been organized and summarized.

Kinds of Data
1. Continuous Data- these arise from measurement of a continuous variable.
Examples: weights of 100 students, school achievement, I.Q., heights of children.
2. Discrete Data- these are characterized by gaps for which no real values may be obtained. They are made up of
items whose values have been obtained by counting.
Examples: school enrollment, number of pencils, etc.
Variable- it is a characteristic or phenomenon which may take on different values.
Examples: weight, I.Q., and sex.
Steps in a Statistical Investigation:
1. Collection
2. Presentation
3. Analysis
4. Interpretation
Methods of Collection of Data
1. Direct or interview method. This is a personal communication with the individual you want to interview.
2. Indirect or questionnaires method. This is done by sending questionnaires to the person from whom
you would like to get the information.
3. Registration. Utilizing existing records is registration.
4. Observation. This can be done directly or indirectly.
5. Experiment. This is done by making or conducting scientific inquiry.
Methods of Presentation of Data
1. Frequency Distribution
2. Graph
3. Figure
4. Chart
PRESENTATION OF DATA
I. Frequency Distribution
Definition of terms

Frequency Distribution- the tabulation of the scores or measures grouped into class intervals.
Raw Data- are collected data which have not been organized numerically.
Array- is an arrangement of raw numerical data in ascending or descending order.
Class frequency- this refers to the number of observations belonging to a class interval, or the
number of items within a category.
Class interval- this is the grouping or category defined by a lower limit and an upper limit.
Class limits- are the limits of the class interval (lower and upper class limits).
Class boundaries or true limits- these are more precise expressions of the class limits, obtained by
extending each limit by 0.5 of the unit of measurement. Each boundary is situated halfway between the
upper class limit of one interval and the lower class limit of the next interval.
Class mark or midpoint- this is the midpoint of the class interval and is obtained by adding the
lower and upper limits and dividing by 2.
Class size or width- this is the width of class interval and is the difference between the lower and
upper class boundaries.

Example: The frequency distribution for the final grades in Eng’g Statistics.
Class No.   Class Interval   Class Frequency   Class Boundaries   Class Mark
1           70-74            4                 69.5-74.5          72
2           75-79            8                 74.5-79.5          77
3           80-84            12                79.5-84.5          82
4           85-89            10                84.5-89.5          87
5           90-94            6                 89.5-94.5          92
N = 40 and class width = 5

Relative Frequency Distributions


The relative frequency of a class is the frequency of the class divided by the total frequency of all
classes and is generally expressed as a percentage. The sum of the relative frequencies of all classes is clearly
1 or 100%.
Example: The relative frequency of the class 75 – 79 is 8/40 = 0.2 or 20%.

Cumulative Frequency Distributions (Ogives)


The total frequency of all values less than the upper class boundary of a given class interval is called
the cumulative frequency up to and including that class interval.
Example: The cumulative frequency up to and including the class interval 75-79 is 4 + 8 = 12.
Expressed as a relative cumulative frequency, this is 12/40 = 0.3 or 30%.
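
The columns of the example table can be reproduced with a short script. The following is only a sketch in Python (the handout itself uses no programming language); the variable names are illustrative, and the 0.5 adjustment assumes the scores are recorded as whole numbers.

```python
# Final-grades distribution from the example: (lower limit, upper limit, frequency).
classes = [(70, 74, 4), (75, 79, 8), (80, 84, 12), (85, 89, 10), (90, 94, 6)]

total = sum(freq for _, _, freq in classes)  # total frequency N

rows = []
running = 0  # cumulative frequency accumulated so far
for lower, upper, freq in classes:
    running += freq
    rows.append({
        "interval": f"{lower}-{upper}",
        # Assumption: whole-number scores, so boundaries sit 0.5 beyond the limits.
        "boundaries": (lower - 0.5, upper + 0.5),
        "mark": (lower + upper) / 2,          # class mark (midpoint)
        "relative": freq / total,             # relative frequency
        "cumulative": running,                # cumulative frequency
        "cumulative_relative": running / total,
    })

for row in rows:
    print(row)
```

For the class 75-79 this gives a relative frequency of 0.2 and a cumulative frequency of 12, in agreement with the two examples above.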

II. Graphical Representation of Statistical Data

General Principles:
1. Graphing is done by using the Cartesian coordinate plane.
2. The vertical line is the y-axis or ordinate, the horizontal line is the x-axis or abscissa.
3. These two lines intersect at a common point called point of origin or point of intersection.
4. Distances measured along the x-axis to the right of 0 are positive; to the left of 0, the distances measured
are negative. The distances measured above 0 are positive and the distances measured below 0 are
negative.

A. Histogram
A graph composed of a series of rectangles, constructed with the class intervals as the
bases and the frequencies as the heights.
Steps in Preparing a Histogram
1. Prepare the x and y-axis
2. Mark x and y scale, x representing the scores and y, the frequencies.
3. The bases of the bars are plotted on the x-axis where the width of the base corresponds to the real limits
or class boundaries of the class interval. The center of the base falls on the midpoint of the class interval.
[Figure: histogram of the final grades. x-axis: score, marked at the class boundaries 69.5, 74.5, 79.5, 84.5, 89.5, 94.5; y-axis: frequency, from 0 to 12.]

B. Frequency Polygon
This is a line graph of class frequencies plotted against class marks. It is made by connecting the
midpoints of the rectangular tops in a histogram.

Steps in making a Frequency Polygon


1. Label the points on the base line.
2. Plot the midpoints; the scores within each interval are assumed to be concentrated at the midpoint.
3. When all midpoints are located, join them by a series of short lines. An additional midpoint with zero
frequency is needed at each end to close the polygon.

[Figure: frequency polygon of the final grades. x-axis: score (midpoints 67, 72, 77, 82, 87, 92, 97); y-axis: class frequency, from 0 to 14.]

C. Cumulative Frequency Graph (Ogives)


The cumulative frequency graph is another way of representing a frequency distribution by means of
a diagram. In constructing this graph, the upper real limits of the class intervals are used, and the frequencies
must be added successively from the lowest class interval to the highest class interval.

[Figure: cumulative frequency graph (ogive) of the final grades. x-axis: score (class boundaries 69.5, 74.5, 79.5, 84.5, 89.5, 94.5); y-axis: cumulative class frequency, from 0 to 45.]

D. Bar Chart, Bar Graph or Bar Diagram


The class intervals are plotted on the x-axis, the absolute frequency on the y-axis. Each interval is
represented by a rectangle whose base corresponds to a class interval and whose height is equal to the frequency
associated with the class interval.

[Figure: bar chart of the final grades. x-axis: score, marked at the class limits 70-74, 75-79, 80-84, 85-89, 90-94; y-axis: frequency, from 0 to 12.]

E. Pie Chart. A pie chart displays the absolute or relative frequencies of the intervals as sectors of a circle.
Each sector in a pie chart corresponds to a class interval; the ratio of the area of the sector to the area of the
circle is equal to the relative frequency of the class.

POPULATION PARAMETER
The population parameter or briefly parameter is a numerical quantity that describes some
characteristic of a population. Parameters are normally represented by Greek letters. The most common
parameters are the population mean (μ) and variance (σ²).

SAMPLE STATISTIC
The sample statistic or briefly statistic is a quantitative value that is calculated from the observations
in a sample. They are usually represented by lowercase English letters with other symbols. The sample mean
(x̄) and sample variance (s²) are two of the common statistics derived from samples.

MEASURES OF LOCATION

I. PERCENTILE
Percentiles are values that divide the data into one hundred equal parts.

Finding the percentile from the ungrouped data.


Formula:
$$P_p = \left(\frac{p \cdot N}{100}\right)\text{th order}$$

Example: Given data: 2, 5, 6, 10, 4, 5, 7, 9, 6, 12


Find:
a. P25 = (25x10)/100 = 2.5 or 3rd order → 5
Ascending order: 2, 4, 5, 5, 6, 6, 7, 9, 10, 12
b. P50 = (50x10)/100 = 5 → (5th and 6th)/2 → (6 + 6)/2 = 6
Ascending order: 2, 4, 5, 5, 6, 6, 7, 9, 10, 12
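
The two answers above follow an ordering rule that the notes apply without stating it: when pN/100 is fractional, round up to the next order; when it is a whole number, average the value at that order with the next one. A minimal Python sketch of this rule is given below. The function name `value_at_order` and its `parts` argument are illustrative only; `parts` is 100 for percentiles, and the same rule serves for deciles (10) and quartiles (4).

```python
def value_at_order(data, k, parts=100):
    """Return the k-th of `parts` dividing values using the handout's ordering rule."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0 < k < parts:
        raise ValueError("k must lie strictly between 0 and parts")
    values = sorted(data)
    # Integer arithmetic avoids rounding trouble when testing for a whole-number order.
    whole, remainder = divmod(k * len(values), parts)
    if remainder:
        # Fractional order such as 2.5: round up to the next order (the 3rd value).
        return values[whole]
    # Whole-number order such as 5: average the 5th and 6th values.
    return (values[whole - 1] + values[whole]) / 2


data = [2, 5, 6, 10, 4, 5, 7, 9, 6, 12]
print(value_at_order(data, 25))   # 5
print(value_at_order(data, 50))   # 6.0
```

Other textbooks and software interpolate between neighbouring values instead, so results from a statistics package may differ slightly from this rule.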

Finding the percentile from the grouped data.


Formula:
$$P_p = L + \left(\frac{\frac{p \cdot N}{100} - F_p}{f_p}\right) C$$

Where:
L = lower class boundary of the class interval upon which pN/100 lies.
Fp = cumulative frequency of all the intervals below L.
fp = number of scores within the interval upon which pN/100 falls.
C = class interval size or class width.
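Example: Using the frequency distribution of final grades above (N = 40, C = 5),
Find:
a. P25: pN/100 = 10, which lies in the class 75-79, so P25 = 74.5 + ((10 - 4)/8) x 5 = 78.25

b. P50: pN/100 = 20, which lies in the class 80-84, so P50 = 79.5 + ((20 - 12)/12) x 5 = 82.83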
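
The grouped-data formula can be checked with a few lines of Python. This is a sketch under the assumption that the classes are listed in ascending order with equal width; `grouped_percentile` is a hypothetical helper name, not something defined in these notes.

```python
def grouped_percentile(classes, p, width):
    """classes: list of (lower class boundary, frequency) pairs in ascending order."""
    total = sum(freq for _, freq in classes)
    target = p * total / 100          # pN/100
    below = 0                         # Fp: cumulative frequency below the current class
    for lower_boundary, freq in classes:
        if freq > 0 and below + freq >= target:
            # pN/100 falls in this class: apply L + ((pN/100 - Fp) / fp) * C.
            return lower_boundary + (target - below) / freq * width
        below += freq
    raise ValueError("p must be between 0 and 100")


# Final-grades distribution: lower class boundaries and class frequencies.
grades = [(69.5, 4), (74.5, 8), (79.5, 12), (84.5, 10), (89.5, 6)]
print(grouped_percentile(grades, 25, width=5))   # 78.25
print(round(grouped_percentile(grades, 50, width=5), 2))
```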

II. DECILE
Deciles are values that divide the data into ten equal parts.

Finding the decile from the ungrouped data.


Formula:
$$D_d = \left(\frac{d \cdot N}{10}\right)\text{th order}$$
Example:
Find:
a. D2 = (2x10)/10 = 2 → (2nd and 3rd)/2 → (4 + 5)/2 = 4.5
Ascending order: 2, 4, 5, 5, 6, 6, 7, 9, 10, 12
b. D5 = (5x10)/10 = 5 → (5th and 6th)/2 → (6 + 6)/2 = 6
Ascending order: 2, 4, 5, 5, 6, 6, 7, 9, 10, 12

Finding the decile from the grouped data.


Formula:
$$D_d = L + \left(\frac{\frac{d \cdot N}{10} - F_d}{f_d}\right) C$$

Where:
L = lower class boundary of the class interval upon which dN/10 lies.
Fd = cumulative frequency of all the intervals below L.
fd = number of scores within the interval upon which dN/10 falls.
C = class interval size or class width.
Example:
Find:
a. D2

b. D5

III. QUARTILE
Quartiles are values that divide the data into four equal parts.

Finding the quartile from the ungrouped data.


Formula:
$$Q_q = \left(\frac{q \cdot N}{4}\right)\text{th order}$$
Example:
Find:
a. Q2 = (2x10)/4 = 5 → (5th and 6th)/2 → (6 + 6)/2 = 6
Ascending order: 2, 4, 5, 5, 6, 6, 7, 9, 10, 12

b. Q3 = (3x10)/4 = 7.5 or 8th order → 9


Ascending order: 2, 4, 5, 5, 6, 6, 7, 9, 10, 12

Finding the quartile from the grouped data.
Formula:
$$Q_q = L + \left(\frac{\frac{q \cdot N}{4} - F_q}{f_q}\right) C$$
Where:
L = lower class boundary of the class interval upon which qN/4 lies.
Fq = cumulative frequency of all the intervals below L.
fq = number of scores within the interval upon which qN/4 falls.
C = class interval size or class width.
Example:
Find:
a. Q2
b. Q3

MEASURES OF CENTRAL TENDENCY


The measure of central tendency is the point about which the scores tend to cluster, a sort of average
in the series. It is the center of concentration of scores in any set of data. It is a single number which represents
the general level of performance of a group.
The Three Measures of Central Tendency
1. Mean
2. Median
3. Mode
MEAN
The mean or arithmetic mean, or arithmetic average is defined as the sum of the values in the data
group divided by the number of values.
When to use the mean:
1. When the scores are distributed symmetrically around a central point.
2. When the measure of central tendency having the greatest stability is wanted.
3. When other statistics like standard deviation, coefficient of correlation, etc. are to be computed later, since
these statistics are based upon the mean.
Finding the mean from Ungrouped Data
Formula:
Sample Mean, x̄
$$\bar{x} = \frac{\sum x}{n}$$
Where:
x = sample score or measure
n = number of scores or measures
Population Mean, μ
$$\mu = \frac{\sum x}{N}$$
Where:
x = population score or measure
N = number of scores or measures
Example:
Given data: 3 4 5 6 8 9 10 12 3 6, find the mean.

Finding the mean from Grouped Data
Method I
Formula:
$$\bar{x} = \frac{\sum f \cdot CM}{N}$$
Where:
f = class frequency
CM = class mark
N = Sum of the frequencies

Method II by the “Assumed Mean” or short method.


Formula:

$$\bar{x} = AM + \left[\frac{\sum f d}{N}\right] C$$

Where:
AM = assumed mean
C = class size
d = deviation of each class mark from the assumed mean, in units of the class size: d = (CM - AM)/C
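
Both methods can be compared on the final-grades distribution from the frequency distribution example. The sketch below is in Python and assumes that d is measured in class-size units, d = (CM - AM)/C, which is what the factor C in the short formula implies; the choice AM = 82 is arbitrary.

```python
marks = [72, 77, 82, 87, 92]   # class marks (CM)
freqs = [4, 8, 12, 10, 6]      # class frequencies (f)
width = 5                      # class size (C)
n = sum(freqs)                 # N = 40

# Method I: sum of f*CM divided by N.
mean_direct = sum(f * cm for f, cm in zip(freqs, marks)) / n

# Method II: assumed mean, with d = (CM - AM)/C (an assumption about d).
assumed_mean = 82
deviations = [(cm - assumed_mean) / width for cm in marks]   # -2, -1, 0, 1, 2
sum_fd = sum(f * d for f, d in zip(freqs, deviations))       # 6.0
mean_short = assumed_mean + width * sum_fd / n

print(mean_direct, mean_short)   # 82.75 82.75
```

Any class mark may serve as the assumed mean; choosing one near the center keeps the values of d small.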

MEDIAN
The median is that point on the scale of scores below which ½ of the scores lie and above which the other
half of the scores lie.
When to use the Median:
1. When the exact midpoint of the distribution is wanted, the 50% point.
2. When there are extreme scores which would markedly affect the mean. Extreme scores do not disturb the
median.
3. When it is desired that certain scores should influence the central tendency, but all that is known about
them is that they are above or below the median.

Finding the Median from Ungrouped Data.

1. When N is odd, the median is the middle score.


Example: 30 20 15 12 10 9 8 5 3

There are 9 scores and the median is 10

2. When N is even, the median is the average of the two middle scores.
Example: 30 20 15 12 10 9 8 5 3 1

There are 10 scores and the median is (10+9)/2 = 9.5

3. When several scores have the same value as the midscore.


Example: 15 15 14 13 12 12 8 8 5 2 1

Median is 12
Finding the median from Grouped Data
Formula:
$$Md = L + \left(\frac{\frac{N}{2} - F_2}{f_2}\right) C$$

Where:
L = Lower class boundary of the interval where the median lies.
N = Number of scores
F2 = Cumulative frequency up to the class immediately preceding the median class
f2 = frequency of the median class
C = class size
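
As an illustration, the median class can be located from the cumulative frequencies and the formula applied directly. The Python sketch below uses the final-grades distribution from the frequency distribution example; the variable names are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate

lower_boundaries = [69.5, 74.5, 79.5, 84.5, 89.5]
freqs = [4, 8, 12, 10, 6]
width = 5

cumulative = list(accumulate(freqs))   # [4, 12, 24, 34, 40]
half = cumulative[-1] / 2              # N/2 = 20.0

# The median class is the first class whose cumulative frequency reaches N/2.
k = bisect_left(cumulative, half)
F2 = cumulative[k - 1] if k > 0 else 0   # cumulative frequency before the median class
f2 = freqs[k]                            # frequency of the median class

median = lower_boundaries[k] + (half - F2) / f2 * width
print(round(median, 2))
```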

MODE
Mode is that single measure or score which occurs most frequently. When data are grouped into
frequency distribution, the crude mode is usually taken to be the midpoint of that interval which contains the
largest frequency.
When to use the Mode.
1. When a quick and approximate measure of the central tendency is all that is wanted.
2. When the measure of central tendency should be the most typical value.
Finding the Mode from Ungrouped Data
Example:
1. A set of numbers 1 2 4 5 6 7 8 8 8 9 10, the mode is 8.
2. A set of numbers 1 2 4 5 6 7 8 9 10, no mode.
3. A set of numbers 1 2 6 6 6 7 8 8 8 9 10, the modes are 6 and 8 (bimodal).

Finding the Mode from Grouped Data.


Formula:

$$Mo = L + \left[\frac{d_1}{d_1 + d_2}\right] C$$

Where:
L = exact lower class boundary of the modal class.
d1 = difference between the frequency of the modal class and the
frequency of the class below it.
d2 = difference between the frequency of the modal class and the
frequency of the class above it.
C = class width.
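
A short Python sketch of the grouped-mode formula follows, again on the final-grades distribution from the frequency distribution example. It reads "the class below" as the class just before the modal class and "the class above" as the class just after it, and it treats a missing neighbour as having frequency 0; both readings are assumptions.

```python
lower_boundaries = [69.5, 74.5, 79.5, 84.5, 89.5]
freqs = [4, 8, 12, 10, 6]
width = 5

# Modal class: the class with the largest frequency (first one if there is a tie).
k = max(range(len(freqs)), key=freqs.__getitem__)

before = freqs[k - 1] if k > 0 else 0                 # frequency of the class below
after = freqs[k + 1] if k < len(freqs) - 1 else 0     # frequency of the class above
d1 = freqs[k] - before                                # 12 - 8 = 4
d2 = freqs[k] - after                                 # 12 - 10 = 2

mode = lower_boundaries[k] + d1 / (d1 + d2) * width
print(round(mode, 2))
```

For comparison, the crude mode of the same distribution is the midpoint of the modal class, 82.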

MEASURES OF DISPERSION/VARIATION
A measure of dispersion is a method of measuring the degree by which numerical data or values tend
to spread from or cluster about the central point or average.
The most common measures of dispersion are the following:
1. Range
2. Quartile Deviation
3. Average Deviation
4. Standard Deviation
5. Variance

RANGE
The Range for Ungrouped Data:

The range of a set of ungrouped data is the difference between the highest and lowest values.

𝑹 = 𝑯𝑺 − 𝑳𝑺

Example: Find the range of the following set of numbers.


11 14 23 18 16 20 and 19

R = 23 – 11 = 12
The Range for Grouped Data
The range for grouped data is generally defined as the difference between the upper boundary of the
highest class interval and the lower boundary of the lowest class interval.

𝑹 = 𝑼𝒃𝒉 − 𝑳𝒃𝒍

QUARTILE DEVIATION
It is also called the semi-interquartile range. It is defined as half the spread of the middle
50% of the values. Hence, the equation is given by

$$Q.D. = \frac{Q_3 - Q_1}{2}$$

Where:
Q.D. = the quartile deviation
Q1 = the first quartile
Q3 = the third quartile

AVERAGE DEVIATION, A.D. (MEAN ABSOLUTE DEVIATION, M.A.D.)


The average deviation measures the extent by which each individual value in a distribution deviates
from the mean of that distribution.

From the Ungrouped Data:

$$A.D. = \frac{\sum |x - \bar{x}|}{n} \quad \text{(sample)}$$

Where:
x= the individual value
𝑥̅ = sample mean
n = total number of observations

$$A.D. = \frac{\sum |x - \mu|}{N} \quad \text{(population)}$$

Where:
μ = population mean
N = total number of observations in the population

From the Grouped Data

$$A.D. = \frac{\sum f \, |CM - \bar{x}|}{n} \quad \text{(sample)}$$

$$A.D. = \frac{\sum f \, |CM - \mu|}{N} \quad \text{(population)}$$
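
The ungrouped and grouped formulas differ only in the frequency weights, so one function can cover both. The Python sketch below is illustrative; `average_deviation` is a hypothetical helper, and the two data sets are the ungrouped example used for the percentiles and the final-grades distribution.

```python
def average_deviation(values, freqs=None):
    """Mean absolute deviation; pass class marks and frequencies for grouped data."""
    if freqs is None:
        freqs = [1] * len(values)      # ungrouped data: every value counts once
    n = sum(freqs)
    mean = sum(f * v for f, v in zip(freqs, values)) / n
    return sum(f * abs(v - mean) for f, v in zip(freqs, values)) / n


ungrouped = [2, 5, 6, 10, 4, 5, 7, 9, 6, 12]
print(round(average_deviation(ungrouped), 2))                              # 2.32

class_marks = [72, 77, 82, 87, 92]
class_freqs = [4, 8, 12, 10, 6]
print(round(average_deviation(class_marks, class_freqs), 2))               # 4.9
```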

STANDARD DEVIATION
The standard deviation of a set of n numbers is denoted by s and is defined by

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad \text{(sample)}$$

Thus s is the root mean square of the deviations from the mean or, as it is sometimes called, the root
mean square deviation.
For a given population, the process for finding the standard deviation is given by the formula.

$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \quad \text{(population)}$$

FOR GROUPED DATA


The standard deviation of a frequency distribution can be computed by the following formula

$$s = \sqrt{\frac{\sum f (CM - \bar{x})^2}{n}}$$
Where:
CM = class mark
𝑥̅ = sample mean
f = class frequency
n = total frequency
For easier computation, the above formula can be converted to the form

$$s = \sqrt{\frac{\sum f \cdot CM^2}{n} - \left[\frac{\sum f \cdot CM}{n}\right]^2}$$

To compute for the standard deviation of a population in a frequency distribution, use any of the
following:

$$\sigma = \sqrt{\frac{\sum f (CM - \mu)^2}{N}}$$

$$\sigma = \sqrt{\frac{\sum f \cdot CM^2}{N} - \left[\frac{\sum f \cdot CM}{N}\right]^2}$$
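
That the definition and the computational form agree can be confirmed numerically. The Python sketch below uses the final-grades distribution from the frequency distribution example and divides by the total frequency, as in the formulas of this section; the variable names are illustrative.

```python
import math

marks = [72, 77, 82, 87, 92]   # class marks (CM)
freqs = [4, 8, 12, 10, 6]      # class frequencies (f)
n = sum(freqs)                 # total frequency

mean = sum(f * cm for f, cm in zip(freqs, marks)) / n

# Definition: root of the mean squared deviation of the class marks from the mean.
s_definition = math.sqrt(sum(f * (cm - mean) ** 2 for f, cm in zip(freqs, marks)) / n)

# Computational form: mean of the squares minus the square of the mean.
mean_of_squares = sum(f * cm ** 2 for f, cm in zip(freqs, marks)) / n
s_shortcut = math.sqrt(mean_of_squares - mean ** 2)

print(round(s_definition, 4), round(s_shortcut, 4))
```

The computational form needs only the running totals of f·CM and f·CM², which is why it is convenient for hand and calculator work.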

THE VARIANCE
The variance of a set of data is defined as the square of the standard deviation. Thus s² and σ² would
represent the sample variance and population variance respectively.
SKEWNESS AND KURTOSIS
Skewness is the degree of asymmetry, or departure from symmetry, of a distribution.
If the frequency curve of a distribution has a longer “tail” to the right of the central maximum than
to the left, the distribution is said to be skewed to the right or to have positive skewness.
+SK: mean > median (long right tail)

Note: a skewed set of observations is one that is not symmetrically distributed.

If the reverse is true it is said to be skewed to the left or to have negative skewness.
-SK: mean < median (long left tail)

If neither tail is longer than the other, so that the curve is centered on its maximum, the distribution is
symmetrical and has zero skewness. In a symmetrical distribution with a single peak, the mean, median and
mode are all equal:
$$\bar{x} = Md = Mo$$

For skewed distributions the mean tends to lie on the same side of the mode as the longer tail. Thus a
measure of the asymmetry is supplied by the difference ( Mean-Mode). This can be made dimensionless on
division by a measure of dispersion, such as standard deviation, leading to the definition

$$\text{Skewness} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}}$$

$$SK = \frac{\bar{X} - Mo}{S}$$

To avoid use of the mode, we can employ an empirical formula and define

$$\text{Skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$

$$SK = \frac{3(\bar{X} - Md)}{S}$$
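
The median-based formula is easy to evaluate with Python's standard `statistics` module. The sketch below uses the ungrouped data set from the percentile example and `statistics.pstdev`, which divides by n as in the standard deviation formula of these notes; the choice of data set is only for illustration.

```python
import statistics

data = [2, 5, 6, 10, 4, 5, 7, 9, 6, 12]

mean = statistics.mean(data)        # 6.6
median = statistics.median(data)    # 6.0
s = statistics.pstdev(data)         # standard deviation with divisor n

# Empirical formula: SK = 3 (mean - median) / s
sk = 3 * (mean - median) / s
print(round(sk, 4))
```

The result is positive, which is consistent with the mean lying above the median in a distribution that is skewed to the right.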
KURTOSIS
Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution.
A measure of kurtosis based on both quartiles and percentiles is given by

$$K = \frac{Q.D.}{P_{90} - P_{10}}$$

A. Leptokurtic
– a distribution having a relatively high peak.

B. Platykurtic
- a distribution which is flat-topped

C. Mesokurtic
- a distribution which is not very high peaked or very flat-topped.
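
A rough numerical illustration of the measure K = Q.D./(P90 - P10) is sketched below in Python. It reuses the ungrouped data set from the percentile example and the ordering rule shown in the ungrouped examples (round a fractional order up, average at a whole-number order); the helper name `value_at_order` is hypothetical.

```python
def value_at_order(data, k, parts):
    """Value at order k*N/parts: round up if fractional, else average with the next."""
    values = sorted(data)
    whole, remainder = divmod(k * len(values), parts)
    if remainder:
        return values[whole]
    return (values[whole - 1] + values[whole]) / 2


data = [2, 5, 6, 10, 4, 5, 7, 9, 6, 12]

q1 = value_at_order(data, 1, 4)       # first quartile
q3 = value_at_order(data, 3, 4)       # third quartile
p10 = value_at_order(data, 10, 100)   # 10th percentile
p90 = value_at_order(data, 90, 100)   # 90th percentile

quartile_deviation = (q3 - q1) / 2
k_measure = quartile_deviation / (p90 - p10)
print(quartile_deviation, k_measure)   # 2.0 0.25
```

For a normal distribution this measure is approximately 0.263, which gives a reference value for judging whether a distribution is more or less peaked; with only ten observations the comparison is indicative at best.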

SOLUTIONS:
Problem 1- 5
For the given data:
30 40 50 55 45 80 40 63 70 25
In Ascending Order:
25 30 40 40 45 50 55 63 70 80

Calculator technique:
To set the mode to Stat and Single-variable:
MODE→ 3:STAT → 1:1-VAR
To Input the data:
SHIFT →1(STAT) → 2:DATA
then INPUT the values (1ST Column)

1. Find the Mean.


A. 48.9
B. 49.8
C. 475.5
D. 16.563
To compute the Mean:
SHIFT →1(STAT) →4:VAR → 2:𝑥̅
Mean = Average = 49.8

2. Find the Mode


A. 40
B. 47.5
C. 48.9
D. 16.563
In Ascending Order:
25 30 40 40 45 50 55 63 70 80
Mode = 40 (the only value that occurs more than once)

3. Find the Median


A. 40
B. 47.5
C. 48.9
D. 45
In Ascending Order:
25 30 40 40 45 50 55 63 70 80
Median = (45+50)/2 = 47.5

4. Find the Standard deviation (population)


A. 40
B. 475.5
C. 274.36
D. 16.563
To compute the Standard deviation:
SHIFT →1(STAT) →4:VAR → 3:𝜎𝑥
Standard deviation = σ = 16.5638, which corresponds to choice D

5. Find the Range


A. 5
B. 55
C. 10
D. 16.563
Range = 80 – 25 = 55
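
The answers to Problems 1 to 5 can be cross-checked without a calculator. The following Python sketch uses the standard `statistics` module; `pstdev` is the population standard deviation (divisor N), matching the calculator's 𝜎𝑥.

```python
import statistics

data = [30, 40, 50, 55, 45, 80, 40, 63, 70, 25]

mean = statistics.mean(data)            # Problem 1
mode = statistics.mode(data)            # Problem 2
median = statistics.median(data)        # Problem 3
sigma = statistics.pstdev(data)         # Problem 4, population standard deviation
data_range = max(data) - min(data)      # Problem 5

print(mean, mode, median, round(sigma, 4), data_range)
```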

Problem 6 – 8
For the given Frequency Distribution
Interval    Freq.   Class mark
10 – 19     6       14.5
20 – 29     7       24.5
30 – 39     8       34.5
40 – 49     10      44.5
50 – 59     9       54.5

To set the mode to Stat and Single-variable:


MODE→ 3:STAT → 1:1-VAR
To turn on the frequency column:
SHIFT → MODE → ARROW DOWN →4:STAT → 1:ON
To Input the data:
SHIFT →1(STAT) → 2:DATA
then INPUT the values (1st column (x)→ class mark and 2nd column (freq)→ frequency)
6. Find the Mean
A. 34.7
B. 8
C. 13.865
D. 36.75
To compute the Mean:
SHIFT →1(STAT) →4:VAR → 2:𝑥̅

Mean = Average = 36.75


7. Find the standard deviation
A. 13.865
B. 7.35
C. 8
D. 40
To compute the Standard deviation:
SHIFT →1(STAT) →4:VAR → 4:s𝑥

s = 13.865 (the calculator's s𝑥 divides by n - 1; dividing by n instead, as 𝜎𝑥 does, gives 13.691)
8. Find the variance
A. 192.24
B. 54.0225
C. 64
D. 1600
Variance = s² = 192.24
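
The answers to Problems 6 to 8 can be reproduced from the class marks and frequencies. The Python sketch below computes the standard deviation with both divisors, since the calculator key s𝑥 uses n - 1 while 𝜎𝑥 uses n; the variable names are illustrative.

```python
import math

marks = [14.5, 24.5, 34.5, 44.5, 54.5]   # class marks
freqs = [6, 7, 8, 10, 9]                 # class frequencies
n = sum(freqs)                           # 40

mean = sum(f * cm for f, cm in zip(freqs, marks)) / n               # Problem 6
sum_sq = sum(f * (cm - mean) ** 2 for f, cm in zip(freqs, marks))   # 7497.5

s_sample = math.sqrt(sum_sq / (n - 1))   # Problem 7, divisor n - 1
variance = sum_sq / (n - 1)              # Problem 8
sigma = math.sqrt(sum_sq / n)            # divisor n, for comparison

print(mean, round(s_sample, 3), round(variance, 2), round(sigma, 3))
```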
