Basic Statistical Analysis

Descriptive Statistics
August 13-17, 2018, Angeles City, Pampanga

Resource Persons
Michael Dominic C. del Mundo
Rocky T. Marcelino

PHILIPPINE STATISTICAL RESEARCH AND TRAINING INSTITUTE


Course Outline

• Introduction
• Methods of Collecting Data
• Methods of Presenting Data
• Summary Measures
• Software in Constructing Charts and Generating Descriptive Statistics


Basic Statistical Analysis
Descriptive Statistics
Summary Measures


Chapter Outline

Chapter Outline

Summary Measures
o Measures of Central Tendency
o Measures of Dispersion
o Measures of Location
o Measures of Skewness
o Measures of Kurtosis
o Rates, Ratios, Proportions,
Percentage, Percent Change

Learning Objectives

Learning Objectives

• To describe data using summary measures;
• To use MS Excel® in computing the different summary measures; and
• To correctly interpret MS Excel® summary measures output.

Illustration

Describing the Data


You want to determine the change in the
physical characteristics of janitor fish in
Laguna de Bay. You measured the length of
seven fish samples that were collected in 2016
and 2017. The data obtained are as follows:
Year Length (mm)
2016 82 82 83 83 83 84 91
2017 129 83 82 83 50 46 115

Summary Measures

Summary Measures

A summary measure is a single numeric figure that describes a particular feature of the collection of observations.
If the summary measure is computed using population data, then it is called a parameter. If the summary measure is computed using sample data, then it is called a statistic.

Summary Measures

o Measures of Central Tendency
o Measures of Location
o Measures of Variability
o Measures of Skewness
o Measures of Kurtosis
o Proportions, Rates, Ratios, Percent Change

Measures of Central Tendency

Measures of Central Tendency

A measure of central tendency is a summary measure that can be used to represent all the other values in the collection.
Notes:
• Some people refer to this measure as
the “average”.
• This measure tells us where the “center”
of the distribution lies.

Measures of Central Tendency

• Arithmetic Mean
• Median
• Mode

Note:
The use of this measure will facilitate the
comparison of two or more data sets.

Measures of Central Tendency: Arithmetic Mean

The arithmetic mean is the sum of all observed values divided by the total number of observations.
Let the population data = $\{x_1, x_2, \ldots, x_N\}$, with $N$ population units.
Let the sample data = $\{x_1, x_2, \ldots, x_n\}$, with $n$ sample units.

Measures of Central Tendency: Arithmetic Mean

Population Mean (a parameter):
$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum_{i=1}^{N} x_i}{N}$$
Sample Mean (a statistic):
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

Measures of Central Tendency: Arithmetic Mean

Examples:
1. Five foresters reported the number of illegal
loggers they have apprehended as follows:
1, 2, 5, 5, 7
$$\mu = \frac{1 + 2 + 5 + 5 + 7}{5} = \frac{20}{5} = 4$$
The mean number of illegal loggers
apprehended is 4.

Measures of Central Tendency: Arithmetic Mean
Examples:
2. The changes in fixed assets (in percent) of six government agencies are as follows:
1.2, -1.5, 3.4, 2.1, -2.7, 4.1
$$\mu = \frac{1.2 + (-1.5) + 3.4 + 2.1 + (-2.7) + 4.1}{6} = \frac{6.6}{6} = 1.1$$
The mean percent change in fixed assets of the government agencies is 1.1%.
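Both examples can be checked with a few lines of code. The sketch below uses Python purely for illustration (the course itself works in MS Excel®, where AVERAGE does the same job); `arithmetic_mean` is a hypothetical helper name, not part of any library.

```python
from statistics import mean

def arithmetic_mean(values):
    """Sum of all observed values divided by the number of observations."""
    return sum(values) / len(values)

# Example 1: illegal loggers apprehended by each of the five foresters
loggers = [1, 2, 5, 5, 7]
# Example 2: percent change in fixed assets of six government agencies
asset_change = [1.2, -1.5, 3.4, 2.1, -2.7, 4.1]

print(arithmetic_mean(loggers))                 # 4.0
print(round(arithmetic_mean(asset_change), 2))  # 1.1
# The standard library's statistics.mean gives the same values
print(mean(loggers), round(mean(asset_change), 2))
```

Negative changes enter the sum with their signs, which is why the mean (1.1%) is smaller than most of the positive changes.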

Measures of Central Tendency: Arithmetic Mean

The Mean as a “Center” of Mass

o What would happen if the first measurement had been 7 instead of 8?
o What would happen this time if the last measurement had been 1000 instead of 10?

Measures of Central Tendency: Arithmetic Mean
Effect of an Outlier on the Mean
Outliers are observations that are markedly
different from the rest of the data items.
• Because the mean acts as a “center of mass”, its value is gravely affected by outliers.
• An outlier pulls the value of the mean in its direction and away from the location of the majority of the observations.

Measures of Central Tendency: Arithmetic Mean

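(Figure: the data 1, 2, 3, 4, 5 plotted on a scale from 0 to 10, and the data 1, 2, 3, 4, 16 plotted on a scale from 0 to 17.)
$$\bar{x} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3 \qquad\qquad \bar{x} = \frac{1 + 2 + 3 + 4 + 16}{5} = \frac{26}{5} = 5.2$$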

The mean is affected by extremes.


Measures of Central Tendency: Arithmetic Mean

Effect of an Outlier on the Mean


• With the presence of outliers, the mean might
not be a suitable measure of central
tendency because it may not be a good
representative of the observations in the
collection.

Measures of Central Tendency: Arithmetic Mean

Example: (Effect of an Outlier on the Mean)


Monthly income of five households in a certain community:
P8,000 P5,000 P12,000 P5,000 P200,000
Compute the mean.
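A quick way to see the pull of the P200,000 household is to compare the mean with the median (introduced later in this section). A minimal Python sketch, using only the standard `statistics` module:

```python
from statistics import mean, median

# Monthly income (in pesos) of the five households in the example
income = [8_000, 5_000, 12_000, 5_000, 200_000]

print(f"mean   = P{mean(income):,.0f}")    # mean   = P46,000
print(f"median = P{median(income):,.0f}")  # median = P8,000
```

Four of the five households earn P12,000 or less, yet the mean is P46,000: the single P200,000 household pulls it far from where most of the observations lie.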

Measures of Central Tendency: Arithmetic Mean

Characteristics of the Mean


• The mean is the “center of mass”.
• It uses all the observed values in the
calculation.
• It may or may not be an actual
observed value in the data set.
• Its value is gravely affected by outliers.

Measures of Central Tendency: Arithmetic Mean

Characteristics of the Mean


• The mean of a finite collection always
exists and is unique.
• Data values should be measured using
at least an interval scale for it to be
interpretable.

Measures of Central Tendency: Weighted Mean

The weighted mean assigns weights (or measures of relative importance) to the observations being averaged. If observation $x_i$ is assigned a weight $w_i$, where $i = 1, 2, 3, \ldots, n$, then the weighted sample mean is
$$\bar{x} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

Measures of Central Tendency: Weighted Mean
Examples:
A government agency grants scholarships to staff members pursuing graduate degrees. Courses in the graduate programs earn 1 to 5 units of credit. A staff member gets a partial scholarship for the next term if his or her weighted mean grade is from 1.5 to 1.75, and a full scholarship if the weighted mean is better than 1.5 (from 1.0 to 1.49).
Determine the kind of scholarship that each of the two staff members will get based on their grades.
Measures of Central Tendency: Weighted Mean
Examples:
Subject  Units  Staff A Grade  Staff B Grade
A        1      1.00           2.00
B        2      1.25           1.75
C        3      1.50           1.50
D        4      1.75           1.25
E        5      2.00           1.00

Measures of Central Tendency: Weighted Mean
Examples:
Weighted Average of Staff A
$$\bar{x} = \frac{1.00(1) + 1.25(2) + 1.50(3) + 1.75(4) + 2.00(5)}{1 + 2 + 3 + 4 + 5} = \frac{1 + 2.5 + 4.5 + 7 + 10}{15} = \frac{25}{15} \approx 1.67$$

Measures of Central Tendency: Weighted Mean
Examples:
Weighted Average of Staff B
$$\bar{x} = \frac{2.00(1) + 1.75(2) + 1.50(3) + 1.25(4) + 1.00(5)}{1 + 2 + 3 + 4 + 5} = \frac{2 + 3.5 + 4.5 + 5 + 5}{15} = \frac{20}{15} \approx 1.33$$
Staff A will get a partial scholarship while
Staff B will get a full scholarship.
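The computation for both staff members can be sketched in Python. `weighted_mean` and `scholarship` are hypothetical helper names, and the rule that a weighted mean worse than 1.75 earns no scholarship is an assumption, since the example only describes the partial and full cases.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def scholarship(gwa):
    """Hypothetical encoding of the agency's rule (lower grades are better)."""
    if 1.0 <= gwa < 1.5:
        return "full"
    if 1.5 <= gwa <= 1.75:
        return "partial"
    return "none"   # assumption: the example does not cover this case

units = [1, 2, 3, 4, 5]                  # subjects A to E, used as weights
grades_a = [1.00, 1.25, 1.50, 1.75, 2.00]
grades_b = [2.00, 1.75, 1.50, 1.25, 1.00]

for name, grades in [("Staff A", grades_a), ("Staff B", grades_b)]:
    gwa = weighted_mean(grades, units)
    print(name, round(gwa, 2), scholarship(gwa))
# Staff A 1.67 partial
# Staff B 1.33 full
```

Both staff members have the same five grades; only the pairing of grades with units differs, and that is exactly what the weights capture.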

Measures of Central Tendency: Median

The median divides the sorted observations into two equal parts.

Measures of Central Tendency: Median

How to Determine the Median:


• Sort the observations from lowest to highest.
Let {x(1), x(2),…, x(n)} be the sorted observations.
• Thus, x(1) = smallest observation and
x(n) = largest observation.

Measures of Central Tendency: Median

How to Determine the Median:


• Use the following formula to get the median:
Case 1: n is odd (the median is the value in the middle of the sorted observations)
$$Md = x_{\left(\frac{n+1}{2}\right)}$$
Case 2: n is even (the median is the average of the 2 middle observations)
$$Md = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2} + 1\right)}}{2}$$

Measures of Central Tendency: Median

Examples:
1. The following are the total receipts of 7
mining companies (in million pesos):
1.2, 4.5, 6.5, 7.2, 10.4, 12.5, 50.6
The median is 7.2.
At least fifty percent of the seven mining
companies have total receipts less than or
equal to 7.2 million pesos.

Measures of Central Tendency: Median

Examples:
2. The following are the number of operating
years of 8 mining companies:
8, 10, 10, 11, 16, 17, 17, 18
The median is (11+16) / 2 = 13.5
At least half of the eight mining companies
have been operating for at most 13.5
years.
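The two cases translate directly into code. In this Python sketch (`median_of` is a hypothetical helper; `statistics.median` and Excel's MEDIAN give the same results), each position is shifted by one because Python lists are indexed from 0 while the slides number the sorted observations from 1.

```python
def median_of(values):
    """Median of the observations, following the odd/even cases above."""
    x = sorted(values)            # x[0] is x(1), ..., x[n-1] is x(n)
    n = len(x)
    if n % 2 == 1:
        # Case 1 (n odd): Md = x((n + 1)/2); subtract 1 for 0-based indexing
        return x[(n + 1) // 2 - 1]
    # Case 2 (n even): average of x(n/2) and x(n/2 + 1)
    return (x[n // 2 - 1] + x[n // 2]) / 2

receipts = [1.2, 4.5, 6.5, 7.2, 10.4, 12.5, 50.6]   # n = 7, million pesos
years = [8, 10, 10, 11, 16, 17, 17, 18]             # n = 8, operating years
print(median_of(receipts))   # 7.2
print(median_of(years))      # 13.5
```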

Measures of Central Tendency: Median

(Figure: two dot plots, on scales from 0 to 10 and from 0 to 17.)
In both plots, $Md = x_{(3)} = 3$.

The median is not affected by extremes.

Measures of Central Tendency: Median

Characteristics of the Median


• The median is also a measure of location.
It indicates the relative position of an
observation in the distribution.
• If the observation is smaller than the median
then it belongs in the lower half of the
distribution; while if the observation is larger
than the median then it belongs in the upper
half of the distribution.

Measures of Central Tendency: Median

Characteristics of the Median


• The median is affected by the position of
each observation in the sorted data but
not by the value of the observation.
Consequently, outliers do not affect the
median.
• It is interpretable even if the level of
measurement is as low as ordinal.

Measures of Central Tendency: Mode

The mode is the most common observation in a data set.
It is determined by counting the frequency of each value and finding the value with the highest frequency of occurrence.

Measures of Central Tendency: Mode
Examples: Find the mode
1. The following are waistlines (in inches) of 12 males:
25, 26, 29, 30, 30, 29, 30, 30, 30, 31, 34, 36

2. Given the number of children of 20 male respondents:
2, 5, 5, 2, 2, 5, 1, 3, 5, 4, 2, 5, 5, 2, 2, 5, 5, 2, 2, 1

Measures of Central Tendency: Mode

3. The following are the ages (in years) of 12 participants in a descriptive statistics training course: 25, 26, 29, 30, 30, 29, 30, 30, 30, 31, 34, 36
4. Given the type of waste disposal of six
households: compost pits, garbage collector,
compost pits, garbage collector, garbage
collector, dump in open pits
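A sketch of the frequency count in Python, using `collections.Counter` from the standard library. `modes` is a hypothetical helper; returning an empty list when every value occurs equally often is an assumption that mirrors the "does not exist" convention used in these slides. Because it only counts occurrences, it also works for nominal data such as the waste-disposal example (in MS Excel®, MODE.MULT plays a similar role for numeric data).

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency.

    Assumption: if all distinct values occur equally often, no value stands
    out, so the mode is treated as not existing and [] is returned.
    """
    counts = Counter(values)                 # frequency of each value
    top = max(counts.values())               # highest frequency of occurrence
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return [v for v, c in counts.items() if c == top]

waistlines = [25, 26, 29, 30, 30, 29, 30, 30, 30, 31, 34, 36]
children = [2, 5, 5, 2, 2, 5, 1, 3, 5, 4, 2, 5, 5, 2, 2, 5, 5, 2, 2, 1]
disposal = ["compost pits", "garbage collector", "compost pits",
            "garbage collector", "garbage collector", "dump in open pits"]

print(modes(waistlines))    # [30]
print(modes(children))      # [2, 5]  (2 and 5 each occur 8 times)
print(modes(disposal))      # ['garbage collector']
print(modes([1, 2, 3, 4]))  # []  (every value occurs once)
```

Example 3 uses the same numbers as Example 1, so its mode is also 30; Example 2 shows that the mode need not be unique.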

Measures of Central Tendency: Mode

(Figure: two dot plots, each on a scale from 0 to 10.)
Mo = Does Not Exist (in both plots)

The mode may or may not exist.

Measures of Central Tendency: Mode

Characteristics of the Mode


• It does not always exist; and if it does, it may
not be unique.
• It is not recommended if there are only a few
observations.
• It is not affected by outliers.
• The mode can be used even if the level of
measurement is as low as nominal.

Measures of Location

Measures of Location

A measure of location indicates the relative position of an observation in the distribution.
• Percentiles
• Quartiles
• Deciles

Measures of Location: Percentile

Percentiles divide the sorted observations into 100 equal parts.
There are 99 percentiles, where:
P1 is read as the first percentile.
P2 is read as the second percentile.
:
P99 is read as the ninety-ninth percentile.

Measures of Location: Percentile

The kth percentile, Pk, is a value such that at least k% of the ordered data are less than or equal to it and at least (100 - k)% are greater than or equal to it, where k = 1, 2, 3, …, 99.

Measures of Location: Percentile
Example:
The 80th percentile of a distribution is a value such that at least 80% of the ordered observations are less than or equal to it and at least 20% of the ordered observations are greater than or equal to it.
If P80 = 75, then at least 80% of the ordered observations are less than or equal to 75, and at least 20% of the ordered observations are greater than or equal to 75.

Measures of Location: Percentile

Example:
Then, any observation that is smaller than
P80 value belongs in the lower 80% of the
distribution while any observation greater
than P80 value belongs in the upper 20% of
the distribution.

Measures of Location: Percentile

Example:
Consider the forest cover (in thousand hectares) of the different regions in the Philippines in 2010:
778, 125, 1045, 521, 2, 264, 917, 202, 138, 174, 379,
429, 249, 684, and 306. Find the 75th percentile.
P75 = 602.5 (thousand hectares)
Any region with forest cover lower than 602,500 hectares belongs in the lower 75% of the distribution.

Measures of Location: Examples of Percentile
Example:
The annual per capita poverty thresholds (in pesos) of the different regions of the Philippines are as follows:
15,693, 13,066, 12,685, 11,128, 13,760, 13,657, 11,995,
11,372, 11,313, 9,656, 9,518, 9,116, 10,503, 10,264,
10,466, 10,896, 12,192. Find the 75th percentile.

The 75th percentile is 12,685. This implies that any region with an annual per capita poverty threshold lower than PhP12,685 belongs in the lower 75% of the distribution.
Measures of Location: Example of Percentiles
Example:
The following are the numbers of telephone lines of 16 regions for the year 2004: 2799079,
94079, 190335, 42860, 410841, 1049413, 125157,
427497, 470299, 151652, 35945, 147513, 295334,
82616, 117116, 33315.
Find the 50th percentile.

The 50th percentile is 149,582.5. Thus, any region with fewer than 149,582.5 telephone lines belongs in the lower 50% of the distribution.
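The slides report these percentiles without stating how a percentile is located between two observations. The Python sketch below assumes the linear-interpolation rule of Excel's PERCENTILE.INC, rank r = (k/100)(n - 1) + 1; under that assumption it reproduces all three answers above (602.5, 12,685 and, unrounded, 149,582.5). `percentile_inc` is a hypothetical helper name.

```python
import math

def percentile_inc(values, k):
    """k-th percentile by linear interpolation between sorted observations.

    Assumed rule: rank r = (k / 100) * (n - 1) + 1, as in Excel's
    PERCENTILE.INC; the slides do not name a rule.
    """
    x = sorted(values)
    n = len(x)
    r = (k / 100) * (n - 1) + 1   # 1-based rank, possibly fractional
    lo = math.floor(r)            # observation just below the rank
    if lo >= n:                   # k = 100: the largest observation
        return x[-1]
    # Move the fractional part of the way from x(lo) toward x(lo + 1)
    return x[lo - 1] + (r - lo) * (x[lo] - x[lo - 1])

forest = [778, 125, 1045, 521, 2, 264, 917, 202, 138, 174, 379, 429, 249, 684, 306]
poverty = [15693, 13066, 12685, 11128, 13760, 13657, 11995, 11372, 11313,
           9656, 9518, 9116, 10503, 10264, 10466, 10896, 12192]
phones = [2799079, 94079, 190335, 42860, 410841, 1049413, 125157, 427497,
          470299, 151652, 35945, 147513, 295334, 82616, 117116, 33315]

print(percentile_inc(forest, 75))   # 602.5
print(percentile_inc(poverty, 75))  # 12685.0
print(percentile_inc(phones, 50))   # 149582.5
```

NumPy's `numpy.percentile` uses the same interpolation by default; other textbook rules can give slightly different values for small data sets.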
Measures of Location: Quartile

Quartiles divide the sorted observations into 4 equal parts.
1st Quartile = 25th Percentile
2nd Quartile = 50th Percentile
3rd Quartile = 75th Percentile

Measures of Location: Quartile

• The third quartile, Q3, divides the bottom 75% of the sorted observations from the top 25%. Thus, it is equal to P75.
• The second quartile, Q2, divides the bottom 50% of the sorted observations from the top 50%. Thus, it is equal to P50 and the median.
• The first quartile, Q1, divides the bottom 25% of the sorted observations from the top 75%. Thus, it is equal to P25.

Measures of Location: Decile

Deciles divide the sorted observations into 10 equal parts.
Each part contains 10 percent of the observations.
There are nine deciles, and these are D1, D2, D3, . . . , D9.

Measures of Location: Decile
Example: Table 1. Average Income, Average Expenditure and Average Savings of Families at Current Prices, by Per Capita Income Decile, Philippines, 2012 and 2015 (in thousand pesos)

                                2015                            2012
Per Capita Income Decile        Income  Expenditure  Savings    Income  Expenditure  Savings
Philippines                        267          215       52       235          193       42
First Decile                        86           89      (3)        69           73      (4)
Second Decile                      114          110        4        92           91        1
Third Decile                       133          122       11       108          102        6
Fourth Decile                      156          140       16       130          121        9
Fifth Decile                       182          161       22       153          139       15
Sixth Decile                       218          189       29       182          161       22
Seventh Decile                     259          217       42       229          196       32
Eighth Decile                      320          260       60       286          237       49
Ninth Decile                       415          326       89       381          302       79
Tenth Decile                       786          534      252       715          503      213
Ratio of Tenth to First Decile     9.1          6.0               10.4          6.9

Note: Details may not add up to total due to rounding.
Source: Philippine Statistics Authority, Family Income and Expenditure Survey, 2012 and 2015

Illustration

You want to determine the change in the
physical characteristics of janitor fish in
Laguna de Bay. You measured the length of
seven fish samples that were collected in 2016
and 2017. The data obtained are as follows:
Year Length (mm)
2016 82 82 83 83 83 84 91
2017 129 83 82 83 50 46 115

Compare the data between years using measures of central tendency (MCT).
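The comparison can be sketched with Python's standard `statistics` module. Both years turn out to have the same mean (84 mm), median (83 mm) and mode (83 mm), which is why the measures of dispersion that follow are needed to tell the two samples apart.

```python
from statistics import mean, median, multimode

# Lengths (mm) of the seven sampled janitor fish per year
lengths = {
    2016: [82, 82, 83, 83, 83, 84, 91],
    2017: [129, 83, 82, 83, 50, 46, 115],
}

for year, x in lengths.items():
    print(year, "mean:", mean(x), "median:", median(x), "mode(s):", multimode(x))
```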

Measures of Dispersion

Measures of Dispersion

A measure of dispersion indicates the extent to which observations in the data differ from the average value.

Measures of Dispersion: Importance

Measures of central tendency (MCTs) alone are not enough to describe the data.


Measures of Dispersion: Two Types
Measures of Absolute Dispersion
• carries the unit of measure of the observations
• can be used to compare data sets with the
same means and the same units of
measurement
Measures of Relative Dispersion
• unitless so it can be used to compare the
dispersion of two or more data sets with different
means or different units of measurement.
Measures of Dispersion: Two Types
Measures of Absolute Dispersion
• Range
• Standard deviation
Measures of Relative Dispersion
• Coefficient of Variation
• Standard Score

Measures of Absolute Dispersion: Range

The range is the difference between the maximum and minimum values of a data set.
Range = maximum - minimum

Measures of Absolute Dispersion: Range

Properties of the Range


• It does not take into account middle
observations.
• It is affected by outliers
• It tends to be smaller for smaller samples
than for larger samples.

Measures of Absolute Dispersion: Variance

The variance describes how far the observations are from the mean. It is expressed in the square of the unit of measure of the observations.

Measures of Absolute Dispersion: Variance

Population Variance (a parameter):
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
Sample Variance (a statistic):
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
where $x_i$ = ith observation of the variable X
N = number of observations in the population
n = number of observations in the sample
Measures of Absolute Dispersion: Standard Deviation

The standard deviation is the positive square root of the variance. Its unit is the same as the unit of measurement of the observations.
• The population standard deviation: $\sigma = \sqrt{\sigma^2}$
• The sample standard deviation: $s = \sqrt{s^2}$

Measures of Absolute Dispersion: Standard Deviation

Example:
Consider the pre-test scores of eight sampled
participants in descriptive statistics training course:
10, 12, 14, 15, 17, 18, 18, 24
$$\bar{x} = \frac{10 + 12 + 14 + 15 + 17 + 18 + 18 + 24}{8} = \frac{128}{8} = 16$$

Measures of Absolute Dispersion: Standard Deviation

Example:
Consider the pre-test scores of eight sampled
participants in descriptive statistics training course:
10, 12, 14, 15, 17, 18, 18, 24
$$s = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$$
On average, the pre-test scores of the sampled participants deviate from the mean of 16 by about 4.3095 points.
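The two variance formulas differ only in the divisor (N for a population, n - 1 for a sample). This Python sketch, with hypothetical helper names, reproduces the 4.3095 above; the standard library's `statistics.stdev` and `statistics.pstdev` (and Excel's STDEV.S and STDEV.P) implement the same two versions.

```python
import math

def sample_variance(x):
    """s^2: squared deviations from the sample mean, divided by n - 1."""
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** 2 for xi in x) / (len(x) - 1)

def population_variance(x):
    """sigma^2: squared deviations from the population mean, divided by N."""
    mu = sum(x) / len(x)
    return sum((xi - mu) ** 2 for xi in x) / len(x)

scores = [10, 12, 14, 15, 17, 18, 18, 24]   # pre-test scores, mean 16
s = math.sqrt(sample_variance(scores))      # sqrt(130 / 7)
print(round(s, 4))                          # 4.3095
# Treating the same eight scores as a whole population divides by 8 instead:
print(round(math.sqrt(population_variance(scores)), 4))  # 4.0311
```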

Measures of Absolute Dispersion
Example:
Site A: Heights (inches) of five trees

180” 180” 180” 180” 180”


Find the mean, range, and standard deviation.

Measures of Absolute Dispersion
Example:
Site B: Heights (inches) of five trees

130” 188” 170” 194” 120”


Find the mean, range, and standard deviation.
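One way to work the two tree exercises, as a Python sketch. It uses the sample standard deviation; the slides do not say whether the five trees are a sample or a population, so that choice is an assumption.

```python
from statistics import mean, stdev

# Heights (inches) of the five trees at each site
sites = {
    "Site A": [180, 180, 180, 180, 180],
    "Site B": [130, 188, 170, 194, 120],
}

for site, h in sites.items():
    data_range = max(h) - min(h)          # maximum - minimum
    print(f"{site}: mean = {mean(h):.1f}, range = {data_range}, s = {stdev(h):.2f}")
```

Site A has a range and a standard deviation of 0 because every tree is 180 inches tall. Site B (mean 160.4 inches) has a range of 74 inches and s ≈ 33.69 inches.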

Comparing Standard Deviations

(Figure: dot plots of Data A, Data B and Data C on a common scale from 11 to 21.)
Data A: mean = 15.5, s = 3.338
Data B: mean = 15.5, s = 0.9258
Data C: mean = 15.5, s = 4.57

Measures of Absolute Dispersion: Standard Deviation

Remarks:
If there is a large amount of variation in the
data set, then on the average, the data
values will be far from the mean. Hence, the
standard deviation will be large; otherwise
the standard deviation will be small.

Measures of Absolute Dispersion: Standard Deviation

Properties of the Standard Deviation


• It is affected by the value of every
observation. It may be distorted by
outliers.
• It cannot be negative.

Measures of Relative Dispersion: Coefficient of Variation

The coefficient of variation is unitless and is used to compare the scatter of one distribution with the scatter of another distribution.

Measures of Relative Dispersion: Coefficient of Variation

Population CV (a parameter):
$$CV = \frac{\sigma}{\mu} \times 100\%$$
Sample CV (a statistic):
$$CV = \frac{s}{\bar{x}} \times 100\%$$
where $\sigma$ = population standard deviation, $\mu$ = population mean, $s$ = sample standard deviation and $\bar{x}$ = sample mean.

Measures of Relative Dispersion: Coefficient of Variation
Example:
Suppose you have two options in buying a stock. Stock 1 is currently priced at P2,000 per share and Stock 2 at P550 per share. In buying stocks, risk is reduced by choosing a stock with a stable price. However, one could take a chance on a stock that shows greater variation in price, hoping the price goes up rather than down. A sample of the prices of Stock 1 and Stock 2 was collected at the close of trading over the past months.

Measures of Relative Dispersion: Coefficient of Variation
Example:
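Stock  Mean Price  Std. Deviation
1      P1,975      P578
2      P565        P85
$$CV_{\text{stock 1}} = \frac{578}{1975} \times 100\% \approx 29.3\% \qquad\qquad CV_{\text{stock 2}} = \frac{85}{565} \times 100\% \approx 15.0\%$$
Relative to its mean price, the price of Stock 1 is more variable than the price of Stock 2.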
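The same comparison as a short Python sketch; `coefficient_of_variation` is a hypothetical helper that takes the summary figures from the table rather than the raw prices.

```python
def coefficient_of_variation(sd, mean):
    """CV = (standard deviation / mean) x 100%; unitless."""
    return sd / mean * 100

cv1 = coefficient_of_variation(578, 1975)   # Stock 1
cv2 = coefficient_of_variation(85, 565)     # Stock 2
print(f"Stock 1: {cv1:.1f}%   Stock 2: {cv2:.1f}%")  # Stock 1: 29.3%   Stock 2: 15.0%
```

Comparing the raw standard deviations (P578 vs P85) would exaggerate the difference, because Stock 1 also trades at a much higher price; dividing by the mean removes that scale effect.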

Measures of Relative Dispersion: Standard Score

The standard score (z-score) helps determine the relative position of an observed value in the collection from which it came.
A positive z-score gives the number of standard deviations an observation is above the mean, while a negative z-score gives the number of standard deviations an observation is below the mean.

Measures of Relative Dispersion: Standard Score

Population z-score:
$$Z = \frac{x - \mu}{\sigma}$$
Sample z-score:
$$Z = \frac{x - \bar{x}}{s}$$
where $\sigma$ = population standard deviation, $\mu$ = population mean, $s$ = sample standard deviation and $\bar{x}$ = sample mean.

Measures of Relative Dispersion: Standard Score
Example:
The mean score of participants in Exercise 1 of the training course is 70% with a standard deviation of 10%, while in Exercise 2 the mean score is 80% with a standard deviation of 10%.
1. If you got a score of 75% in Exercise 1 and a score of 85% in Exercise 2, in which exercise did you perform better, considering the scores of the other participants in the two exercises?

Measures of Relative Dispersion: Standard Score
Example:
$$Z_{\text{Exer 1}} = \frac{75 - 70}{10} = 0.5 \qquad\qquad Z_{\text{Exer 2}} = \frac{85 - 80}{10} = 0.5$$

Considering the scores of the other participants in the two exercises, your score in Exercise 1 is just as good as your score in Exercise 2. Based on the z-scores, your scores in both exercises are 0.5 standard deviations above their respective mean scores.

Measures of Relative Dispersion: Standard Score
Example:
2. Your seatmate got a score of 70% in both Exercises 1 and 2. In which exercise did your seatmate perform better, considering the scores of the other participants in the two exercises?
3. One participant got a perfect score in Exercise 1. Compute the z-score and interpret it.

Measures of Relative Dispersion: Standard Score

Remark on the Standard Score


• It can be used to identify possible outliers in the data set. As a rule of thumb, if the absolute value of the standard score is at least 3, then that observation is marked as a possible outlier.
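Exercises 2 and 3 above can be worked the same way. In this Python sketch, `z_score` and `z_in` are hypothetical helpers; the means and standard deviations are the ones given for the two exercises.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# (mean %, standard deviation %) of the participants' scores per exercise
exercise = {1: (70, 10), 2: (80, 10)}

def z_in(ex, score):
    mean, sd = exercise[ex]
    return z_score(score, mean, sd)

print(z_in(1, 75), z_in(2, 85))   # 0.5 0.5   your two scores
print(z_in(1, 70), z_in(2, 70))   # 0.0 -1.0  your seatmate's two scores
z = z_in(1, 100)                  # perfect score in Exercise 1
print(z, abs(z) >= 3)             # 3.0 True  -> flagged as a possible outlier
```

Your seatmate did better in Exercise 1, scoring exactly at the mean (z = 0) rather than one standard deviation below it (z = -1). The perfect score lies 3 standard deviations above the mean of Exercise 1, so by the rule of thumb it is marked as a possible outlier.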

Measures of Dispersion
You want to determine the change in the
physical characteristics of janitor fish in
Laguna de Bay. You measured the length of
seven fish samples that were collected in 2016
and 2017. The data obtained are as follows:
Year Length (mm)
2016 82 82 83 83 83 84 91
2017 129 83 82 83 50 46 115

Compare the variability of the lengths of the janitor fish between years.
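A Python sketch of the comparison, using the range, the sample standard deviation and the coefficient of variation. It gives a range of 9 mm, s ≈ 3.16 mm and CV ≈ 3.8% for 2016, against a range of 83 mm, s ≈ 30.53 mm and CV ≈ 36.3% for 2017: the two samples share the same mean, but the 2017 lengths are far more variable.

```python
from statistics import mean, stdev

# Lengths (mm) of the seven sampled janitor fish per year
lengths = {
    2016: [82, 82, 83, 83, 83, 84, 91],
    2017: [129, 83, 82, 83, 50, 46, 115],
}

for year, x in lengths.items():
    data_range = max(x) - min(x)
    s = stdev(x)                  # sample standard deviation (n - 1 divisor)
    cv = s / mean(x) * 100        # sample coefficient of variation, in %
    print(f"{year}: range = {data_range} mm, s = {s:.2f} mm, CV = {cv:.1f}%")
```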
Measures of Skewness

Summary Measures: Shapes of the Data Distribution

Symmetric Distribution

• The graph of the frequency distribution or relative frequency distribution is symmetric if it can be folded along a vertical axis so that the left-hand side is the mirror image of the right-hand side.
• If the distribution has one mode and is symmetric, the mean, the median, and the mode are equal.

Other Symmetric Distributions

Skewed Distribution

• If the two sides do not coincide, the distribution is said to be asymmetric.
• A distribution that is asymmetric with respect to a vertical axis is said to be skewed.

Skewness: Types

Positively Skewed (Skewed to the Right)


The longer upper tail indicates that there are observations in the data whose values are much larger than the others. Consequently, these observations pull the mean to the right. The mean will then be larger than the median, and the median will be larger than the mode.
(Figure: Frequency polygon of annual family income in the Philippines, 2000. Horizontal axis: income, from 95,000 to 695,000; vertical axis: number of families, in thousands.)

mean > median > mode

Skewness: Types

Negatively Skewed (Skewed to the Left)


The longer lower tail indicates that there are observations in the data whose values are much smaller than the others. Consequently, these observations pull the mean to the left. The mean will then be smaller than the median, and the median will be smaller than the mode.
(Figure 1c. Example of a skewed-to-the-left distribution. Vertical axis: number of provinces; horizontal axis labeled 1 to 9.)

mean < median < mode

Measures of Skewness
Pearson’s First and Second Coefficients of Skewness
First coefficient:
$$SK = \frac{\bar{x} - mo}{s}$$
Second coefficient:
$$SK = \frac{3(\bar{x} - md)}{s}$$
where $s$ = sample standard deviation, $md$ = sample median, $\bar{x}$ = sample mean and $mo$ = sample mode.

Measures of Skewness: Interpretation
Pearson’s First and Second Coefficients of Skewness
• SK = 0: symmetric ($\bar{x} = md = mo$)
• SK > 0: positively skewed ($\bar{x} > md > mo$)
• SK < 0: negatively skewed ($\bar{x} < md < mo$)

Measures of Skewness
You want to determine the change in the
physical characteristics of janitor fish in
Laguna de Bay. You measured the length of
seven fish samples that were collected in 2016
and 2017. The data obtained are as follows:
Year Length (mm)
2016 82 82 83 83 83 84 91
2017 129 83 82 83 50 46 115

Describe the distribution of the lengths of the sampled fish in each year.
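Both Pearson coefficients can be sketched in Python with the standard `statistics` module. For 2016 this gives SK ≈ 0.316 (first) and SK ≈ 0.949 (second); for 2017, SK ≈ 0.033 and SK ≈ 0.098. Both samples come out positive, the 2016 sample clearly so (the 91 mm fish pulls the mean above the median), while the 2017 values are close to 0, suggesting a nearly symmetric distribution. Note that `statistics.mode` returns only the first mode it encounters (Python 3.8+), so the first coefficient is only meaningful when the mode is unique, as it is here.

```python
from statistics import mean, median, mode, stdev

def pearson_sk1(x):
    """Pearson's first coefficient: (mean - mode) / s."""
    return (mean(x) - mode(x)) / stdev(x)

def pearson_sk2(x):
    """Pearson's second coefficient: 3(mean - median) / s."""
    return 3 * (mean(x) - median(x)) / stdev(x)

lengths = {
    2016: [82, 82, 83, 83, 83, 84, 91],
    2017: [129, 83, 82, 83, 50, 46, 115],
}

for year, x in lengths.items():
    print(f"{year}: SK1 = {pearson_sk1(x):.3f}, SK2 = {pearson_sk2(x):.3f}")
```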
Measures of Kurtosis

Measure of Kurtosis

A measure of kurtosis indicates the extent of peakedness or flatness of the distribution.

Measures of Kurtosis: Interpretation

Coefficient of Kurtosis
• K = 0: mesokurtic (normal)
• K > 0: leptokurtic (heavy-tailed)
• K < 0: platykurtic (light-tailed)
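The slides do not give a formula for the coefficient of kurtosis. The K = 0 benchmark for the normal distribution implies an excess-kurtosis measure, and the Python sketch below assumes the bias-adjusted sample formula used by Excel's KURT function. It returns -1.2 for the evenly spread data 1, 2, 3, 4, 5 (platykurtic) and about 5.97 for the 2016 janitor fish lengths, where the single 91 mm fish creates a heavy tail (leptokurtic).

```python
from statistics import mean, stdev

def excess_kurtosis(x):
    """Sample excess kurtosis, assuming the bias-adjusted formula of Excel's KURT:

    K = n(n+1) / ((n-1)(n-2)(n-3)) * sum(((x_i - xbar) / s)^4)
        - 3(n-1)^2 / ((n-2)(n-3))
    """
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    xbar, s = mean(x), stdev(x)      # s must be nonzero
    total = sum(((xi - xbar) / s) ** 4 for xi in x)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(round(excess_kurtosis([1, 2, 3, 4, 5]), 4))                # -1.2
print(round(excess_kurtosis([82, 82, 83, 83, 83, 84, 91]), 2))   # 5.97
```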

Box-and-Whiskers Plot or Boxplot

Box-and-Whiskers Plot
• It is used to display the following features
of the data:
o location
o spread
o symmetry
o extremes
o outliers
• It is a simple graphical method used to
display the 5-number summary.
Box-and-Whiskers Plot

Steps in Constructing the Boxplot
1. Construct a rectangle with one end at Q1 and the other end at Q3.
2. Draw a line within the rectangle at the value of the median.
3. Compute the interquartile range (IQR):
IQR = Q3 – Q1

Box-and-Whiskers Plot

Steps in Constructing the Boxplot
4. Compute the lower and upper fences:
Lower fence = Q1 – 1.5 × IQR
Upper fence = Q3 + 1.5 × IQR
These are the outlier cutoffs: observations below the lower fence or above the upper fence are outliers.
5. Excluding outliers, identify the smallest and the largest data values, that is, the values closest to the lower and upper fences. Draw a line (whisker) from each of these values to the nearer side of the rectangle.
Box-and-Whiskers Plot

Steps in Constructing the Boxplot
6. Plot outliers at their corresponding values using an x mark or any other symbol.

(Figure: example boxplot with Q1, Md and Q3 marked; axis values 55, 60, 75, 80, 85, 98, 100.)
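Steps 3 to 6 can be sketched in Python. The slides do not say how Q1 and Q3 are computed, so this sketch assumes the same linear-interpolation rule as Excel's QUARTILE.INC; other quartile rules can shift the fences slightly. Applied to the 2016 janitor fish lengths, it gives Q1 = 82.5, Md = 83, Q3 = 83.5 and IQR = 1, so the fences are 81 and 85: the 91 mm fish is plotted as an outlier and the whiskers end at 82 and 84. `boxplot_summary` is a hypothetical helper name.

```python
import math

def percentile_inc(x_sorted, k):
    """Assumed quartile rule: interpolate at rank (k/100)(n - 1) + 1."""
    n = len(x_sorted)
    r = (k / 100) * (n - 1) + 1
    lo = math.floor(r)
    if lo >= n:
        return x_sorted[-1]
    return x_sorted[lo - 1] + (r - lo) * (x_sorted[lo] - x_sorted[lo - 1])

def boxplot_summary(values):
    x = sorted(values)
    q1, md, q3 = (percentile_inc(x, k) for k in (25, 50, 75))
    iqr = q3 - q1                                        # step 3
    fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)            # step 4
    inside = [v for v in x if fences[0] <= v <= fences[1]]
    outliers = [v for v in x if v < fences[0] or v > fences[1]]  # step 6
    return {
        "Q1": q1, "Md": md, "Q3": q3, "IQR": iqr, "fences": fences,
        "whiskers": (inside[0], inside[-1]),             # step 5
        "outliers": outliers,
    }

print(boxplot_summary([82, 82, 83, 83, 83, 84, 91]))    # 2016 lengths (mm)
```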

Box-and-Whiskers Plot

Examining the components of the boxplot:
1. The line inside the rectangle shows the location of the median, the measure of central tendency.
2. The sides of the rectangle, which are plotted either at the fourths or at the quartiles, indicate where the middle 50% of the observations lie.
Box-and-Whiskers Plot

Examining the components of the boxplot:
3. The length of the rectangle represents the magnitude of the IQR, the measure of dispersion.
4. The position of the line inside the rectangle relative to its sides gives an idea of the degree and direction of asymmetry, since it shows the respective distances of the median from the lower and upper fourths.
Box-and-Whiskers Plot

Examining the components of the boxplot:
• A line in the middle of the rectangle indicates that the distribution is symmetric.
• A line closer to the 1st quartile indicates that the distribution is skewed to the right.
• A line closer to the 3rd quartile indicates that the distribution is skewed to the left.

Box-and-Whiskers Plot

Examining the components of the boxplot:

Box-and-Whiskers Plot

Examining the components of the boxplot:
5. If there are no outliers, then the ends of the whiskers indicate the values of the two extremes. If there are outliers, then the farthest outlier is the extreme.
6. The outliers are clearly identified by the distinctive mark (x) used to plot them.

Box-and-Whiskers Plot
Example:

Figure 1. Daily air quality measurements in New York, May to September 1973

Proportions, Ratios, Rates, Percent Change

Proportion

The proportion of elements in the collection belonging to a given category is defined as the number of elements belonging to the category divided by the total number of elements in the collection.

Proportions
Example:
Proportion of males in the population:
$$\frac{\text{no. of males in the population}}{N}$$
Proportion of males in the sample:
$$\frac{\text{no. of males in the sample}}{n}$$

Proportions
Example:
SDG indicator 5.5.1:
Proportion of seats held by women in (a) national parliaments and (b) local governments
For (b):
$$\frac{\text{no. of local government seats occupied by women}}{\text{total no. of local government seats}}$$

Proportions
Note:
The proportion is actually a special case of the arithmetic mean.
Let P = population proportion of males, where:
$$P = \frac{X_1 + X_2 + \cdots + X_N}{N}, \qquad X_i = \begin{cases} 1, & \text{if the } i\text{th element in the population is male} \\ 0, & \text{otherwise} \end{cases}$$
Proportions
Example:
Table 8. CLASSIFIED WATER BODIES BY REGION AS OF 2016

Region    Principal River  Other Rivers  Marine Waters  Lakes  Total
1                      14            10              1      0     25
2                      26            26              0      0     52
3                      16            43              5      0     64
4A                     26            32              1      2     61
4B                     31            27             18      1     77
5                      29            26              7      4     66
6                      31            27             15      0     73
7                      19            14             11      0     44
8                      23            29              9      1     62
9                      17            27              5      0     49
10                     12            29              3      1     45
11                     17            16              5      0     38
12                     14            28              6      2     50
CAR                    11            28              0      1     40
CARAGA                 19            13              6      1     39
NCR                     2             3              1      0      6
Total                 307           378             93     13    791

Proportions

Example:
Proportion of classified water bodies that are principal rivers, as of 2016:
= 307/791
≈ 0.39
Note:
The proportions across all categories of a variable sum to 1.00.
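Here is a short Python sketch that computes the proportion of every category from the column totals of Table 8 and checks that the proportions add up to 1. The variable names are illustrative only.

```python
# Column totals of Table 8 (classified water bodies as of 2016).
totals = {
    "Principal River": 307,
    "Other Rivers": 378,
    "Marine Waters": 93,
    "Lakes": 13,
}
grand_total = sum(totals.values())  # 791

# Proportion of each category = category count / total count.
proportions = {cat: count / grand_total for cat, count in totals.items()}

for cat, p in proportions.items():
    print(f"{cat}: {p:.2f}")
# Principal River: 0.39
# Other Rivers: 0.48
# Marine Waters: 0.12
# Lakes: 0.02

# Across all categories the proportions add up to 1 (up to float rounding).
print(f"Sum: {sum(proportions.values()):.2f}")  # Sum: 1.00
```

Note that the rounded values 0.39 + 0.48 + 0.12 + 0.02 add up to 1.01. Only the unrounded proportions sum to exactly 1.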



Percentages

Percent means “per hundred”, “by the hundred”, or “out of a hundred”.
A proportion can be converted to a percentage by multiplying it by 100.



Percentages

Example:
Percentage of classified water bodies that are principal rivers, as of 2016:
= 0.39 × 100
= 39%
For every 100 classified water bodies as of 2016, about 39 are principal rivers.



Ratios

Averaging Percentages:
Arithmetic Mean vs. Weighted Mean
There are different ways of averaging
percentages, each one assigning a different
set of weights to the percentages.



Ratios
Example: (Averaging Ratios)
                        Exam 1  Exam 2  Exam 3
No. of correct answers      48      70      10
Total no. of items          50      80     100
Percentage                 96%   87.5%     10%
Arithmetic Mean
= (96% + 87.5% + 10%)/3 = 64.5%
All three exams are given the same weight.

Weighted Mean
= [(50)(96%) + (80)(87.5%) + (100)(10%)]/(50 + 80 + 100)
≈ 55.65%
Each percentage is weighted by its base (the total number of items), so the percentage with the largest base is given the greatest importance.
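Here is a minimal Python sketch that reproduces both averages for the three exams. It shows how the choice of weights changes the result. The variable names are illustrative.

```python
correct = [48, 70, 10]   # no. of correct answers per exam
items = [50, 80, 100]    # total no. of items per exam (the bases)

percentages = [100 * c / n for c, n in zip(correct, items)]  # [96.0, 87.5, 10.0]

# Arithmetic mean: every exam gets the same weight.
arithmetic_mean = sum(percentages) / len(percentages)

# Weighted mean: each percentage is weighted by its base (no. of items).
weighted_mean = sum(w * p for w, p in zip(items, percentages)) / sum(items)

print(f"{arithmetic_mean:.2f}%")  # 64.50%
print(f"{weighted_mean:.2f}%")    # 55.65%
```

The weighted mean is lower because the 10% score comes from the exam with the most items.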
Ratios

The ratio of a number x to another number y expresses the size of one measure x with respect to the size of another measure y.
• It is written as x:y and is read as “x is to y”.
• When the measure x is divided by the
measure y, the relationship that x bears to
y is then expressed as a ratio to one.
• The measure y in the denominator is called
the base.



Ratios

Example:
Pupil-Teacher Ratio = Total pupils / Total teachers
= 33,681/991 ≈ 34
There are about 34 pupils to 1 teacher.



Ratios
Example:
Table 2. Sex Ratio and Dependency Ratio by Region, 2010*

Region                                     Sex Ratio  Dependency Ratio
Philippines                                      102               221
NCR National Capital Region                       97               196
CAR Cordillera Administrative Region             104                57
I Ilocos Region                                  102                60
II Cagayan Valley                                104                58
III Central Luzon                                102                57
IV-A CALABARZON                                  100                56
IV-B MIMAROPA                                    106                70
V Bicol Region                                   104                75
VI Western Visayas                               103                61
VII Central Visayas                              102                62
VIII Eastern Visayas                             106                73
IX Zamboanga Peninsula                           104                66
X Northern Mindanao                              104                64
XI Davao Region                                  105                60
XII SOCCSKSARGEN                                 105                63
XIII Caraga                                      106                67
ARMM Autonomous Region in Muslim Mindanao         99                80

*Excludes 2,739 Filipinos in Philippine embassies, consulates and missions abroad.

Sex ratio = (total males / total females) × 100
Dependency ratio = (population aged 0 to 14 years + population aged 65 years and over) / (population aged 15 to 64 years) × 100
Source: Philippine Statistics Authority
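The two definitions translate directly into code. Below is a minimal Python sketch. The function names are hypothetical, and the counts in the usage lines are made up for illustration only. They are not PSA figures.

```python
def sex_ratio(males, females):
    """Number of males per 100 females."""
    return 100 * males / females

def dependency_ratio(age_0_14, age_65_plus, age_15_64):
    """Persons aged 0-14 and 65+ per 100 persons aged 15-64."""
    return 100 * (age_0_14 + age_65_plus) / age_15_64

# Hypothetical counts, for illustration only.
print(sex_ratio(5_100, 5_000))              # 102.0
print(dependency_ratio(2_000, 400, 4_000))  # 60.0
```

A sex ratio above 100 means there are more males than females. A dependency ratio of 60 means there are 60 persons in the dependent ages for every 100 persons aged 15 to 64.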



Ratios
Ratio of Sums or Ratio of Means
If the collection of measurements consists of ratios $\dfrac{X_1}{Y_1}, \dfrac{X_2}{Y_2}, \ldots, \dfrac{X_N}{Y_N}$, then the average of these ratios in which ratios with larger bases are given heavier weights is the ratio of sums, R:
$$R = \frac{X_1 + X_2 + \cdots + X_N}{Y_1 + Y_2 + \cdots + Y_N}$$
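Two short worked steps show why $R$ weights each ratio by its base $Y_i$, and why it is also called the ratio of means. The symbols $\bar{X}$ and $\bar{Y}$, which denote the means of the numerators and of the denominators, are introduced here for the derivation:

$$R = \frac{\sum_{i=1}^{N} X_i}{\sum_{i=1}^{N} Y_i} = \frac{\sum_{i=1}^{N} Y_i \cdot \dfrac{X_i}{Y_i}}{\sum_{i=1}^{N} Y_i}$$

$$R = \frac{\frac{1}{N}\sum_{i=1}^{N} X_i}{\frac{1}{N}\sum_{i=1}^{N} Y_i} = \frac{\bar{X}}{\bar{Y}}$$

The first line rewrites each $X_i$ as $Y_i \cdot (X_i/Y_i)$. This exhibits $R$ as a weighted mean of the ratios with weights $Y_i$. The second line divides the numerator and the denominator by $N$.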


Ratios

Example: (Ratio of Sums)


Data = $\left\{\dfrac{48}{50}, \dfrac{70}{80}, \dfrac{10}{100}\right\}$
The average of these ratios in which ratios with larger bases are given heavier weights is
$$R = \frac{48 + 70 + 10}{50 + 80 + 100} = \frac{128}{230} \approx 0.5565$$
Note: This is exactly the same as the weighted mean
computed in the previous example.
Ratios

Example: (Ratio of Sums)


Per capita income: the numerator characteristic is income, while the denominator characteristic is the number of members in a household.

Household                         1     2     3     4     5
Income (in thousands)            10    15    20    25    15
No. of members                    3     4     2     5     3
Ratio (income/no. of members)    3⅓  3.75    10     5     5
Ratios
Example: (Ratio of Sums)
$$\text{Per capita income} = \frac{10 + 15 + 20 + 25 + 15}{3 + 4 + 2 + 5 + 3} = \frac{85}{17} = 5 \text{ (P5 thousand per person)}$$
This is the same as the weighted mean of the individual ratios (income/no. of members), where the ratio for the largest household is given the largest weight:
$$\text{Per capita income} = \frac{(3)(3\tfrac{1}{3}) + (4)(3.75) + (2)(10) + (5)(5) + (3)(5)}{3 + 4 + 2 + 5 + 3} = \frac{85}{17} = 5 \text{ (P5 thousand per person)}$$
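Here is a minimal Python sketch that checks the two computations above with exact fractions. For contrast, it also computes the unweighted mean of the household ratios. That comparison is added here and does not appear on the slides. The variable names are illustrative.

```python
from fractions import Fraction

incomes = [10, 15, 20, 25, 15]  # household income, in thousands
members = [3, 4, 2, 5, 3]       # no. of household members (the bases)

# Individual ratios: income per member for each household.
ratios = [Fraction(x, y) for x, y in zip(incomes, members)]

# Ratio of sums: total income divided by total members.
ratio_of_sums = Fraction(sum(incomes), sum(members))

# Weighted mean of the ratios, each weighted by household size.
weighted = sum(y * r for y, r in zip(members, ratios)) / sum(members)

# Unweighted mean of the ratios: every household counts equally.
unweighted = sum(ratios) / len(ratios)

print(ratio_of_sums, weighted)   # 5 5
print(f"{float(unweighted):.2f}")  # 5.42
```

The unweighted mean (about 5.42) overstates per capita income. The reason is that the small two-member household with the highest ratio counts as much as the five-member household.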


Ratios

Example:
Policemen to population ratio (Philippines)
total no. of policemen : total population = 1 : 817
(dividing both sides by the total no. of policemen)
$$\frac{\text{Total population}}{\text{Total no. of policemen}} = \frac{95{,}814{,}244}{117{,}252} \approx 817.165$$
For every policeman in the Philippines, there are about 817 people in his or her charge.



Rates

Example:
No. of policemen per 100,000 population (Phils.)
total no. of policemen : total population = 122 : 100,000
$$\frac{\text{Total no. of policemen}}{\text{Total population}} \times 100{,}000 = \frac{117{,}252}{95{,}814{,}244} \times 100{,}000 \approx 122.374$$
There are about 122 policemen for every 100,000 persons in the Philippines.
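The ratio and the rate come from the same two counts. The ratio divides the population by the number of policemen. The rate divides the other way and rescales to a base of 100,000. Here is a small Python sketch of both, with illustrative variable names.

```python
population = 95_814_244
policemen = 117_252

# Ratio expressed "to one": persons per policeman.
persons_per_policeman = population / policemen

# Rate: policemen per 100,000 population.
policemen_per_100k = policemen / population * 100_000

print(f"1:{persons_per_policeman:.0f}")         # 1:817
print(f"{policemen_per_100k:.1f} per 100,000")  # 122.4 per 100,000
```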



Percent Change

The percent change from an original amount to a new amount is
$$\text{Percent Change} = \frac{\text{new amount} - \text{original amount}}{\text{original amount}} \times 100$$
When the new amount is less than the original amount, the numerator is negative and the result is a percent decrease; when the new amount is greater than the original amount, the percent change is positive and is called a percent increase.



Percent Change

Example:
The populations of the Philippines in 2007 and 2010 were 88,548,366 and 92,337,852, respectively.
Find the percent change in the Philippine
population from 2007 to 2010.



Percent Change
Example:
From the given problem,
new amount = 92,337,852
original amount = 88,548,366
$$\text{Percent Change} = \frac{92{,}337{,}852 - 88{,}548{,}366}{88{,}548{,}366} \times 100 \approx 4.3\%$$
There was a 4.3% increase in the Philippine
population from 2007 to 2010.
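Here is a minimal Python sketch of the percent change formula, applied to the population figures above. The function name is hypothetical. The guard against a zero original amount is an added assumption, since the formula is undefined in that case.

```python
def percent_change(original, new):
    """(new - original) / original * 100; a negative result is a percent decrease."""
    if original == 0:
        raise ValueError("percent change is undefined when the original amount is 0")
    return (new - original) / original * 100

change = percent_change(88_548_366, 92_337_852)  # Philippine population, 2007 -> 2010
print(f"{change:.1f}%")  # 4.3%

# Reversing the direction gives a percent decrease of a different size,
# because the base (the original amount) changes.
print(f"{percent_change(92_337_852, 88_548_366):.1f}%")  # -4.1%
```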



Exercise 4



