Course Code:
Course Title
MODULE NO. 03
TITLE DESCRIPTIVE STATISTICS
OVERVIEW This module deals with the Measures of Central
Tendency, Measures of variability, Frequency
Distribution and Running Descriptive Statistics
through SPSS.
INTRODUCTION This module will show the statistical methods that
can be used to summarize data. After collecting
data, researchers are faced with pages of
unorganized numbers, stacks of survey responses,
etc. The goal of descriptive statistics is to
aggregate the individual scores (data) in a way
that can be readily summarized. A frequency
distribution table can be used to get a “picture” of
how scores were distributed. It organizes and
presents large data sets using tables and graphs.
LEARNING OUTCOMES The students will learn the Measures of Central
Tendency, Measures of Variability and Frequency
Distribution.
LEARNING OBJECTIVES At the end of the discussion, the learners will be
able to:
1. Differentiate the measures of Central
Tendency, their uses and limitations.
2. Identify the Measures of Variability.
3. Organize and display data using tables and
graphs.
Let’s read
Measures of Central Tendency
A measure of central tendency is a summary statistic that represents the center
point or typical value of a data set. These measures indicate where most values in a
distribution fall and are also referred to as the central location of a distribution. You can
think of it as the tendency of data to cluster around a middle value. In statistics, the
three most common measures of central tendency are the mean, median, and mode.
Each of these measures calculates the location of the central point using a different
method. Choosing the best measure of central tendency depends on the type of data
you have.
Three measures of central tendency
The Mean
The Median
The Mode
Unfortunately, no single measure of central tendency works best in all circumstances.
Nor will they necessarily give you the same answer.
Example
SAT scores from a sample of 10 college applicants yielded the following:
Mode: 480
Median: 505
Mean: 526
Which measure of central tendency is most appropriate?
The Mean
The mean is simply the arithmetic average.
The mean would be the amount that each individual would get if we took the total
and divided it up equally among everyone in the sample.
Alternatively, the mean can be viewed as the balancing point in the distribution
of scores (i.e., the distances for the scores above and below the mean cancel out).
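In symbols, and assuming a sample of n scores x_1, ..., x_n (notation introduced here for illustration, not taken from the module), the two descriptions of the mean above can be sketched as:

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{and}\qquad
\sum_{i=1}^{n} (x_i - \bar{x}) = 0
```

For the scores 18, 19, 20, 22, 24, the total is 103, so the mean is 103 / 5 = 20.6. The deviations -2.6, -1.6, -0.6, 1.4, and 3.4 add up to zero, which is the "balancing point" property.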
The Median
The median is the score that splits the distribution exactly in half.
50% of the scores fall above the median and 50% fall below.
The median is also known as the 50th percentile, because it is the score
below which 50% of the scores fall.
Special Notes
A desirable characteristic of the median is that it is not affected by extreme
scores.
Example:
Sample 1: 18, 19, 20, 22, 24
Sample 2: 18, 19, 20, 22, 47
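Both samples have a median of 20, even though the extreme score of 47 raises the mean
from 20.6 to 25.2. Thus, the median is not distorted by skewed distributions.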
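The comparison of the two samples can be checked with a few lines of Python. This is only an illustrative sketch using the standard-library statistics module; the variable names are chosen here and are not part of the module.

```python
import statistics

# The two samples from the example; only the last score differs.
sample_1 = [18, 19, 20, 22, 24]
sample_2 = [18, 19, 20, 22, 47]

for name, scores in (("Sample 1", sample_1), ("Sample 2", sample_2)):
    mean = statistics.mean(scores)      # arithmetic average
    median = statistics.median(scores)  # middle score after sorting
    print(f"{name}: mean = {mean:.1f}, median = {median}")
```

The median stays at 20 in both samples, while the mean moves from 20.6 to 25.2 because of the single extreme score.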
The Mode
The mode is simply the most common score.
There is no formula for the mode.
When using a frequency distribution, the mode is simply the score (or interval)
that has the highest frequency value.
When using a histogram, the mode is the score (or interval) that corresponds
to the tallest bar.
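Finding the mode from a frequency count can be sketched in Python as follows. The ratings are the RECOMMENDED values from the frequency-table example in this module; the variable names and the tie-handling approach are assumptions made for illustration.

```python
from collections import Counter

# RECOMMENDED ratings from the frequency-table example in this module.
ratings = [2, 3, 4, 5, 4, 5, 4, 2, 4, 3, 4, 5, 4, 2]

frequencies = Counter(ratings)        # score -> number of times it occurs
highest = max(frequencies.values())   # the largest frequency in the table

# Keep every score that reaches the highest frequency (there may be ties).
modes = sorted(score for score, count in frequencies.items() if count == highest)

print(dict(sorted(frequencies.items())))
print("Mode(s):", modes)
```

Returning a list rather than a single value makes it visible when two or more scores tie for the highest frequency, in which case there is no single mode.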
Uses of the Measures of Central Tendency
The Mean is used . . .
For interval and ratio measurements
When there are no extreme values in a distribution since it is easily affected by
extremely high or extremely low scores.
When higher statistical computations are wanted.
When the greatest reliability of the measure of Central tendency is wanted since its
computation includes all the given values.
The median is used . . .
For ordinal and ranked measurements
When there are extreme values, thus, the distribution is markedly skewed.
For an open-ended distribution, that is, when the lowest or the highest class interval, or
both, has no defined limit (e.g., 50 and below or 100 and above).
When one desires to know whether the cases fall within the upper half or the
lower half of a distribution.
The Mode is used . . .
For nominal and categorical data.
When a rough or quick estimate of a central value is wanted.
When the most popular or the most typical case or value in a distribution is wanted.
Limitations of the Measure of Central Tendency
Limitations of the Mean . . .
It is the most widely used average, since it is the most familiar. However, it is often
misused. It cannot be used if the clustering of values or items is not substantial.
If the given values do not tend to cluster around a central value, the mean is a poor
measure of central location.
It is easily affected by extremely large or small values. One small value can easily
pull down the mean.
The mean cannot be used to compare distributions since the means of 2 or more
distributions may be the same but their other characteristics may be entirely
different. The means of distribution A whose values are 80, 85, and 90 and
distribution B whose values are 86, 85, 84 are both 85. We cannot conclude, however,
that both distributions possess the same characteristics since their patterns of
dispersion or variation are markedly different despite having the same mean.
Limitations of the Median . . .
It is easily affected by the number of items in a distribution.
It cannot be determined if the given values are not arranged according to
magnitude.
If a distribution contains many values, arranging them according to magnitude
becomes a laborious task.
Its value is not as accurate as the mean since it is just an ordinal statistic.
Limitations of the Mode . . .
It is seldom or rarely used since it does not always exist.
Its value is just a rough estimate of the center of concentration of a distribution.
It is very unstable since its value easily changes depending on the approaches used
in finding it.
Distribution Shape and Central Tendency
In a normal distribution, the mean, median, and mode will be
approximately equal.
[Figure: normal curve with the mean (x̄), median (Med), and mode (Mo) marked at the center]
Skewed Distribution
In a skewed distribution, the mode will be the peak, the mean will be pulled toward
the tail, and the median will fall in the middle.
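[Figure: skewed curve with the mode (Mo) at the peak, the median (Med) in the middle, and the mean (x̄) toward the tail]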
Choosing the Proper Statistic
Continuous Data
Always report the mean
If data are substantially skewed, it is appropriate to use the median as well
Categorical Data
For nominal data you can only use the mode
For ordinal data the median is appropriate (although people often use the mean)
Measures of Variability
A measure of variability is a summary statistic that represents the amount of dispersion
in a dataset; that is, how spread out the values are. Measures of variability define how far
away the data points tend to fall from the center. We talk about variability in the context
of a distribution of values. A low dispersion indicates that the data points tend to be
clustered tightly around the center. High dispersion signifies that they tend to fall further
away.
In statistics, variability, dispersion, and spread are synonyms that denote the width of
the distribution. Just as there are multiple measures of central tendency, there are
several measures of variability. The most common measures of variability are the range,
variance, and standard deviation.
Measure of Variability
Range
Standard Deviation
Variance
Range
Range is the distance between two extreme scores.
It informs us about the dispersion of our distribution.
The larger the range, the larger the dispersion from the mean value.
Although the means of the scores of two distributions can be identical, their ranges
may be different.
Drawbacks to the Range
The range is a good preliminary measure, but a single extreme value can influence it
significantly.
The calculation of the range is derived from the highest and lowest values and
doesn’t tell us anything about the variability of the different values.
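The sensitivity of the range to one extreme value can be illustrated with the two samples from the median example. This is a minimal sketch; the helper name score_range is hypothetical and chosen here for illustration.

```python
# Samples from the median example; only the highest score differs.
sample_1 = [18, 19, 20, 22, 24]
sample_2 = [18, 19, 20, 22, 47]

def score_range(scores):
    """Return the distance between the highest and the lowest score."""
    return max(scores) - min(scores)

print("Range of Sample 1:", score_range(sample_1))
print("Range of Sample 2:", score_range(sample_2))
```

Four of the five scores are identical in both samples, yet the range jumps from 6 to 29 because only the two extreme scores enter the calculation.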
Standard Deviation
Defined as the variability of the scores around the mean
Each score in a distribution varies from the mean by a greater or lesser amount,
except when the score is the same as the mean.
Deviations from the mean can be noted as either positive or negative deviations from
the mean.
The average of these deviations would equal “zero.”
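Because the raw deviations cancel out, the standard deviation is built from squared deviations instead. The sketch below illustrates this with distributions A and B from the limitations of the mean. It assumes the sample formula (dividing by n - 1); the function name describe is hypothetical.

```python
import math

def describe(scores):
    """Return the mean, the deviations from the mean, and the sample SD."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]   # positive above, negative below
    # Assumption: sample variance, i.e. squared deviations divided by n - 1.
    variance = sum(d ** 2 for d in deviations) / (n - 1)
    return mean, deviations, math.sqrt(variance)

distribution_a = [80, 85, 90]
distribution_b = [86, 85, 84]

for label, scores in (("A", distribution_a), ("B", distribution_b)):
    mean, deviations, sd = describe(scores)
    print(f"{label}: mean = {mean}, deviations = {deviations}, "
          f"sum of deviations = {sum(deviations)}, SD = {sd}")
```

Both distributions have a mean of 85 and deviations that sum to zero, but the SD is 5 for A and 1 for B, so the scores in A are more spread out around the same center.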
[Figure: two distributions, one with a large SD and one with a small SD]
Variance
The variance and the closely-related standard deviation are measures of how spread
out a distribution is.
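Written out, and assuming the sample formulas (dividing by n - 1, which is consistent with the SD values reported in the write-up example in this module), the two measures can be sketched as:

```latex
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1},
\qquad
s = \sqrt{s^2}
```

For the scores 18, 19, 20, 22, 24 (mean 20.6), the squared deviations add up to 23.2, so the variance is 23.2 / 4 = 5.8 and the standard deviation is approximately 2.41.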
Frequency Distribution Tables
Overview
After collecting data, researchers are faced with pages of unorganized numbers,
stacks of survey responses, etc.
The goal of descriptive statistics is to aggregate the individual scores (data) in a
way that can be readily summarized.
A frequency distribution table can be used to get a “picture” of how scores
were distributed.
Frequency Distributions
A frequency distribution displays the number (or percent) of individuals that
obtained a particular score or fell in a particular category.
As such, these tables provide a picture of where people respond across the range of
the measurement scale.
One goal is to determine where the majority of respondents were located.
When to Use Frequency Tables
Frequency distributions and tables can be used to answer all descriptive research
questions.
It is important to always examine frequency distributions of the independent variable (IV)
and the dependent variable (DV) when answering comparative and relationship questions.
Three Components of a Frequency Distribution Table
Frequency - the number of individuals that obtained a particular score (or
response).
Percent - the corresponding percentage of individuals that obtained a particular
score.
Cumulative Percent - the percentage of individuals that fell at or below a particular
score (not relevant for nominal variables).
Example
What are the ages of students in an online course?
Are students likely to recommend the course to others?
AGE    RECOMMENDED
31     2
26     3
32     4
37     5
18     4
31     5
38     4
49     2
35     4
37     3
43     4
41     5
49     4
40     2
Step 1: Input the Data into SPSS
Step 2: Run the Frequencies
Analyze → Descriptive Statistics → Frequencies
Move variables to the Variables box (select the variables and click on the arrow).
Click OK.
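For readers who want to check the SPSS output by hand, the three components of a frequency distribution table can be reproduced with a short Python sketch. This is not what SPSS runs internally; the function name frequency_table is hypothetical, and the data are the RECOMMENDED ratings from the example.

```python
from collections import Counter

# RECOMMENDED ratings of the 14 students in the example data.
recommended = [2, 3, 4, 5, 4, 5, 4, 2, 4, 3, 4, 5, 4, 2]

def frequency_table(values):
    """Return rows of (score, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for score in sorted(counts):
        cumulative += counts[score]   # cases at or below this score
        rows.append((score,
                     counts[score],
                     100 * counts[score] / total,
                     100 * cumulative / total))
    return rows

print(f"{'Score':>5} {'Freq':>5} {'Percent':>8} {'Cum %':>8}")
for score, freq, pct, cum_pct in frequency_table(recommended):
    print(f"{score:>5} {freq:>5} {pct:>8.1f} {cum_pct:>8.1f}")
```

The same function can be applied to the AGE column, although with so many distinct ages most frequencies will be 1 or 2.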
Example
Frequency distribution showing the ages of students who took the online
course.
Student responses when asked whether or not they would recommend the online
course to others.
Most would recommend the course.
Running Descriptive Statistics
Example
Are there differences in the anxiety levels of students who have had statistics before
versus students who have never had statistics?
Step 1: Input the data into SPSS
STATS HISTORY    ANXIETY SCORE
1                95
1                85
1                65
1                90
1                85
2                65
2                45
2                35
2                75
2                65
Step 2: Run the descriptive statistics
Analyze → Compare Means → Means
Anxiety = Dependent List
Stats History = Independent List
Click Options
Move Median over
Move Minimum over
Move Maximum over
Click Continue
Click OK
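The statistics requested in this step can be cross-checked outside SPSS with a short Python sketch. It assumes the sample standard deviation (n - 1) and keeps the group codes 1 and 2 exactly as they appear in the data; the function name summarize is hypothetical.

```python
import statistics

# (stats history code, anxiety score) pairs from the example data.
data = [(1, 95), (1, 85), (1, 65), (1, 90), (1, 85),
        (2, 65), (2, 45), (2, 35), (2, 75), (2, 65)]

def summarize(scores):
    """Return the descriptive statistics requested in the Means dialog."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "sd": statistics.stdev(scores),      # sample SD (divides by n - 1)
        "median": statistics.median(scores),
        "min": min(scores),
        "max": max(scores),
    }

# Group the anxiety scores by stats history code.
groups = {}
for history, anxiety in data:
    groups.setdefault(history, []).append(anxiety)

for history in sorted(groups):
    s = summarize(groups[history])
    print(f"History {history}: N = {s['n']}, M = {s['mean']:.2f}, "
          f"SD = {s['sd']:.2f}, Mdn = {s['median']}, "
          f"Min = {s['min']}, Max = {s['max']}")
```

The group means and standard deviations produced this way (84.00 and 11.40 for code 1, 57.00 and 16.43 for code 2) agree with the values reported in the write-up.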
Step 3: Create a Histogram for Anxiety with a normal curve option
Graphs → Legacy Dialogs → Histogram
Variable = anxiety
Check the “Display normal curve” check box
Click OK
Histogram for Anxiety
Step 4: Write up the results
Descriptive statistics revealed that students who had previous experience with
statistics (M = 57.00, SD = 16.43) had lower anxiety at the beginning of the semester
than students who did not have any previous experience with statistics (M = 84.00, SD
= 11.40).
Summary of when to use the mean, median and mode
Please use the following summary table to know what the best measure of central
tendency is with respect to the different types of variable.
Type of Variable               Best measure of central tendency
Nominal                        Mode
Ordinal                        Median
Interval/Ratio (not skewed)    Mean
Interval/Ratio (skewed)        Median
Activity
Directions: Prepare an SPSS frequency distribution table and histogram for the
following data. Take a screenshot of both the input (both data view and variable
view) and the output, and convert it into PDF format. Submit the PDF file.
Also upload the input and output SPSS files.
1. The following are the heights (in cm) of applicants to the PNP
177 192 163 198 175
208 192 186 164 169
189 172 165 206 182
184 193 173 164 162
168 165 185 182 173
189 186 169 169 192
201 210 187 201 162
188 202 175 181 205
163 198 196 166 202
168 187 182 192 186
191 177 163 171 185
177 198 178 208 208
a. Find the mean height
b. Find the median height
c. What is the maximum height
d. What is the minimum height
e. Find the 1st Quartile, 3rd Quartile and the 5th Decile
2. Bailey has been playing golf on the weekends for the past three years. Recently, she
started keeping track of her recorded scores. Her scores for June and July at her
favorite 9-hole (par 36) golf course are provided below.
45
49
42
56
41
36
34
38
41
40
42
41
39
38
40
39
36
41
Find the Range, Standard Deviation, and Variance for the above data.
What does this information tell you about the variability of Bailey's golf game?
Prepared by:
LILIA B. CATULIN, D.P.A.
Professor