
Research Statistics

Art Walden A. Dejoras


Chapter 2
Descriptive Statistics
Recall: Descriptive Statistics

Descriptive statistics deals with the collection, organization, summarization, and presentation of data.
Descriptive statistics includes the following:
•Frequency Distribution
•Measures of Central Tendency
•Measures of Variation
•Measures of Position
•Normal Distribution (Measures of Skewness & Kurtosis)
Frequency Distribution

A distribution is a list of scores taken on some particular variable.

Example: The following is a distribution of 10 students' scores on a science test:

69, 77, 77, 77, 84, 85, 85, 87, 92, 98


Frequency (f) of an observation is the number of times that observation occurs in the data.
Frequency Distribution is the pattern of frequencies of observations, or a listing of case counts by category. It can show either the actual number of observations falling in each range or the percentage of observations (the proportion per category). It can be shown using frequency tables or graphics.

Frequency Table is a chart presenting statistical data that categorizes the values along with the frequency of each value.
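As a minimal sketch, the frequency table for the 10 science scores listed above can be built in Python. The function name `frequency_table` and the percentage column are illustrative choices, not part of the original slides.

```python
from collections import Counter

# The 10 students' science test scores from the example above.
scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

def frequency_table(data):
    """Return (value, frequency, percentage) rows sorted by value."""
    counts = Counter(data)          # frequency f of each distinct value
    n = len(data)
    return [(value, f, 100 * f / n) for value, f in sorted(counts.items())]

table = frequency_table(scores)

print(f"{'Score':>5} {'f':>3} {'%':>6}")
for value, f, pct in table:
    print(f"{value:>5} {f:>3} {pct:>6.1f}")
```

The frequencies always add up to the number of observations, which is a quick check on any frequency table.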
Example 1:
Example 2:
Example 3:
Example 4: Grouped Frequency Distribution Table

Common Graphical Representations
of Frequency Distributions
• Bar Graphs
• Histogram
• Pie Chart
• Line Graph
Bar Graph

A bar graph is a chart with rectangular bars. The length of each bar is proportional to the value it represents.

A bar graph is used for nominal and ordinal data to show the frequency of each category.
Bar Graph for Example 2
Vertical Bar Graph
Horizontal Bar Graph
Group Vertical Bar Graph
Stacked Bar Graph
Histogram

A histogram is a graph that consists of a series of columns; each column represents an interval (class) of values of a variable. The frequency of occurrence in each interval is represented by the column's height.

A histogram is useful for graphically displaying interval and ratio data.
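The counting step behind a histogram can be sketched in Python. The class width of 10 and the starting value of 60 are assumptions chosen for illustration, and `grouped_frequencies` is a hypothetical helper name. The data are the 10 science scores from the frequency distribution example.

```python
# The 10 students' science test scores used earlier in this chapter.
scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

def grouped_frequencies(data, start, width, n_classes):
    """Count observations in consecutive intervals of equal width.

    Returns (lower limit, upper limit, frequency) rows. The upper limit
    is written as lower + width - 1, which assumes whole-number data.
    """
    counts = [0] * n_classes
    for x in data:
        k = (x - start) // width        # index of the interval containing x
        if 0 <= k < n_classes:
            counts[k] += 1
    return [(start + i * width, start + (i + 1) * width - 1, c)
            for i, c in enumerate(counts)]

classes = grouped_frequencies(scores, start=60, width=10, n_classes=4)

# A text histogram: the length of each bar is the interval's frequency.
for lower, upper, f in classes:
    print(f"{lower}-{upper}: {'#' * f} ({f})")
```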
Example
Example
Creating Histogram using SPSS
Step 1: Open SPSS.

Step 2: Click on the circle corresponding to “Type in data”.

Step 3: Enter your data in one column.

Step 4: Click on “Variable View” tab.

Step 5: Type in name for the variable.

Step 6: In the measure column, pick “Scale”.

Step 7: Click Graphs > Legacy Dialogs > Histogram...

Step 8: Move your variable to “Variable:” box.

Step 9: Click “OK”.


Creating Histogram using SPSS (Output)
Pie Chart

A pie chart is a circular chart that provides a visual representation of the data (100% = 360 degrees). The pie is divided into sections, each of which corresponds to a category of the variable (e.g., age group, height group). The size of each section is proportional to the percentage of the corresponding category.

Pie charts are especially useful for summarizing nominal data.
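Since 100% corresponds to 360 degrees, the angle of each section follows directly from the category's share of the total. The sketch below uses made-up category counts (hypothetical data, not from the slides) and a hypothetical helper named `pie_sections`.

```python
def pie_sections(counts):
    """Map each category to (percentage, angle in degrees) of its pie section."""
    total = sum(counts.values())
    return {category: (100 * f / total, 360 * f / total)
            for category, f in counts.items()}

# Hypothetical category counts, for illustration only.
counts = {"Category A": 5, "Category B": 3, "Category C": 2}
sections = pie_sections(counts)

for category, (percent, degrees) in sections.items():
    print(f"{category}: {percent:.1f}% -> {degrees:.1f} degrees")
```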
Example
Example
Line Graph

A line graph represents data using points connected by lines in order to show the trend of changes in the data over a given period of time (the independent variable) at equally spaced intervals (hourly, daily, weekly, monthly, yearly, etc.).
Data (from the dependent variable) such as changes in temperature and population can be represented by a line graph.
Example
Example
Example
Boxplot

A boxplot is a standardized way of displaying the distribution of data based on a five-number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). It can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.
Five-Number Summary
1. First Quartile (Q1 or 25th Percentile): the middle value between the smallest value and the median of the dataset.

2. Median (Q2 or 50th Percentile): the middle value of the dataset.

3. Third Quartile (Q3 or 75th Percentile): the middle value between the median and the highest value of the dataset.

Formula for the ith Quartile: $Q_i$ is the $\left[\frac{i(n+1)}{4}\right]$th observation,

where the n observations are arranged from lowest to highest.

If the positions of Q1 and Q3 are not integers, the quartiles are found by interpolation.
Interquartile Range (IQR): 25th to the 75th percentile.
Formula: IQR = Q3 - Q1

4. Maximum: the highest observation in the data set after the outliers are removed.

5. Minimum: the lowest observation in the data set after the outliers are removed.

Outliers are extreme observations in the data set.

If an observation is greater than Q3 + 1.5(IQR) or less than Q1 - 1.5(IQR), then it is considered an outlier.
Example:
Give the 5-number summary for the following data set. Then draw a boxplot illustrating the 5-number summary. Determine if there exists an outlier in the given data set.

{1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 8, 8, 9, 10, 27}.
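One way to work this example is sketched below in Python. It assumes the ith quartile sits at position i(n + 1)/4 in the sorted data, with linear interpolation when that position is not a whole number; other software and textbooks use slightly different quartile conventions, so their results may differ. The names `quartile` and `boxplot_summary` are illustrative.

```python
data = [1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 8, 8, 9, 10, 27]

def quartile(sorted_data, i):
    """ith quartile at (1-based) position i(n + 1)/4, linearly interpolated.

    Assumption: this position rule is one common convention, not the only one.
    """
    n = len(sorted_data)
    position = i * (n + 1) / 4
    whole = int(position)               # observation just below the position
    fraction = position - whole
    if whole < 1:
        return sorted_data[0]
    if whole >= n:
        return sorted_data[-1]
    below, above = sorted_data[whole - 1], sorted_data[whole]
    return below + fraction * (above - below)

def boxplot_summary(values):
    """Five-number summary, IQR, and outliers by the 1.5(IQR) rule."""
    xs = sorted(values)
    q1, median, q3 = (quartile(xs, i) for i in (1, 2, 3))
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "minimum": inside[0],    # lowest observation that is not an outlier
        "Q1": q1,
        "median": median,
        "Q3": q3,
        "maximum": inside[-1],   # highest observation that is not an outlier
        "IQR": iqr,
        "outliers": outliers,
    }

summary = boxplot_summary(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Under this convention, Q1 = 3.25, the median is 5.5, and Q3 = 7.5, so IQR = 4.25 and the upper fence is 7.5 + 1.5(4.25) = 13.875. The observation 27 lies above the fence and is flagged as an outlier, leaving 10 as the “maximum”.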


Creating Boxplot using SPSS

Given: {1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 8, 8, 9, 10, 27}.

Step 1: Open SPSS.


Step 2: Click on the circle corresponding to “Type in data”.
Step 3: Click on “Variable View” tab.
Step 4: Type in name for the variable corresponding to scores.
Step 5: In the measure column, pick “Scale”.
Step 6: Type in name for the variable corresponding to group.
Step 7: In the measure column, pick “Nominal”.
Step 8: Click on “Data View” tab.
Step 9: Enter the data in the column corresponding to each variable.
Step 10: Click Graphs > Legacy Dialogs > Boxplot... > Simple > Define
Step 11: Move your score variable to “Variable:” box.
Step 12: Move your group variable to “Category Axis:” box.
Step 13: Click “OK”.
Creating Boxplot using SPSS (OUTPUT)
Given: {1, 2, 2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 8, 8, 9, 10, 27}.
Question:

Consider the results of the post-test from students for the year 2000
and the year 2010 being illustrated in boxplots. What do these results
tell us about how students performed on the 29-question post-test for
the two years?
Answer:

If we compare only the lowest and highest
scores between the two years, we might
conclude that the students in 2010 did better
than the students in 2000. This conclusion
seems to follow since the lowest score of 8 in
2010 is greater in value than the lowest score of
6 in 2000. Also, the highest score of 28 in 2010
is greater in value than the highest score of 27
in 2000.
But the box portion of the illustration gives us
more detailed information. The middle bar in
each box shows us that the median score of 20
in 2000 is greater in value than the median
score of 17 in 2010. Further, we note that the
box and whiskers divide the illustration into four
pieces. Each of these four pieces represents
the same portion of students. So, the upper half
of the students in 2000 scored in the same
score range as the upper one-fourth of the
students in 2010, see the illustration at a score
of 20.
By considering the upper one-fourth,
upper half, and upper three-fourths
instead of just the lowest and highest
scores, we would conclude that the
students as a whole did much better in
2000 than in 2010. We would conclude
that as a whole the students in 2010 are
less prepared than the students in 2000.
Consider this...
What is the representative height
of this group of students?
If you were to join any of these two teams, which team would you
choose? Why?
Measures of Central Tendency

Measure of Central Tendency (Measure of Average) is a single value used to represent or summarize the entire data set. It describes where the data are centered.
Three Measures of Central Tendency:
•Mean
•Median
•Mode
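As a brief illustration, the three measures can be computed with Python's standard `statistics` module. The data are the 10 science scores from the frequency distribution example earlier in this chapter; using them here is a choice made for this sketch.

```python
import statistics

# The 10 students' science test scores used earlier in this chapter.
scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

mean = statistics.mean(scores)        # sum of the scores divided by their count
median = statistics.median(scores)    # middle value; average of the two middle scores when n is even
modes = statistics.multimode(scores)  # most frequent value(s), returned as a list

print(f"Mean = {mean}, Median = {median}, Mode(s) = {modes}")
```

For these scores the mean is 831/10 = 83.1, the median is (84 + 85)/2 = 84.5, and the only mode is 77, which occurs three times.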
Mean
Median
Mode
Levels of Measurement and the
Applicable Measure of Central Tendency
Levels of Measurement and the Best
Measure of Central Tendency
Advantages & Disadvantages of the
Measures of Central Tendency
Question:
Consider the scores of 14 students from each of the two
classes in the 50-point mathematics ability test.

Which class is a better performer in terms of mathematics ability? Why?
Introduction to Measures of Variability
Below are two sets of employee performance scores taken from two
work divisions in a company. Assume that the two sets of data are
populations.

The two divisions obtained the same mean score of 77.75 points in the test. Can we say that the quality of performance in the two divisions is identical?
very poor score

very high score

The scores of Division A are more widely dispersed compared to Division B.

The scores of Division B are more closely located about the mean of 77.75, indicating a more consistent or homogeneous set of workers in terms of performance.
Histograms
Boxplots
Measure of central tendency
presents only half of the picture
or the description of the data.
Measure of variability completes
the description of the data.
Measure of Variability

Measure of Variability is a single number that tells how varied the observations are in a distribution. It is also called a measure of variation, dispersion, or spread.
Three Common Measures of Variation are Range, Variance & Standard Deviation.
The closer the measure of variation is to zero, the less varied (more homogeneous) the observations are.
The farther the measure of variation is from zero, the more varied (more heterogeneous) the observations are.
FORMULAS

Range = Highest Value - Lowest Value


Example: Determine the range, standard deviation and variance
of the 10 students' scores on Science test.
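A possible worked solution in Python is sketched below. The example does not state whether the 10 scores are a population or a sample, so both versions are computed: the population formulas divide the sum of squared deviations by N, and the sample formulas divide by n - 1.

```python
import statistics

scores = [69, 77, 77, 77, 84, 85, 85, 87, 92, 98]

# Range = Highest Value - Lowest Value
value_range = max(scores) - min(scores)

# Treating the scores as a population (divide by N).
population_variance = statistics.pvariance(scores)
population_sd = statistics.pstdev(scores)

# Treating the scores as a sample (divide by n - 1).
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

print(f"Range = {value_range}")
print(f"Population: variance = {population_variance:.2f}, SD = {population_sd:.2f}")
print(f"Sample:     variance = {sample_variance:.2f}, SD = {sample_sd:.2f}")
```

The mean is 83.1 and the sum of squared deviations from it is 634.9, so the population variance is 634.9/10 = 63.49 and the sample variance is 634.9/9, about 70.54. The standard deviation is the square root of the variance in each case.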
Interpretation of Quantitative Data using the
Mean and Standard Deviation

⌘ If we want to describe a data set, it may sometimes be useful to present a frequency distribution table or graphics, but these usually give more information than is needed. Two items are often sufficient:
※ Mean - A measure that tells us what a typical member of the data set is like.

※ Standard Deviation - A measure that tells us how spread out the other members of the data set are around the mean.
Consider this...
In this distribution, there are only a few fish with extreme lengths and many with average length... so NORMAL.

This distribution looks like a NORMAL DISTRIBUTION.


The Normal Distribution

A normal distribution is a frequency distribution that follows a normal curve (or bell-shaped curve).
Properties:
Empirical Rule

For normally distributed data:

• Approximately 68% of the data values will be within 1 standard deviation of the mean.
• Approximately 95% of the data values will be within 2 standard deviations of the mean.
• Almost all (about 99.7%) of the data values will fall within 3 standard deviations of the mean.
Data falling beyond 3 standard deviations from the mean are OUTLIERS.
Example: Suppose the scores of students in a 35-Point Biology
achievement test are normally distributed with mean of 21
and standard deviation of 3.2. Illustrate the associated
normal curve.

68% of the students got a score between 17.8 and 24.2 points.
95% of the students got a score between 14.6 and 27.4 points.
Almost all students got a score between 11.4 and 30.6
points.
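The three intervals in this example are simply the mean plus or minus 1, 2, and 3 standard deviations. A minimal Python sketch follows; the helper name `empirical_rule_intervals` is illustrative, and the results are rounded to two decimal places for display.

```python
def empirical_rule_intervals(mean, sd):
    """Return {k: (mean - k*sd, mean + k*sd)} for k = 1, 2, 3 standard deviations."""
    return {k: (round(mean - k * sd, 2), round(mean + k * sd, 2))
            for k in (1, 2, 3)}

coverage = {1: "About 68%", 2: "About 95%", 3: "Almost all"}

# Biology achievement test example: mean 21, standard deviation 3.2.
intervals = empirical_rule_intervals(mean=21, sd=3.2)
for k, (low, high) in intervals.items():
    print(f"{coverage[k]} of scores fall between {low} and {high}")
```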
Exercise

Suppose the scores of students in a 50-Point Physics


achievement test are normally distributed with mean of 32
and standard deviation of 4.5.
Show on the normal curve the range of test scores
obtained by 68%, 95% and almost all of students who
took the test.
Effect of Mean and Standard
Deviation on Normal Distributions
Skewness

Skewness refers to the measure of the symmetry/asymmetry of a


frequency distribution. Its formula is

If Sk < 0, then you have a negatively skewed distribution.
If Sk = 0, then you have a symmetric distribution (as in the normal distribution).
If Sk > 0, then you have a positively skewed distribution.
Normal & Skewed Distributions
Examples of a Normal Distribution
In general:
□SAT Scores
□Heights of People
□IQ Test Scores
□Intelligence Test Scores
□Psychological Test Scores
□Behavioral Test Scores
□Ability Test Scores
Example of a Positively Skewed Distribution
(Skewed to the Right)
Example of a Negatively Skewed Distribution
(Skewed to the Left)
SPSS Interpretation of Skewness

In SPSS, if -1 < skewness < 1, then the distribution can be considered within the range of normality.
• If skewness is less than or equal to -1, the distribution is highly negatively skewed.
• If skewness is greater than or equal to +1, the distribution is highly positively skewed.
Kurtosis

Kurtosis is a measure that describes the shape of a distribution's tails in relation to its overall shape.
Leptokurtic Distribution is characterized by long (heavy) tails.

Mesokurtic Distribution has tails similar to those of the normal distribution.

Platykurtic Distribution is characterized by short (light) tails.
Formula and Interpretation of
Kurtosis k

The distribution is said to be mesokurtic if k = 3, leptokurtic if k > 3, and platykurtic if k < 3.

SPSS reports the excess kurtosis, that is, the value of k minus 3. Thus, in SPSS, the distribution is said to be mesokurtic if k = 0, leptokurtic if k > 0, and platykurtic if k < 0. If -1 < k < 1, then the distribution can be considered within the range of normality.
Both the skewness and kurtosis of
the distribution should be between
-1 and 1 in order to assume the
distribution to be approximately
normally distributed.
Running Descriptive Statistics
in SPSS
Consider the following data on 30 students' scores in a 50-point mathematics achievement test.

29 38 29 28 42 40

32 32 37 35 28 46

41 41 40 26 28 29

44 44 46 27 31 27

32 32 29 33 27 25

Determine the mean, standard deviation, skewness and kurtosis.

Determine if the distribution is approximately normally distributed.
Output :

Reporting:
The mean score of the students is 33.93 (SD = 6.63). The scores
are non-normally distributed with skewness of 0.50 (SE = 0.43)
and kurtosis of -1.17 (SE = 0.83).
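For readers without SPSS, the reported values can be approximated with the Python sketch below. It assumes the bias-adjusted sample formulas for skewness and excess kurtosis, together with their usual standard errors; these are assumed here to match what SPSS computes, and they do agree with the rounded values reported above. The function name `describe` is illustrative.

```python
import math

# The 30 mathematics achievement test scores from the example above.
scores = [29, 38, 29, 28, 42, 40,
          32, 32, 37, 35, 28, 46,
          41, 41, 40, 26, 28, 29,
          44, 44, 46, 27, 31, 27,
          32, 32, 29, 33, 27, 25]

def describe(data):
    """Mean, sample SD, adjusted skewness and excess kurtosis, with standard errors."""
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    z3 = sum(((x - mean) / sd) ** 3 for x in data)
    z4 = sum(((x - mean) / sd) ** 4 for x in data)
    # Bias-adjusted sample skewness and excess kurtosis (assumed convention).
    skewness = n / ((n - 1) * (n - 2)) * z3
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    se_skewness = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurtosis = 2 * se_skewness * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return {"mean": mean, "sd": sd,
            "skewness": skewness, "se_skewness": se_skewness,
            "kurtosis": kurtosis, "se_kurtosis": se_kurtosis}

result = describe(scores)

# Rule of thumb from this chapter: both values must lie between -1 and 1.
within_normal_range = (-1 < result["skewness"] < 1) and (-1 < result["kurtosis"] < 1)

for name, value in result.items():
    print(f"{name}: {value:.2f}")
print("Within the range of normality:", within_normal_range)
```

The skewness falls between -1 and 1, but the kurtosis is below -1, which is why the scores are reported as non-normally distributed.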
Alternative Tests of Normality:

• Kolmogorov-Smirnov Test
• Shapiro-Wilk Test
Normality Tests:
□Shapiro-Wilk test
if sample size is less than 50
□Kolmogorov-Smirnov test
if sample size is greater than 50

If the significance value (Sig.) of the test is greater than 0.05, then normality can be assumed.
Another Way of Running Descriptives in
SPSS....
