
Republic of the Philippines

Laguna State Polytechnic University


Province of Laguna

College of Teacher Education


Graduate Studies and Applied Research
Master of Arts in Education

Educ 200: Advanced Statistics

Alberto D. Yazon, Ph.D.


Assistant Professor IV

Lecture 3 August 25, 2018


Descriptive Statistics
• Frequency Count and Percentage
• Mean, Median and Mode
• Range, Variance, and Standard
Deviation
• Skewness and Kurtosis
Exploring SPSS
Data View (where the data are entered)
Variable View (where the data entered in the
data view are named or classified)
Exploring SPSS and Descriptive Statistics
Consider the following test scores of 40 Grade 10 students in
Statistics:
10 – Newton          10 – Einstein
62  71  80  81       74  77  87  91
62  72  80  83       75  77  87  91
69  73  80  83       76  78  87  92
70  73  81  83       76  79  88  93
70  74  81  85       76  79  89  95

Using SPSS, find the descriptive statistics of each set of scores, then compare the results.
Steps:
1. Enter the scores of 10 – Newton in the first
column of data view interface then enter the
scores of 10 – Einstein in the second column.
2. Click variable view (bottom left) then rename
the data you entered as 10Newton and
10Einstein (In naming your variables, DO NOT
use space bar).
3. To get the Frequency and Percentage of your data set, click Analyze → Descriptive Statistics → Frequencies → send the data in the left box to the right → click OK.
Steps:
4. To get the Descriptive measures of your data set, click Analyze → Descriptive Statistics → Descriptives → send the data in the left box to the right → click Options → tick all the small boxes → click OK.
Exploring Microsoft Excel and Descriptive Statistics
Consider the following test scores of 40 Grade 10 students in
Statistics:
10 – Newton          10 – Einstein
62  71  80  81       74  77  87  91
62  72  80  83       75  77  87  91
69  73  80  83       76  78  87  92
70  73  81  83       76  79  88  93
70  74  81  85       76  79  89  95

Using Microsoft Excel, find the descriptive statistics of each set of scores, then compare the results.
Click the Microsoft Office Button and then click
Excel Options (Fig.1).
1. Click Add-Ins (Fig.2), and then in the Manage
box, select Excel Add-Ins (Fig.3), (for Office 2003
and below, Add-ins and Data Analysis are on
Tools).
2. Click Go.
3. In the Add-Ins available box (Figure 4), select
the Analysis ToolPak check box, and then click
OK.
4. Tip: If Analysis ToolPak is not listed in the Add-
Ins box, click Browse to locate it.
Figure 1. Excel Options
Figure 2. Add-Ins Button
Figure 3. Manage Button
Figure 4. The Dialogue Box of Microsoft Excel Add-Ins
5. If you get prompted that the Analysis ToolPak
is not currently installed on your computer, click
YES to install it.

6. After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.

7. Now you are ready to process the data.

Type the following data in two columns, click on Data, then Data Analysis.
62 71 80 81 74 77 87 91
62 72 80 83 75 77 87 91
69 73 80 83 76 78 87 92
70 73 81 83 76 79 88 93
70 74 81 85 76 79 89 95

The dialogue box in Figure 5 will appear (for Office 2003, Data Analysis is on Tools).
Figure 5. Dialogue Box of Data Analysis
In this dialogue box, choose Descriptive Statistics. Another dialogue box will appear.
Figure 6. Dialogue Box of Descriptive Statistics
The dialogue box in Figure 6 (above) asks for the Input Range. Highlight all the data you have just typed and the range will automatically register in the Input Range box. Check Labels if your selection includes column labels; otherwise, leave it unchecked.
In the dialogue box, check Summary statistics, then select Output Range.
Click inside the Output Range box, then select one cell near your data where you want to put the output.
Then click OK or press Enter. The Microsoft Excel output in Table 1 will be generated.
Table 1. Microsoft Excel output

                      10 – Newton    10 – Einstein
Mean                  75.65          83.35
Standard Error        1.56           1.61
Median                77.00          83.00
Mode                  80, 81, 83     76, 87
Standard Deviation    6.98           7.18
Sample Variance       48.66          51.61
Kurtosis              -0.67          -1.71
Skewness              -0.55          0.16
Range                 23.00          21.00
Minimum               62.00          74.00
Maximum               85.00          95.00
Sum                   1513.00        1667.00
Count                 20.00          20.00
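The same summary can be cross-checked outside Excel. The sketch below is one possible way to do it with Python's standard `statistics` module; the function name `describe` and the dictionary labels are illustrative choices, not part of the lecture. It assumes sample formulas (divisor n - 1), which is what Excel reports as "Sample Variance". Skewness and kurtosis are left out here.

```python
import math
import statistics

# Scores copied from the lecture's data, read row by row.
newton = [62, 71, 80, 81, 62, 72, 80, 83, 69, 73,
          80, 83, 70, 73, 81, 83, 70, 74, 81, 85]
einstein = [74, 77, 87, 91, 75, 77, 87, 91, 76, 78,
            87, 92, 76, 79, 88, 93, 76, 79, 89, 95]


def describe(scores):
    """Return summary measures for a list of scores (sample formulas, n - 1)."""
    n = len(scores)
    sd = statistics.stdev(scores)          # sample standard deviation
    return {
        "Mean": statistics.mean(scores),
        "Standard Error": sd / math.sqrt(n),
        "Median": statistics.median(scores),
        "Mode": sorted(statistics.multimode(scores)),  # all tied modes
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(scores),
        "Range": max(scores) - min(scores),
        "Minimum": min(scores),
        "Maximum": max(scores),
        "Sum": sum(scores),
        "Count": n,
    }


for name, scores in (("10 - Newton", newton), ("10 - Einstein", einstein)):
    print(name)
    for label, value in describe(scores).items():
        shown = value if isinstance(value, list) else round(value, 2)
        print(f"  {label}: {shown}")
```

`statistics.multimode` is used instead of `statistics.mode` because both sections have more than one mode.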
Excel gives the following descriptive
statistics into the spreadsheet:
• Mean. A measure of the ‘average’ score in a
set of data. The mean is found by adding up all
the scores and dividing by the number of
scores.
• Standard error of the mean. The standard deviation of the sampling distribution of the mean. It indicates by how much sample means can be expected to differ if other samples are drawn from the same population.
• Median. If we order a set of data from lowest to
highest (or vice-versa) the median is the point
that divides the scores into two, with half the
scores below and half above the median.

• Mode. The score which has occurred the highest number of times in a set of data.

• Standard Deviation. A measure of the standard (‘average’) difference (deviation) of a score from the mean in a set of scores. It is the square root of the variance.
• Variance. A measure of how much a set of
scores vary from their mean value. It is the
square of the standard deviation.

• Kurtosis. Describes the peakedness or flatness of the data set in the distribution as compared to the normal distribution. A negative value characterizes a relatively flat distribution, while a positive value characterizes a relatively peaked distribution.
• Skewness. Describes the asymmetry
(irregularity) of a distribution. Negative
skewness indicates that the longer tail extends
in the direction of low values in the
distribution. Positive skewness indicates that
the longer tail extends in the direction of high
values in the distribution.
• Range. The lowest value from the distribution
subtracted from the highest value.
• Minimum. The lowest value in the
distribution.
• Maximum. The highest value in the
distribution.
• Sum. The sum of the values in the distribution.
• Count. The number of observations in the
distribution.
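As a quick check on the standard error entries, the values in the Excel output can be recomputed from the reported standard deviation and count. This assumes the usual formula for the standard error of the mean, which the lecture does not state explicitly:

```latex
SE_{\bar{x}} = \frac{s}{\sqrt{n}}, \qquad
\text{10 – Newton: } \frac{6.98}{\sqrt{20}} \approx 1.56, \qquad
\text{10 – Einstein: } \frac{7.18}{\sqrt{20}} \approx 1.61
```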
Frequency Count and Percentage
• Frequency count is done by counting or tallying the actual number of observations that fall within a given class interval. The sum of all frequencies is equal to the sample size.
• Percentage is obtained by dividing a particular frequency by the sample size and then multiplying the quotient by 100.
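A minimal sketch of these two definitions, using the 10 – Newton scores from earlier in the lecture. It treats each distinct score as its own class (ungrouped data), which is a simplifying assumption; the function name `frequency_table` is illustrative.

```python
from collections import Counter

# 10 - Newton scores from the lecture.
newton = [62, 71, 80, 81, 62, 72, 80, 83, 69, 73,
          80, 83, 70, 73, 81, 83, 70, 74, 81, 85]


def frequency_table(scores):
    """Return (score, frequency, percentage) rows sorted by score."""
    counts = Counter(scores)   # tally how many times each score occurs
    n = len(scores)            # sample size
    return [(score, f, f * 100 / n) for score, f in sorted(counts.items())]


print(f"{'Score':>5} {'Frequency':>9} {'Percent':>8}")
for score, f, pct in frequency_table(newton):
    print(f"{score:>5} {f:>9} {pct:>8.1f}")
```

The frequencies add up to the sample size (20) and the percentages add up to 100.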
Measures of Central Tendency

• Single values that attempt to describe a data set by identifying the central position within the set of data.
• Also called measures of central location.
• Mean, median, and mode are all valid
measures of central tendency.
Mean (or Average)
• Most popular and commonly used measure of
central tendency.
• Can be used with both discrete and continuous
data (quantitative only).
• It is the most reliable since it takes into account
every item in the set of data.
• It is greatly affected by an outlier or extreme
values.
• It is used only if the data are interval or ratio and
when normally distributed.
• It lends itself to a higher statistical treatment.
Median
• The score or class in a distribution below which 50% of the scores fall and above which the other 50% lie.
• Not affected by outliers or extreme scores.
• Used when the data are ordinal.
• It exists in both quantitative and qualitative
data.
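The contrast between the mean and the median under an extreme score can be seen in a small sketch. The five scores below are hypothetical and are not taken from the lecture's data.

```python
import statistics

scores = [70, 72, 75, 78, 80]          # hypothetical scores
with_outlier = [70, 72, 75, 78, 180]   # same set, highest score replaced by an extreme value

# The mean shifts when one score becomes extreme; the median stays put.
print("mean:  ", statistics.mean(scores), "->", statistics.mean(with_outlier))
print("median:", statistics.median(scores), "->", statistics.median(with_outlier))
```

Here the mean moves from 75 to 95 while the median remains 75.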
Mode
• It is used when we want to find out the value
which occurs most often.
• It is a quick approximation of the average.
• It is an inspection average.
• The most unreliable, since its value is undefined in some data sets (for example, when no score repeats).
• It exists both in quantitative and qualitative
data.
Example:
• Consider the following mathematics
proficiency test scores of 50 eighth grade
students in ABC National High School.
Male: 73, 65, 87, 90, 70, 77, 81, 69, 86, 89, 74, 93,
80, 83, 95, 72, 85, 75, 76, 84, 78, 83, 79, 81, 82.

Female: 78, 82, 65, 92, 71, 85, 66, 80, 72, 79, 81, 67,
80, 82, 85, 78, 66, 76, 77, 69, 73, 82, 81, 77, 79.
Questions:
1. What are the mean, median and mode of the scores of a) males? b) females? and c) the whole class?
2. Which do you think is the most appropriate measure of central tendency in describing the mathematics proficiency test scores of eighth grade students? Justify your answer.
3. Based on the mean score, which group
performed better in the mathematics
proficiency test? Give some insights.
Answers:
1.
          Mean     Median   Mode
Male      80.28    81       81, 83
Female    76.92    78       82
Class     78.60    79       81, 82

2. The mean is the most appropriate measure of central tendency to use since the data are continuous and interval in nature.
3. Based on the mean scores of the two groups, male students performed better in the mathematics proficiency test. A difference of about 3.4 points (80.28 vs. 76.92) in favor of males signifies that, on average, male students are more proficient in mathematics than their female counterparts.*
*insights may vary
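The answers to Question 1 can be reproduced with a short sketch using Python's standard `statistics` module. The helper name `central_tendency` is illustrative. Note that `statistics.multimode` lists every tied mode, so for the whole class it reports both 81 and 82, each of which occurs four times.

```python
import statistics

# Scores from the example (25 male and 25 female students).
male = [73, 65, 87, 90, 70, 77, 81, 69, 86, 89, 74, 93, 80,
        83, 95, 72, 85, 75, 76, 84, 78, 83, 79, 81, 82]
female = [78, 82, 65, 92, 71, 85, 66, 80, 72, 79, 81, 67, 80,
          82, 85, 78, 66, 76, 77, 69, 73, 82, 81, 77, 79]
groups = {"Male": male, "Female": female, "Class": male + female}


def central_tendency(scores):
    """Return (mean rounded to 2 places, median, sorted list of modes)."""
    return (round(statistics.mean(scores), 2),
            statistics.median(scores),
            sorted(statistics.multimode(scores)))


for name, scores in groups.items():
    mean, median, modes = central_tendency(scores)
    print(f"{name:<7} mean={mean:.2f} median={median} mode={modes}")
```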
Measures of Variability
• Also known as measures of spread or
measures of dispersion.
• Describe how spread out or scattered a set of
data are.
• Include, but are not limited to, range, variance and standard deviation.
Range
• The simplest measure of variability to
compute and understand since it is just the
difference between the highest and the lowest
scores in the distribution.
• However, it is unstable and unreliable because
it is based only on the most extreme scores in
the distribution and does not fully reflect the
pattern of variation within a distribution.
Variance
• Is a measure based on the squared deviations of individual scores from the mean. It can be calculated using the formula:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N} \quad \text{(Population)}$$
or
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{(Sample)}$$
Standard Deviation
• It is the positive square root of the variance. It indicates the typical distance of the scores from the mean. It can be calculated using the formula:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} \quad \text{(Population)}$$
or
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \quad \text{(Sample)}$$
• A small measure of variability would indicate
that the data are:
1. clustered closely around the mean;
2. more homogeneous;
3. less variable;
4. more consistent; and
5. more uniformly distributed.
• A large measure of variability would indicate
that the data are:
1. far away from the mean;
2. heterogeneous;
3. more variable;
4. less consistent; and
5. less uniformly distributed.
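The population and sample formulas differ only in the divisor (N versus n - 1). A minimal sketch that follows the formulas step by step, rather than calling a library routine, is given below. The eight scores are hypothetical, chosen so that the arithmetic is easy to follow; the function names are illustrative.

```python
import math


def variance(scores, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = sum((x - mean) ** 2 for x in scores)
    return squared_deviations / (n - 1 if sample else n)


def standard_deviation(scores, sample=True):
    """Positive square root of the variance."""
    return math.sqrt(variance(scores, sample))


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores; mean = 5, squared deviations sum to 32

print("population variance:", variance(scores, sample=False))
print("population SD:      ", standard_deviation(scores, sample=False))
print("sample variance:    ", round(variance(scores), 2))
print("sample SD:          ", round(standard_deviation(scores), 2))
```

With these scores the population variance is 32 / 8 = 4 and the sample variance is 32 / 7, so the sample values are always slightly larger for the same data.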
Example:
• Using the data of 50 eighth grade students in
mathematics proficiency test:
1. Calculate the range, variance and standard
deviation of males, females, and the whole
class’ scores.
2. Compare the variability of scores of male and
female students.
Answer:
1.
          Range    Variance   SD
Male      30       57.21      7.56
Female    27       46.49      6.82
Class     30       53.67      7.33
2. Females’ scores are more closely clustered around the mean than males’ scores. It can be inferred that their scores are more homogeneous, less variable, more consistent, and more uniformly distributed than those of their male counterparts.
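The values in the answer table can be checked with Python's standard `statistics` module. The sketch assumes the sample formulas (divisor n - 1), which is what reproduces the tabled variances; the helper name `variability` is illustrative.

```python
import statistics

# Scores from the example (25 male and 25 female students).
male = [73, 65, 87, 90, 70, 77, 81, 69, 86, 89, 74, 93, 80,
        83, 95, 72, 85, 75, 76, 84, 78, 83, 79, 81, 82]
female = [78, 82, 65, 92, 71, 85, 66, 80, 72, 79, 81, 67, 80,
          82, 85, 78, 66, 76, 77, 69, 73, 82, 81, 77, 79]


def variability(scores):
    """Return (range, sample variance, sample SD), the last two rounded to 2 places."""
    return (max(scores) - min(scores),
            round(statistics.variance(scores), 2),
            round(statistics.stdev(scores), 2))


for name, scores in (("Male", male), ("Female", female), ("Class", male + female)):
    rng, var, sd = variability(scores)
    print(f"{name:<7} range={rng} variance={var:.2f} SD={sd:.2f}")
```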
Skewness (Sk)
Normal Distribution

• It is a symmetrical distribution (one-half is exactly the same as the other half).
• When illustrated, it looks like a bell. It uses a normal curve.
• The mean, the median and the mode are equal.
• Sk = 0
Positively Skewed Distribution

• There are more low scores than high scores.
• Example: the test is very difficult, so the class performed poorly in it.
• The mean has the largest value, followed by the median and the mode in descending order.
• Sk > 0
[Figure: positively skewed curve; horizontal axis runs from Low Scores to High Scores]
Negatively Skewed Distribution
• There are more high scores than low scores.
• Example: the test is very easy, so the class performed very well in it.
• The mode has the largest value, followed by the median and the mean in descending order.
• Sk < 0
[Figure: negatively skewed curve; horizontal axis runs from Low Scores to High Scores]
Kurtosis (Ku)
• Tells whether the data are peaked or flat
relative to a normal distribution.
• Mesokurtic is a normal distribution (Ku = 0).
• Leptokurtic is more peaked or taller than the normal distribution (Ku > 0, positive).
• Platykurtic is flatter or shorter than the normal distribution (Ku < 0, negative).
[Figure: three curves labeled Homogeneous (small SD), Normal, and Heterogeneous (large SD)]
Example:
• Describe the following skewness and kurtosis values:

          Sk        Ku
Male      -0.004    -0.407
Female    -0.152    -0.263
Class     0.010     -0.305
Possible Answers:
• Though both skewness values are negative, the females’ value (-0.152) is farther from zero than the males’ (-0.004). This means that the females’ score distribution is more negatively skewed, while the males’ distribution is almost symmetrical.
• Further, the male students’ score distribution is flatter (more platykurtic, Ku = -0.407) than the females’ (Ku = -0.263), which is consistent with the males’ scores being more spread out than the females’.
• Generally, the class has almost normal score
distribution since its skewness value of 0.010 is very
close to zero.
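The lecture does not state which formulas produced these coefficients. The sketch below assumes the adjusted sample formulas that Excel documents for its SKEW and KURT functions (KURT reports excess kurtosis, so a normal distribution gives a value near 0). The function names are illustrative.

```python
import math


def _mean_and_sd(scores):
    """Return the mean and the sample standard deviation (divisor n - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return mean, sd


def sample_skewness(scores):
    """Adjusted sample skewness (assumed to match Excel's SKEW)."""
    n = len(scores)
    mean, sd = _mean_and_sd(scores)
    cubes = sum(((x - mean) / sd) ** 3 for x in scores)
    return n / ((n - 1) * (n - 2)) * cubes


def sample_excess_kurtosis(scores):
    """Adjusted sample excess kurtosis (assumed to match Excel's KURT)."""
    n = len(scores)
    mean, sd = _mean_and_sd(scores)
    fourths = sum(((x - mean) / sd) ** 4 for x in scores)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourths
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


male = [73, 65, 87, 90, 70, 77, 81, 69, 86, 89, 74, 93, 80,
        83, 95, 72, 85, 75, 76, 84, 78, 83, 79, 81, 82]
female = [78, 82, 65, 92, 71, 85, 66, 80, 72, 79, 81, 67, 80,
          82, 85, 78, 66, 76, 77, 69, 73, 82, 81, 77, 79]

for name, scores in (("Male", male), ("Female", female), ("Class", male + female)):
    print(f"{name:<7} Sk={sample_skewness(scores):.3f} Ku={sample_excess_kurtosis(scores):.3f}")
```

A negative Sk points to a longer tail toward the low scores, and a negative Ku points to a flatter-than-normal distribution, matching the interpretation rules given earlier.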
Application: Likert - Type Question
Table 1. Students’ personal confidence in learning Statistics

Statement                                                  Mean   SD     Skewness   Verbal Description
1. I am sure that I can learn Statistics.                  4.14   0.76   -0.773     Confident
2. I think I can handle difficult lessons in Statistics.   3.69   0.83   -0.065     Confident
3. I can get good grades in Statistics.                    3.86   0.76   -0.678     Confident
Overall                                                    3.90   0.81   -0.466     Confident
Application: Likert - Type Question

Table 1. Students’ personal confidence in learning Statistics

Statement                                                  Mean   SD     Descriptive Interpretation
1. I am sure that I can learn Statistics.                  4.14   0.76   Confident
2. I think I can handle difficult lessons in Statistics.   3.69   0.83   Confident
3. I can get good grades in Statistics.                    3.86   0.76   Confident
Overall                                                    3.90   0.81   Confident
Legend:
4.50 - 5.00 - With High Confidence
3.50 - 4.49 - With Much Confidence
2.50 - 3.49 - With Confidence
1.50 - 2.49 - With Little Confidence
1.00 - 1.49 - With No Confidence
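Applying a legend like this to many mean scores is a simple lookup. The sketch below is one way to do it; the names `LEGEND` and `describe_mean` are hypothetical. It uses only the lower bound of each range, so a value such as 4.495 that falls between two printed ranges is assigned to the lower category. That handling is an assumption, since the legend does not say how such values are treated.

```python
# Lower bound of each range in the legend, from highest to lowest.
LEGEND = [
    (4.50, "With High Confidence"),
    (3.50, "With Much Confidence"),
    (2.50, "With Confidence"),
    (1.50, "With Little Confidence"),
    (1.00, "With No Confidence"),
]


def describe_mean(mean_score):
    """Return the legend label for a mean score on the 1.00 to 5.00 scale."""
    if not 1.00 <= mean_score <= 5.00:
        raise ValueError("mean score must be between 1.00 and 5.00")
    for lower_bound, label in LEGEND:
        if mean_score >= lower_bound:   # first range whose lower bound is reached
            return label


for score in (4.75, 3.90, 2.10):
    print(score, "->", describe_mean(score))
```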
Possible Findings:
Based on the results in Table 1, the students are confident that they can learn Statistics, that they can get good grades in the subject, and that they can handle difficult lessons in Statistics, with mean scores of 4.14, 3.86, and 3.69, respectively. Generally, the students are confident in learning Statistics, with an overall mean score of 3.90.
The similar standard deviations (0.76 to 0.83) indicate that the spread of the responses is almost the same across the three statements. Meanwhile, the negative skewness coefficients denote that most responses fall on the high end of the scale, that is, the students tend to rate their confidence in learning Statistics highly.
Possible Insights/Inferences
If this trend continues, then upcoming students are also likely to be confident that they will learn Statistics. Likewise, they may find the subject easy and interesting.
Use the following scale and descriptive interpretation
for Exercise 3 (Assignment)
Legend: (for Motivation, Perception and Challenges in
Doing Research)
5.50 – 6.00 – Very True of Me
4.50 - 5.49 – True of Me
3.50 - 4.49 – Mostly True of Me
2.50 - 3.49 – Sometimes True of Me
1.50 - 2.49 – Rarely True of Me
1.00 - 1.49 – Not True of Me
Legend: (for Self-Efficacy)
7.50 – 10.00 – High
4.50 – 7.49 – Average
1.00 – 4.49 – Low
