SW388R7

Data Analysis & Computers II

Assumption of normality

Slide 1

Assumption of normality

Transformations

Assumption of normality script

Practice problems
Assumption of Normality

Slide 2

• Many of the statistical methods that we will apply require
  the assumption that a variable or variables are normally
  distributed.

• With multivariate statistics, the assumption is that the
  combination of variables follows a multivariate normal
  distribution.

• Since there is not a direct test for multivariate normality,
  we generally test each variable individually and assume that
  they are multivariate normal if they are individually normal,
  though this is not necessarily the case.
Evaluating normality

Slide 3

• There are both graphical and statistical methods for
  evaluating normality.

• Graphical methods include the histogram and normality plot.

• Statistical methods include diagnostic hypothesis tests for
  normality, and a rule of thumb that says a variable is
  reasonably close to normal if its skewness and kurtosis have
  values between –1.0 and +1.0.

• None of the methods is absolutely definitive.
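The rule of thumb can also be checked outside SPSS. The Python below is a minimal sketch, not the course's method: the function names are hypothetical, the estimators are the common bias-adjusted sample skewness and excess kurtosis (assumed here to correspond to the values SPSS reports), and the limits are treated as inclusive, which the slide does not specify.

```python
import math


def sample_skewness(values):
    """Bias-adjusted sample skewness (assumed to match the SPSS statistic)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    cubed = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


def sample_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    fourth = sum(((x - mean) / sd) ** 4 for x in values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth - correction


def within_rule_of_thumb(values, limit=1.0):
    """True when both statistics fall between -limit and +limit (inclusive)."""
    return (abs(sample_skewness(values)) <= limit
            and abs(sample_kurtosis(values)) <= limit)


# Illustrative data only (not from GSS2000.sav): one extreme value
# is enough to push the skewness well past +1.0.
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 50]
print(within_rule_of_thumb(skewed))
```

At least four cases are needed, since the kurtosis formula divides by (n - 3).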


Transformations

Slide 4

• When a variable is not normally distributed, we can create a
  transformed variable and test it for normality. If the
  transformed variable is normally distributed, we can
  substitute it in our analysis.

• Three common transformations are: the logarithmic
  transformation, the square root transformation, and the
  inverse transformation.

• All of these change the measuring scale on the horizontal
  axis of a histogram to produce a transformed variable that is
  mathematically equivalent to the original variable.
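The three transformations are simple to express in code. The sketch below uses Python rather than SPSS; the function names and the sample values are hypothetical. It assumes every value is strictly positive, since the logarithm and the inverse are undefined at zero and the logarithm and square root are undefined for negative values.

```python
import math


def log_transform(values):
    """Base-10 logarithm of each value; requires values > 0."""
    return [math.log10(x) for x in values]


def sqrt_transform(values):
    """Square root of each value; requires values >= 0."""
    return [math.sqrt(x) for x in values]


def inverse_transform(values):
    """Reciprocal of each value; requires values != 0."""
    return [1.0 / x for x in values]


# Illustrative values only, not taken from GSS2000.sav.
hours = [0.5, 4.0, 25.0, 100.0]

for name, transform in [("log10", log_transform),
                        ("sqrt", sqrt_transform),
                        ("inverse", inverse_transform)]:
    print(name, [round(v, 3) for v in transform(hours)])
```

Note that the logarithm and square root keep cases in the same order, while the inverse reverses the order, so the direction of any relationship found with an inverse-transformed variable is flipped.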
When transformations do not work

Slide 5

• When none of the transformations induces normality in a
  variable, including that variable in the analysis will reduce
  our effectiveness at identifying statistical relationships,
  i.e. we lose power.

• We do have the option of changing the way the information in
  the variable is represented, e.g. substitute several
  dichotomous variables for a single metric variable.
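One way to substitute dichotomous variables for a metric variable is to group the values into ranges and create a 0/1 indicator for each range except a reference group. The Python below is only a sketch of that idea: the function name and the cut points are hypothetical and would need to be chosen on substantive grounds.

```python
def dummy_code(values, cut_points):
    """Turn a metric variable into 0/1 indicators, one per group above the lowest.

    A value falls in group k when it exceeds exactly k of the cut points.
    Group 0 (at or below the lowest cut point) is the reference group and
    is represented by all indicators being 0.
    """
    cuts = sorted(cut_points)
    rows = []
    for x in values:
        group = sum(x > c for c in cuts)
        rows.append([1 if group == k else 0 for k in range(1, len(cuts) + 1)])
    return rows


# Hypothetical cut points at 2 and 10 hours give three groups:
# low (reference), medium, and high.
print(dummy_code([0.5, 5.0, 50.0], cut_points=[2, 10]))
```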
Problem 1

Slide 6

In the dataset GSS2000.sav, is the following statement true,
false, or an incorrect application of a statistic? Use 0.01 as
the level of significance.

Based on a diagnostic hypothesis test of normality, total hours
spent on the Internet is normally distributed.

1. True
2. True with caution
3. False
4. Incorrect application of a statistic
Computing “Explore” descriptive statistics

Slide 7

To compute the statistics needed for evaluating the normality of
a variable, select the Explore… command from the Descriptive
Statistics menu.
Adding the variable to be evaluated

Slide 8

First, click on the variable to be included in the analysis to
highlight it.

Second, click on the right arrow button to move the highlighted
variable to the Dependent List.
Selecting statistics to be computed

Slide 9

To select the statistics for the output, click on the
Statistics… command button.
Including descriptive statistics

Slide 10

First, click on the Descriptives checkbox to select it. Clear
the other checkboxes.

Second, click on the Continue button to complete the request for
statistics.
Selecting charts for the output

Slide 11

To select the diagnostic charts for the output, click on the
Plots… command button.
Including diagnostic plots and statistics

Slide 12

First, click on the None option button on the Boxplots panel,
since boxplots are not as helpful as other charts in assessing
normality.

Second, click on the Normality plots with tests checkbox to
include normality plots and the hypothesis tests for normality.

Third, click on the Histogram checkbox to include a histogram in
the output. You may want to examine the stem-and-leaf plot as
well, though I find it less useful.

Finally, click on the Continue button to complete the request.
Completing the specifications for the analysis

Slide 13

Click on the OK button to complete the specifications for the
analysis and request SPSS to produce the output.
The histogram

Slide 14

An initial impression of the normality of the distribution can
be gained by examining the histogram.

In this example, the histogram shows a substantial violation of
normality caused by an extremely large value in the
distribution.

[Figure: Histogram of TOTAL TIME SPENT ON THE INTERNET.
Horizontal axis: TOTAL TIME SPENT ON THE INTERNET (0.0 to 100.0);
vertical axis: Frequency (0 to 50).
Std. Dev = 15.35, Mean = 10.7, N = 93.]


The normality plot

Slide 15

The problem with the normality of this variable’s distribution
is reinforced by the normality plot.

If the variable were normally distributed, the red dots would
fit the green line very closely. In this case, the red points in
the upper right of the chart indicate the severe skewing caused
by the extremely large data values.

[Figure: Normal Q-Q Plot of TOTAL TIME SPENT ON THE INTERNET.
Horizontal axis: Observed Value (-40 to 120);
vertical axis: Expected Normal (-3 to 3).]
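To see where the points on a normality plot come from, the sketch below pairs each sorted observation with the standard normal value expected at its rank. It is written in Python with a hypothetical function name, and it assumes Blom's plotting positions, one common convention; the slides do not say which convention SPSS used here.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Return (observed value, expected normal value) pairs for a Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for rank, observed in enumerate(ordered, start=1):
        # Blom's plotting position (an assumption, see the note above).
        proportion = (rank - 0.375) / (n + 0.25)
        points.append((observed, standard_normal.inv_cdf(proportion)))
    return points


# Illustrative values only: the extreme case (102.0) sits far to the right
# of where a straight line through the other points would place it.
for observed, expected in normal_qq_points([0.2, 3.0, 5.5, 10.0, 102.0]):
    print(observed, round(expected, 2))
```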
The test of normality

Slide 16

Tests of Normality

                                   Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                   Statistic   df   Sig.     Statistic   df   Sig.
TOTAL TIME SPENT ON THE INTERNET     .246      93   .000       .606      93   .000

a. Lilliefors Significance Correction

Problem 1 asks about the results of the test of normality. Since the sample
size is larger than 50, we use the Kolmogorov-Smirnov test. If the sample
size were 50 or less, we would use the Shapiro-Wilk statistic instead.

The null hypothesis for the test of normality states that the actual
distribution of the variable is equal to the expected distribution, i.e., the
variable is normally distributed. Since the probability associated with the
test of normality (< 0.001) is less than or equal to the level of significance
(0.01), we reject the null hypothesis and conclude that total hours spent on
the Internet is not normally distributed. (Note: we report the probability as
< 0.001 instead of .000 to be clear that the probability is not really zero.)

The answer to problem 1 is false.
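The logic of the test can be sketched in Python. The statistic below is the largest gap between the sample's cumulative distribution and a normal curve fitted with the sample mean and standard deviation, which is the general idea behind the Kolmogorov-Smirnov statistic with the Lilliefors correction. The function names are hypothetical, and the sketch does not compute the significance value; that is assumed to be read from the SPSS output.

```python
from statistics import NormalDist, mean, stdev


def ks_normality_statistic(values):
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    largest = 0.0
    for i, x in enumerate(ordered):
        cdf = fitted.cdf(x)
        # Compare the fitted curve with the empirical step just above
        # and just below this observation.
        largest = max(largest, (i + 1) / n - cdf, cdf - i / n)
    return largest


def reject_normality(significance, alpha=0.01):
    """Reject the null hypothesis of normality when Sig. <= alpha."""
    return significance <= alpha


# Problem 1: Sig. is reported as .000, i.e. below 0.001, so 0.001 is used
# here as an upper bound on the probability.
print(reject_normality(0.001, alpha=0.01))
```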


The assumption of normality script

Slide 17

An SPSS script to produce all of the output that we have
produced manually is available on the course web site.

After downloading the script, run it to test the assumption of
normality.

Select Run Script… from the Utilities menu.
Selecting the assumption of normality script

Slide 18

First, navigate to the folder containing your scripts and
highlight the NormalityAssumptionAndTransformations.SBS script.

Second, click on the Run button to activate the script.
Specifications for normality script

Slide 19

First, move variables from the list of variables in the data set
to the Variables to Test list box.

The default output is to do all of the transformations of the
variable. To exclude some transformations from the calculations,
clear the checkboxes.

Third, click on the OK button to run the script.
The test of normality

Slide 20

Tests of Normality

                                   Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                   Statistic   df   Sig.     Statistic   df   Sig.
TOTAL TIME SPENT ON THE INTERNET     .246      93   .000       .606      93   .000

a. Lilliefors Significance Correction

The script produces the same output that we computed manually;
in this example, the tests of normality.
Problem 2

Slide 21

In the dataset GSS2000.sav, is the following statement true,
false, or an incorrect application of a statistic?

Based on the rule of thumb for the allowable magnitude of
skewness and kurtosis, total hours spent on the Internet is
normally distributed.

1. True
2. True with caution
3. False
4. Incorrect application of a statistic
Table of descriptive statistics

Slide 22

To answer problem 2, we look at the values for skewness and
kurtosis in the Descriptives table.

Descriptives: TOTAL TIME SPENT ON THE INTERNET

                                                  Statistic   Std. Error
Mean                                                 10.731       1.5918
95% Confidence Interval for Mean   Lower Bound        7.570
                                   Upper Bound       13.893
5% Trimmed Mean                                       8.295
Median                                                5.500
Variance                                            235.655
Std. Deviation                                      15.3511
Minimum                                                  .2
Maximum                                               102.0
Range                                                 101.8
Interquartile Range                                  10.200
Skewness                                              3.532         .250
Kurtosis                                             15.614         .495

The skewness (3.532) and kurtosis (15.614) for the variable both
exceed the rule of thumb criterion of 1.0. The variable is not
normally distributed.

The answer to problem 2 is false.
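The check for problem 2 is plain arithmetic on the values in the Descriptives table. The Python sketch below applies the rule of thumb to the reported skewness and kurtosis. It also recomputes the two standard errors from the sample size using the usual textbook formulas, which appear to agree with the table (.250 and .495); that these are the formulas SPSS uses is an assumption, and the function names are hypothetical.

```python
import math


def skewness_standard_error(n):
    """Usual standard error of sample skewness for a sample of size n."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def kurtosis_standard_error(n):
    """Usual standard error of sample kurtosis for a sample of size n."""
    ses = skewness_standard_error(n)
    return 2.0 * ses * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


# Values copied from the Descriptives table on this slide.
n = 93
skewness = 3.532
kurtosis = 15.614

print(round(skewness_standard_error(n), 3))  # 0.25
print(round(kurtosis_standard_error(n), 3))  # 0.495

# Rule of thumb: both statistics must lie between -1.0 and +1.0.
print(abs(skewness) <= 1.0 and abs(kurtosis) <= 1.0)  # False
```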


Problem 3

Slide 23

In the dataset GSS2000.sav, is the following statement true,
false, or an incorrect application of a statistic? Use 0.01 as
the level of significance.

Based on a diagnostic hypothesis test of normality, "total hours
spent on the Internet" is not normally distributed. A
logarithmic transformation of "total hours spent on the
Internet" results in a variable that is normally distributed.

1. True
2. True with caution
3. False
4. Incorrect application of a statistic
The test of normality

Slide 24

Tests of Normality

                                        Kolmogorov-Smirnov(a)        Shapiro-Wilk
                                        Statistic   df   Sig.     Statistic   df   Sig.
Logarithm of NETIME [LG10(NETIME)]        .047      93   .200*      .994      93   .951
Square Root of NETIME [SQRT(NETIME)]      .118      93   .003       .868      93   .000
Inverse of NETIME [1/(NETIME)]            .288      93   .000       .495      93   .000

*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Problem 3 specifically asks about the results of the test of
normality for the logarithmic transformation. Since our sample
size is larger than 50, we use the Kolmogorov-Smirnov test.

The null hypothesis for the Kolmogorov-Smirnov test of
normality states that the actual distribution of the transformed
variable is equal to the expected distribution, i.e., the
transformed variable is normally distributed. Since the
probability associated with the test of normality (0.200) is
greater than the level of significance (0.01), we fail to reject
the null hypothesis and conclude that the logarithmic
transformation of total hours spent on the Internet is normally
distributed.

The answer to problem 3 is true.
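The same comparison can be applied to all three transformations in the table. In the Python sketch below, the Kolmogorov-Smirnov significance values are copied from the output above; the dictionary and variable names are hypothetical, and the .000 entry is entered as 0.001, an upper bound on a probability reported as less than 0.001.

```python
ALPHA = 0.01  # level of significance stated in problem 3

# Kolmogorov-Smirnov Sig. values from the Tests of Normality table.
ks_significance = {
    "LG10(NETIME)": 0.200,   # flagged as a lower bound of the true significance
    "SQRT(NETIME)": 0.003,
    "1/(NETIME)": 0.001,     # reported as .000; 0.001 is an upper bound
}

# We fail to reject normality only when Sig. is greater than alpha.
normal_transformations = [
    name for name, sig in ks_significance.items() if sig > ALPHA
]
print(normal_transformations)
```

Only the logarithmic transformation passes at the 0.01 level, which is consistent with the answer to problem 3.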


Other problems on assumption of normality

Slide 25

• A problem may ask about the assumption of normality for a
  nominal level variable. The answer will be “Incorrect
  application of a statistic” since there is no expectation
  that a nominal variable be normal.

• A problem may ask about the assumption of normality for an
  ordinal level variable. If the variable or transformed
  variable is normal, the correct answer to the question is
  “True with caution” since we may be required to defend
  treating an ordinal variable as metric.

• Questions will specify a level of significance to use and the
  statistical evidence upon which you should base your answer.
Steps in answering questions about the assumption of normality –
question 1

Slide 26

The following is a guide to the decision process for answering
problems about the normality of a variable:

Is the variable to be evaluated metric?
  No:  Incorrect application of a statistic
  Yes: Does the statistical evidence support the normality
       assumption?
         No:  False
         Yes: Are any of the metric variables ordinal level?
                No:  True
                Yes: True with caution
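The decision process above can be written as a short function. This is a Python sketch with hypothetical function and parameter names; following slide 25, "metric" is taken to mean any variable that is not nominal, so an ordinal variable passes the first check and is caught by the last one.

```python
def answer_normality_problem(is_metric, evidence_supports_normality, is_ordinal):
    """Walk the three questions in the order the flowchart asks them."""
    if not is_metric:
        return "Incorrect application of a statistic"
    if not evidence_supports_normality:
        return "False"
    if is_ordinal:
        return "True with caution"
    return "True"


# Problem 1: the variable is metric, but the test of normality
# rejected the null hypothesis.
print(answer_normality_problem(is_metric=True,
                               evidence_supports_normality=False,
                               is_ordinal=False))
```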


Steps in answering questions about the assumption of normality –
question 2

Slide 27

The following is a guide to the decision process for answering
problems about the normality of a transformation:

Is the variable to be evaluated metric?
  No:  Incorrect application of a statistic
  Yes: Statistical evidence supports normality?
         No: Statistical evidence for transformation supports
             normality?
               No:  False
               Yes: Either variable ordinal level?
                      No:  True
                      Yes: True with caution
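The decision process for a transformation problem can be sketched the same way in Python. The function and parameter names are hypothetical. One branch is an assumption rather than something legible in the flowchart: when the evidence already supports normality for the original variable, the sketch returns "False", because a statement of the kind in problem 3 claims the original variable is not normal.

```python
def answer_transformation_problem(is_metric, original_is_normal,
                                  transformed_is_normal, either_is_ordinal):
    """Decision process for statements about a transformed variable."""
    if not is_metric:
        return "Incorrect application of a statistic"
    if original_is_normal:
        # ASSUMPTION: this branch is not legible in the flowchart.
        return "False"
    if not transformed_is_normal:
        return "False"
    if either_is_ordinal:
        return "True with caution"
    return "True"


# Problem 3: the original variable failed the test of normality and the
# logarithmic transformation passed it.
print(answer_transformation_problem(is_metric=True,
                                    original_is_normal=False,
                                    transformed_is_normal=True,
                                    either_is_ordinal=False))
```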
