SPSS Tutorial
A Chillibreeze Publication
Contents

SECTION I: Back to Basics
A Review of the Statistics that you learnt (or did not learn?) in College

I. Research Design
   a. ANIMAL RESEARCH
      a) Selection of species
      b) Selection of controls
      c) Feeding of controls
      d) Treatment of controls
      e) Ethical guidelines
   b. HUMAN RESEARCH
      a) Case histories
      b) Descriptive studies
      c) Prospective studies (cohort studies)
      d) Retrospective studies (case-control studies)
      e) Retrospective-prospective study: a combination
      f) Open trials
      g) Cross-over trials
      h) Blind trials
      i) Double-blind cross-over interventional trial
      j) Metabolic studies
      k) In vitro studies
II. Evaluation of Measuring Instruments
   a) Measuring Sensitivity and Specificity of a test
   b) Sampling and Sample Size
      a) Sampling procedures
      b) Sample Size
III. Some useful terms to know
IV. Types of Data and Appropriate Statistical Tests
   a) The t-test
   b) Analysis of Variance (AoV)
   c) Correlation
   d) Regression
   e) Chi-Square test
   f) McNemar test
   g) Sign test
   h) Mann-Whitney U test
   i) Statistical Abuses
   j) Quiz to test yourself

SECTION II: SPSS at Last
Creating and Editing a Data File
I. Typical SPSS Session
II. Creating a New Data File with the Data Editor
III. Loading an existing Data File into the Data Editor
IV. Creating and Executing SPSS Commands
   k) The EXPLORE command
   l) The FREQUENCIES command
   m) The DESCRIPTIVES command
   n) The IF and COMPUTE commands
   o) The MEANS command
   p) The T-TEST command
   q) The ONE-WAY ANALYSIS OF VARIANCE command
   r) Scattergrams and Regression
   s) Multiple Regression
   t) CHI-SQUARE test using CROSSTABS
   u) Selection of a Subset of Cases for Analysis
   v) The Nonparametric Tests
   w) Mann-Whitney U test
   x) Bivariate Correlations
   y) Survival
SPSS Syntax Windows
SECTION I
Back to Basics
Research Design
Evaluation of Measuring Instruments
Sampling and Sample Size
Mean, Variance, Standard Deviation, Degrees of Freedom
t-test
Analysis of Variance
Correlation and Regression
Chi-square test, Sign test, Mann-Whitney U test, etc.
Statistical Abuses
I. Research Design
To get good data, you need good research design skills. Even if you are reading
someone else's research, it helps to know something about correct and incorrect
research design in order to understand it better. There are many ways to design a
research project, and not all of them are documented. Here, we will describe some of
the more common methods and give you a few tips for your own research design.
a. ANIMAL RESEARCH
Most biomedical research is first conducted using animal models. Much of this
initial animal research cannot be performed in humans due to ethical, cost and/or time
considerations.
a) Selection of species
The particular species to be used as a model is chosen for one or more of the following
reasons:
- Small size for ease and economy in housing, feeding, manipulation, etc.
- Relatively short life span to allow for life-time studies, studies over more than
one generation, etc.
b) Selection of controls
This is one of the most important considerations in the design of an animal experiment.
Let us say the investigator is performing a study to determine the deficiency of a
particular nutrient on some measurable parameter. Animals in the experimental group
would be fed a diet containing required amounts of all known nutrients except the
nutrient under study (or the diet would be low in the nutrient). The control group would
receive a normal (same in all respects but with the nutrient present) diet.
The controls need to be similar to the experimental animals in all respects, such as
weight, age, genetic strain, etc. In fact, a single group of animals should be randomly
divided into experimental animals and controls. To avoid bias, someone other than the
principal investigator can perform the actual separation.
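The random division described above can be sketched in a few lines of Python. This is a hypothetical helper for illustration, not part of any SPSS workflow:

```python
import random

def assign_groups(animal_ids, seed=None):
    """Randomly split a pool of animals into experimental and control halves."""
    rng = random.Random(seed)
    ids = list(animal_ids)
    rng.shuffle(ids)                 # random order removes selection bias
    half = len(ids) // 2
    return ids[:half], ids[half:]    # (experimental, control)

experimental, control = assign_groups(range(20), seed=1)
print(len(experimental), len(control))
```

Using a fixed seed makes the assignment reproducible for record-keeping; in practice the person running this should not be the principal investigator.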
c) Feeding of controls
Some animals will consume less of an incomplete diet than they will of a nutritionally
complete diet. If the experiment is run as designed above, the investigator may get false
results that show that deficiency of the nutrient influences the variable being measured.
In reality, the variable may have changed because of the low quantity of food consumed
by the experimental animals. To avoid this, it may be best to have a second control
group that is pair-fed to the experimental group. The amount of food that is consumed
by the experimental group is measured and then that amount of the control diet is fed
to the pair-fed controls. In another approach, the mean intake of food eaten by the
experimental animals is calculated each day and all of the control animals are fed this
amount the following day.
d) Treatment of controls
Do unto your controls as you would do unto your experimental animals. Basically,
try and eliminate other factors which might be different between experimental and
control animals. If some procedure like an injection or a surgery is performed on the
experimental animals, the controls too should receive a sham procedure. Other variables
to consider in animal experiments are: water consumption, non-dietary sources of
nutrients (e.g. zinc from nibbling on cage bars), time of day procedures are done,
amount of space allocated to the animals, amount of exercise, location within the cage,
number of animals per cage, nature of animals in neighboring cages, etc. Also, there
should be concern for diseases that the animals might transmit to each other or to the
investigator and for diseases that the investigator might transmit to the animals.
e) Ethical guidelines
Ethical guidelines should be followed meticulously to ensure that research is done
in a humane fashion. Animals and their cages should be kept clean, temperature
should be correct, undue and unnecessary distress should be avoided, etc. Usually,
there are committees on campuses that oversee the treatment of animals. Also, your
university is bound to have courses that you need to take before you take on an animal
experiment.
b. HUMAN RESEARCH
In human research, there are a lot of other guidelines and considerations. One can
divide the types of research design into
1. OBSERVATIONAL STUDIES where the investigator does not alter the natural
occurrence of events but records them and formulates hypotheses and/or conclusions
about what he/she observes. Observational studies are of several types including:
Case histories
Descriptive studies
Prospective studies (Cohort studies)
Retrospective studies (Case-control studies)
2. INTERVENTIONAL STUDIES As opposed to the passive role of the investigator
in observational studies, the researcher takes an active part in these studies. In
interventional studies, the subjects are exposed to (or denied exposure to) a factor or
method of treatment and followed over time to determine the outcome. Individuals may
serve as their own controls, or separate groups of control individuals may be used. These
kinds of studies use several research designs:
Open trials
Cross-over trials
Blind trials
Double-blind cross-over trials
Metabolic studies
In vitro studies
Let us have a closer look at these different kinds of research designs and the
information each can provide. This is important for statistical analysis because,
in order to interpret your data, you need to know the usefulness and limitations of each
kind of study and how to classify any particular study. Let's say you have results of a
retrospective study and you ask the statistical program to compute absolute risk; this
would be meaningless because retrospective studies can only measure relative risk.
a) Case histories
These studies are often referred to as anecdotal evidence. They are widely used as
testimonials in advertising. In science, they may not be of much value as data, but they
do provide an insight into areas of possible further research. But case histories cannot
give definitive evidence that a certain factor is causal for a certain disease or that a certain
treatment is effective. Many journals separate these articles into a different section of
each issue. These studies serve as a method of rapid communication of clinical findings
and hypotheses to the scientific community and help generate new leads for future
research.
Example:
In a recent report, a physician described a case history in which a patient with the
common cold experienced complete remission of symptoms following one week of
supplementation with Vitamin C. Let us examine this closely.

Does this report demonstrate that most common cold patients can be treated
successfully with Vitamin C? No. The study was performed on a single patient and the
results cannot be extrapolated to a population.

Does this report show that this particular patient was cured by Vitamin C? No, or
maybe. The patient may have been cured due to other factors, drugs, his own immunity,
other remedies, etc. There was no control in this case, so there is no way of knowing if
the patient would have been cured without intervention.
b) Descriptive studies
These are often large population studies in which data on lots of different variables
is collected. It is somewhat like a census. Statistical analyses on the data collected may
show various relationships that lead to hypotheses for further study. They may also
provide estimates of the magnitude of a particular problem and the frequency of certain
behaviors among the population. Sometimes they may also generate meaningless
correlations (for example: most alcoholics in a particular area send their children to
private schools) that should be disregarded. Descriptive studies are also referred to as
epidemiological studies or surveys.
c) Prospective studies (cohort studies)
A prospective study consists of two samples, one of which has been exposed to the
suspected risk factor (a + b) and one not exposed (c + d). The two samples are followed
through time to determine which group has the higher incidence (or cause-specific
mortality). In other words, a prospective study compares the absolute risk (of illness or
death) of those exposed with the absolute risk of those not exposed. An example of a
table of data from this type of study is shown below.

                 Disease    No Disease      Total
Exposed             a            b          a + b
Not Exposed         c            d          c + d

                 Disease    No Disease      Total
Smoker*             227       99,773       100,000
Nonsmoker             7       99,993       100,000
d) Retrospective studies (case-control studies)
In a retrospective study, the group with the disease is the group of cases and the
group without the disease is the group of controls. The typical way of expressing data
from such a study would be as below.

                 Disease (cases)    No Disease (controls)
Exposed                a                      b
Not Exposed            c                      d
Total                a + c                  b + d
Cases and controls are compared with respect to the proportions that have been
exposed to the risk factor: the proportion of cases exposed, a/(a + c), is compared to
the proportion of controls exposed, b/(b + d), i.e. the rate of exposure in the diseased
versus the rate of exposure in the non-diseased.

If, in the population at large, the number of persons who have the disease is quite
small compared to the non-diseased population, Relative Risk may be estimated by
using the odds ratio (also called the cross-product ratio or relative odds):

odds ratio = ad / bc
Note: Neither absolute risk (incidence) nor attributable risk can be inferred from a
retrospective study without reference to outside information.

Then, you may wonder, why are these studies done at all? One reason is that these
studies are less unethical, because we are not denying or causing exposure to any
suspected curative or causative agent.
An example of data from such a retrospective study:

             Cases    Controls    Totals
Smoker        464        167        631
Nonsmoker      36        333        369
Totals        500        500       1000
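The odds ratio for a 2 x 2 case-control table like the one above reduces to the cross-product ratio. Here is a plain Python sketch (not SPSS output):

```python
def odds_ratio(a, b, c, d):
    # a = exposed cases, b = exposed controls,
    # c = unexposed cases, d = unexposed controls
    return (a * d) / (b * c)

# smoking table: 464 cases and 167 controls exposed, 36 and 333 not exposed
print(round(odds_ratio(464, 167, 36, 333), 2))  # 25.7
```

An odds ratio far above 1 says the exposure is much more common among cases than controls; remember it only estimates Relative Risk when the disease is rare.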
g) Cross-over trials:
In cross-over trials, two groups of subjects are studied. One group receives the
active treatment for an initial period of time while the second receives a placebo. At
the end of the designated period, the first group is switched to placebo and the second
to the active treatment. This design helps eliminate effects caused merely by
participation in an experiment, as well as those due to seasonal variations in the
variables being measured.
h) Blind trials:
Here, some or all of the participants and/or researchers are prevented from knowing
the identity of the group receiving the treatment until the conclusion of the experiment.
In a single-blind trial, the subjects are denied knowledge of whether they are receiving
the active treatment or placebo but the researchers are aware of the identities of the
experimental groups and controls. In a double-blind trial, neither the subjects nor the
researchers know which individuals receive the active treatment or placebo until the
experiment is over and the data have been collected. The latter experimental design helps
to control for the placebo effect as well as for experimental bias.
You may ask: why should the investigator be blinded? The answer remains the same:
to avoid bias. Let's say the investigator gets a good result from the diabetes experiment
(the herb reduces fasting blood sugar); he will not repeat the experiment because the
good result is in the experimental group, which strengthens his hypothesis. If the
same result is obtained in the control group (the placebo reduces fasting blood sugar),
he will definitely repeat the experiment thinking that it is an error. So the treatment
of experimental groups and controls may not be exactly the same. To avoid this, the
investigator's unbiased colleague does the labeling, feeding and keeping of records.
Example: Let us say the investigator wants to perform the same experiment: test if
the herb extract influences fasting blood sugar in diabetic women. A group of diabetic
women is randomly divided into two groups of equal size. All relevant characteristics
are similar in both groups (weight, height, age, blood sugar levels, diet, etc.). A capsule
is developed for the experiment: one with the extract and one with the placebo. The two
capsules are indistinguishable in appearance. One group receives the extract capsule
for 10 weeks and the other receives the placebo capsule for 10 weeks. The distribution
is done by an individual not directly related to the project. After the final blood samples
have been analyzed and decisions made as to who improved and who did not, the code
is revealed to the investigator as to which group received the extract.
A test's results can be compared against the true outcome in a 2 x 2 table:

                        True outcome
Test A results        +ve            -ve            Total
+ve                 530 (a)        165 (b)           695
-ve                   5 (c)        300 (d)           305
Total             535 (a + c)    465 (b + d)        1000

a = true positives, b = false positives, c = false negatives, d = true negatives

Sensitivity = a / (a + c) x 100 = 530/535 x 100 = 99.07 %
Specificity = d / (b + d) x 100 = 300/465 x 100 = 64.52 %
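The sensitivity and specificity calculations above are a one-liner each; this is a minimal sketch of those formulas in Python:

```python
def sensitivity_specificity(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn) * 100   # a / (a + c)
    specificity = tn / (fp + tn) * 100   # d / (b + d)
    return sensitivity, specificity

# Test A table: a = 530 (TP), b = 165 (FP), c = 5 (FN), d = 300 (TN)
se, sp = sensitivity_specificity(530, 165, 5, 300)
print(round(se, 2), round(sp, 2))  # 99.07 64.52
```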
Simple Random Sample: This is something like pulling names out of a hat. It may
involve the use of a computer and a random number generator. A complete randomized
list is necessary.
b) Sample Size
The sample size in an experiment is an important feature of its design and the
interpretation of its results. If the sample size is too small, it may be difficult or impossible
to find statistically significant results. But that does not automatically imply that the
larger the sample size, the better our experiment. If the sample size is too large, trivial
differences will appear highly statistically significant and we will be easily impressed
with the findings. For example, if we want to determine the differences in mathematical
skills between boys and girls, and we use 100,000 boys and 100,000 girls for this study,
we might end up with a statistically significant result even if the actual difference is
unimportant. Hence, the number of subjects used for a study needs to be chosen very
carefully. There are several methods and formulae for selecting an appropriate sample
size, and these must be used before we begin an experiment.
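One common textbook formula, for comparing two group means, can be sketched as below. The z-multipliers are assumptions (two-sided alpha = 0.05 and 80 % power); a real study should run a proper power analysis:

```python
import math

def n_per_group(sigma, delta, z_alpha=1.96, z_beta=0.8416):
    """Approximate sample size per group for comparing two means.

    sigma: assumed common standard deviation
    delta: smallest difference worth detecting
    Defaults assume two-sided alpha = 0.05 and 80% power.
    """
    n = 2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2
    return math.ceil(n)

# e.g. standard deviation of 10 units, want to detect a 5-unit difference
print(n_per_group(10, 5))  # 63
```

Note how quickly n grows as the detectable difference delta shrinks: halving delta quadruples the required sample size.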
Standard Deviation: a measure of variability. It is equal to the square root of the
variance.
Standard Error: a special measure of variability inappropriately used by many researchers
- t-test
- Analysis of variance
- Correlation
- Regression
- Chi-Square test
- McNemar test
- Sign test
- Mann-Whitney U test
a) The t-test
Many experiments seek to compare data collected from 2 groups of subjects or
animals (an experimental group and a control group). If the data collected are for
a continuous variable, the best test to use would be the t-test, or Student's t-test
("Student" was a pseudonym used by W. S. Gosset in his statistical writings). The
t distribution was developed to overcome a major problem with using the normal
distribution for hypothesis testing. The normal distribution, and tests of hypothesis
related to it, require that we have data from every member of the population to calculate
the mean (μ) and the variance (σ²). We rarely have this information. Usually we must
rely on data from samples of individuals drawn from a population. The t distribution is
used in describing samples and testing hypotheses related to them. (Actually there are
an infinite number of t distributions, each one determined by the degrees of freedom
of s². As the degrees of freedom (n − 1) approach infinity, the t distribution becomes
the same as the normal distribution.)
Use of a t-test requires that the following assumptions are true:
1. The sample is randomly selected.
2. The sample is drawn from an underlying normal population.
The hypothesis being tested by a t-test is that the means of the two samples are equal:
H0: x̄1 − x̄2 = 0
The alternate hypothesis is that the means are not equal.
Different research situations call for slightly different versions of the formula for
calculation of the t-test. If the variances of the two samples are equal, a pooled-variance
t-test is conducted. For example, if we are studying blood sugar of males and females,
we expect the variance to be similar for the two groups.

The formula for the pooled-variance t-test is:

t = (x̄1 − x̄2) / [sp √(1/n1 + 1/n2)]

where sp = √{ [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) }

If the variances of the two samples are not equal, a separate-variance t-test is used
instead:

t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)

with degrees of freedom given by

d.f. = (s1²/n1 + s2²/n2)² / { [(s1²/n1)² / (n1 + 1)] + [(s2²/n2)² / (n2 + 1)] } − 2
After selecting the appropriate formula, we calculate the t-value using the data from
our experiment and then look up the critical value of t in a table. If our t-value exceeds
the critical value of t at our degrees of freedom, then we reject the null hypothesis
and conclude that the alternate hypothesis is true (the means are significantly different).
Example: A protein supplement was fed to the experimental group and weight gain
was measured.

                        Experimental group    Control group
Mean weight gain             100 g                 88 g
Standard deviation             3                     3
Sample size                   25                    25

Our variances are equal (3² = 9 in both groups), therefore we will use the pooled-variance
t-test formula (here sp = 3):

t = (100 − 88) / [3 √(1/25 + 1/25)] = 12 / (3 √0.08) = 14.142

From a table of t-values we find that with 48 degrees of freedom, our t-value exceeds
the critical value of t at the 0.001 level. Therefore we report that the protein supplement
had a significant effect on weight gain.
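The worked example can be reproduced with a short sketch of the pooled-variance formula (plain Python, not SPSS):

```python
import math

def pooled_t(mean1, mean2, sd1, sd2, n1, n2):
    # pooled standard deviation
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (sp * math.sqrt(1/n1 + 1/n2))
    df = n1 + n2 - 2
    return t, df

t, df = pooled_t(100, 88, 3, 3, 25, 25)
print(round(t, 3), df)  # 14.142 48
```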
Paired t-test: This is a special t-test to use for research designs in which the data are
from paired samples, e.g. if we collect data from the same individual before and after a
particular treatment. Here, each subject serves as his or her own control. We can calculate
a mean difference (x̄d) from our experiment and similarly a variance of the differences.
Then we can use the paired t-test formula:

t = x̄d / (sd / √nd)

where sd is the standard deviation of the differences and nd is the number of pairs.
Note about statistical tables: The tables for looking up p-values are readily available
online and also in online calculators (the simplest method, according to us). Just look for
p-value calculators. And one more thing: once you start using SPSS for your analysis,
SPSS will look up the p-values for you. So do not worry about the tables. This is true
for all the tests that follow, including Analysis of Variance, the Sign test, etc.
b) Analysis of Variance (AoV)
It would be erroneous to use the t-test when we have more than 2 groups. In this case,
we need to use another test. The most common test for analyzing more than 2 groups
is called Analysis of Variance. Analysis of Variance uses the F-distribution which, like
the t-distribution, is a modification of the normal distribution. If we are only interested
in the differences between group means, then we would use a one-way analysis of
variance. If we are also interested in testing for differences within groups, we would
need to use a two-way analysis of variance (not very commonly used in biotechnological
research).
If we are to calculate a one-way AoV, we need the following information:
GT = the sum of all observations (to get this, just add all data points)
Ti = the sum of the observations in the i-th sample (in a given group, the total sum)
ni = the number of observations in the i-th sample
N = total number of observations
Σ Ti²/ni = the sum, over every group, of the square of the sample's total divided by
its sample size
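Using those quantities, a one-way AoV reduces to a few sums. Here is a minimal sketch; the three small groups are invented purely for illustration:

```python
def one_way_anova(groups):
    # groups: list of lists of observations
    N = sum(len(g) for g in groups)
    GT = sum(sum(g) for g in groups)
    k = len(groups)
    cf = GT**2 / N                                        # correction factor
    ss_total = sum(x**2 for g in groups for x in g) - cf
    ss_between = sum(sum(g)**2 / len(g) for g in groups) - cf
    ss_within = ss_total - ss_between
    df_between, df_within = k - 1, N - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

f, db, dw = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f, 2), db, dw)  # 13.0 2 6
```

The F-value is then compared against the critical F at (df_between, df_within) degrees of freedom, just as a t-value is compared against a t-table.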
c) Correlation
The correlation coefficient is calculated as:

r = [Σxy − (Σx)(Σy)/n] / √{ [Σx² − (Σx)²/n] [Σy² − (Σy)²/n] }
Example: heights and weights of 10 subjects.

Subject    height (in)    weight (lb)
   1           64             130
   2           65             148
   3           71             180
   4           67             175
   5           63             120
   6           62             127
   7           67             141
   8           64             118
   9           65             120
  10           64             119

For these data, r = 0.8353.
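The correlation coefficient for the height/weight data can be computed directly from the formula above; a plain Python sketch:

```python
import math

def pearson_r(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = sxy - sx * sy / n
    den = math.sqrt((sxx - sx**2 / n) * (syy - sy**2 / n))
    return num / den

heights = [64, 65, 71, 67, 63, 62, 67, 64, 65, 64]
weights = [130, 148, 180, 175, 120, 127, 141, 118, 120, 119]
print(round(pearson_r(heights, weights), 4))  # 0.8353
```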
The correlation coefficient can be tested for statistical significance using a table of r
values or by using a special t-test. Either way, it is necessary to calculate the d.f. (degrees
of freedom), which for the correlation coefficient is equal to n − 2. The special t-test is
calculated using the following formula (here we ignore the sign of r):

t = r √[(n − 2) / (1 − r²)]

Using the above data we would get:

t = 0.8353 × √(8 / 0.3023) = 4.297 with 8 degrees of freedom.

From a t-table, we would find that p < 0.01.
This means that there is a significant correlation between the variables.
The correlation coefficient is a measure of the strength of the association between
2 variables on a scale of 0 to 1.0 in absolute value (r itself ranges from −1 to +1). A
better way to interpret the association is to square the coefficient. Then r² measures
the proportion of the variation in the two variables that is common to both variables,
i.e. what percentage of the variation in one variable can be predicted by the other.
In this case:
r² = 0.8353² = 0.6977
or in other words: 69.77 % of the variation in weight can be explained by height.
d) Regression
Another related way of looking at the relationship between two quantitative variables
is regression analysis, or regression. In regression analysis, one of the variables is logically
dependent on (or influenced by) the other variable. For example, heart rate might be
influenced by age. In the jargon of regression analysis, heart rate would be a dependent
variable and age would be the independent variable, since age is not logically influenced
by heart rate.
By convention, in regression analysis, the y-axis of a plot is used for the dependent
variable and the x-axis for the independent variable. As in correlation analysis, we can
use a scatter plot, but in addition, regression analysis will provide the best-fitting straight
line through our plot. To describe the line, we must have two pieces of information: the
slope of the line, b, and the y-intercept of the line, a. The slope measures how much the
y-variable changes for each unit of change in the x-variable (+ means the line rises and
− means the line falls). The y-intercept tells us where the line starts (i.e. the value of y
when x = 0). Once we have this information we can use the formula for a straight line:
y = a + bx
The slope of the line can be calculated by the following formula:

b = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]
The table of data that we used for the correlation coefficient can be used here,
though it is not a typical example for regression analysis. Using the table gives us:
b = 448.4 / 59.6 = 7.52 (meaning that for every unit of increase in x, y will increase
by 7.52 units).
The y-intercept can be calculated since logic dictates that the best-fitting line must
pass through the point (x̄, ȳ). And since:
y = a + bx
then, equivalently:
a = ȳ − b x̄
All that is required is that we calculate x̄ and ȳ to be able to obtain a. Using the
data table, the means of y and x are 137.8 and 65.2 respectively. So:
a = 137.8 − (7.5235 × 65.2)
= −352.73
This type of calculation can be very useful in the laboratory. For example, if we
know the concentration of a standard protein and we measure absorbance of unknown
samples, using regression, we can calculate the unknown concentrations of the protein
samples. This is a very commonly used calculation.
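The slope and intercept for the height/weight data can be checked with a short sketch of the formulas above:

```python
def linear_regression(x, y):
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b = (sxy - sx * sy / n) / (sxx - sx**2 / n)   # slope
    a = sy / n - b * sx / n                        # intercept = ȳ − b x̄
    return a, b

heights = [64, 65, 71, 67, 63, 62, 67, 64, 65, 64]
weights = [130, 148, 180, 175, 120, 127, 141, 118, 120, 119]
a, b = linear_regression(heights, weights)
print(round(b, 2), round(a, 2))  # 7.52 -352.73
```

For the standard-curve use mentioned above, you would then predict an unknown y from a measured x with y = a + bx (or solve for x given y).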
e) Chi-Square test
For research situations in which frequency and/or categorical data are collected, a
different set of statistical procedures must be used. One of the most common of these
procedures is called the Chi-square test. This test can handle any number of groups
and any number of possible outcomes. For simplicity, we will look at a test with a 2 x
2 situation (two groups and two possible outcomes). The formula for calculating
Chi-square is as follows:

χ² = Σ (O − E)² / E
Where:
O = the observed frequency
E = the expected frequency = (Tr x Tc ) / N
Tr = row total
Tc = column total
N = total number of observations
d.f. = (r-1) (c-1)
r = number of rows and c = number of columns
Example:

            Outcome 1    Outcome 2    Total
Group 1         10           40          50
Group 2         90           80         170
Total          100          120         220
Calculations:

   O        E        O − E     (O − E)²     (O − E)²/E
  10      22.73     −12.73     162.0529       7.1295
  90      77.27      12.73     162.0529       2.0972
  40      27.27      12.73     162.0529       5.9425
  80      92.73     −12.73     162.0529       1.7476
                                             --------
                         Chi-Square value:    16.9168
We then compare this value (16.9168) to the critical value of Chi-square in the
Chi-square table and make our decision about our hypothesis. Here p < 0.005, so the
new drug causes low birth weight in rats compared to the placebo.
One problem with the Chi-square test: if any E < 5.0, there is distortion in the data.
When we compute (O − E), the gap between the points decreases as the numbers get
smaller. This can be prevented by using Yates' correction, i.e. (|O − E| − 0.5)² is used
as the numerator in the formula in such cases. (Do not worry about this at all. SPSS
does this automatically. We put in this information just to show off a little bit.)
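The whole calculation can be sketched for any 2 x 2 table of observed frequencies. Note that computing E at full precision gives χ² ≈ 16.91 for the example above; the 16.9168 in the worked table comes from rounding E to two decimals:

```python
def chi_square_2x2(table):
    # table: [[a, b], [c, d]] observed frequencies
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected = (Tr x Tc) / N
            chi2 += (o - e) ** 2 / e
    return chi2

print(round(chi_square_2x2([[10, 40], [90, 80]]), 4))  # 16.9098
```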
f) McNemar test
In the case of the Chi-square test, one prerequisite is that the groups are independent
and not related, matched or paired. For experiments with paired samples, or before-and-after
experiments, the more appropriate test is the McNemar test. It is a modified type
of Chi-square test. The data are arranged as follows (cells A and D hold the observations
that changed between BEFORE and AFTER):

                        AFTER
                     +         −
BEFORE     +         B         A
           −         D         C
In this test, we are only concerned with the observations that changed during the
experiment, i.e. cells A and D of the table. Since A + D represents the total number of
observations that changed, ½(A + D) would be expected to change in one direction
and ½(A + D) in the other direction if our experiment had no effect on outcome (if the
null hypothesis is true, A and D would be the same). The test statistic is:

χ² = (A − D)² / (A + D)
Example: Suppose we want to test whether or not the presence of a particular object in the room influenced the biting behavior of dogs caged in the room. To conduct our experiment, we would need a room full of caged dogs, the suspected object and an artificial hand for the dogs to bite. The hand could be placed in the cage of each dog in the absence and presence of the suspected object. If the hand is bitten, we would record a +ve response and if not bitten, a -ve response. Our data can be shown as follows:
                        AFTER
                     +         -
    BEFORE   +      20        120
             -      10         50

   Chi-square = (120 - 10)^2 / (120 + 10) = (110)^2 / 130 = 93.08
The interpretation is similar to the Chi-square test. So the null hypothesis can be
rejected.
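As with the ordinary Chi-square, the McNemar statistic is easy to verify outside SPSS. A minimal sketch using the A and D cells read from the table above (the orientation of the cells is our assumption):

```python
from scipy.stats import chi2

# Discordant cells from the table above: A = 120 dogs that bit without
# the object but not with it, D = 10 that did the reverse
A, D = 120, 10

statistic = (A - D) ** 2 / (A + D)      # (110)^2 / 130
p_value = chi2.sf(statistic, df=1)      # compared to Chi-square with 1 df

print(round(statistic, 2))              # 93.08
print(p_value < 0.05)                   # True, so reject the null hypothesis
```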
g) Sign test
Occasionally, in some research designs, quantitative measurement is impossible or impractical, but it may be possible to rank the two members of a pair with respect to each other. The sign test is applicable to a research design of 2 related samples when the experimenter wishes to establish that two conditions are different. E.g. a skin rash can only be classified as mild, moderate or severe. Or the degree of pain can be classified by the patient as same, improved or worsened, and so on. The only underlying assumption of this test is that the variable under consideration has a continuous distribution.
In this test, we assign a plus (+) or a minus (-) sign to each pair for the variable of interest. If the experimental conditions have made no difference, then we would expect an equal number of pluses and minuses. (If the members of a pair are not different, they can be dropped from the analysis.) We need only determine N (the number of pairs) and x (the number of the less frequent sign), and compare them to the table for the sign test.
Example:
Suppose we were attempting to determine the effect of low iron intake on taste acuity for sourness. After eight weeks on a low iron diet, a group of 17 volunteers were asked to taste two sour solutions (one 1 % citric acid and the other 0.5 % citric acid) and state whether the first was sourer (+), the same (0) or less sour (-) than the second.
[Results table: for each of subjects 1-17, the response was recorded as + (sourer), 0 (same) or - (less sour).]
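Because the sign test needs only N and x, it is simple to verify outside SPSS. A minimal sketch with illustrative counts that we made up (not the citric-acid data above):

```python
from scipy.stats import binom

# Assumed illustrative counts: 12 pluses, 3 minuses, 2 ties
plus, minus = 12, 3          # ties are dropped from the analysis

n = plus + minus             # N: number of untied pairs
x = min(plus, minus)         # x: count of the less frequent sign

# Two-sided p-value under H0 that + and - are equally likely
p_value = 2 * binom.cdf(x, n, 0.5)
print(round(p_value, 4))     # 0.0352 -- significant at the 0.05 level
```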
Example (Mann-Whitney U test): The effect of zinc deficiency on learning was studied in rats. Ten control rats were fed a nutritionally complete diet while eight experimental rats were fed a low-zinc diet for 10 weeks. Initially, all rats had been trained to imitate a leader rat in a T maze. The experimenter records the number of trials each rat requires to reach a criterion of 10 correct imitations in a row in 10 trials. The more trials required, the more the memory is affected. The data are presented below:
Control rats:        78   64   75   45   82   77   62   76   48   90
Experimental rats:  110   70   53   51   93   68   57   54
Group C C E E E E C C E E C C C C C C E E
Score 45 48 51 53 54 57 62 64 68 70 75 76 77 78 82 90 93 110
We determine U by counting the number of C scores preceding each E score.
U = 2 + 2 + 2 + 2 + 4 + 4 + 10 + 10 = 36
From the table for Mann-Whitney U test, we see that to achieve the 0.05 level of
significance, U value must be equal or less than 17. Our U value is 36. So we infer that
zinc deficiency did not affect learning in this experiment.
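The same conclusion falls out of scipy's Mann-Whitney routine. A sketch: scipy reports the U statistic for its first sample, so we take the smaller of the two U values to match the hand count above.

```python
from scipy.stats import mannwhitneyu

control      = [78, 64, 75, 45, 82, 77, 62, 76, 48, 90]
experimental = [110, 70, 53, 51, 93, 68, 57, 54]

res = mannwhitneyu(control, experimental, alternative='two-sided')

# n1 * n2 - U gives the U value for the other sample
u_small = min(res.statistic, len(control) * len(experimental) - res.statistic)

print(u_small)              # 36.0, matching the count above
print(res.pvalue > 0.05)    # True: not significant at the 0.05 level
```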
h) Statistical Abuses
Scientific research abounds with misapplications of statistical methods. Many a time, reviewers detect these errors and investigators have to repeat their statistical analysis, at considerable cost in time and effort. But sometimes errors of application go unnoticed and end up being published. This is rarely due to an intention to mislead; rather, it is often the result of insufficient deliberation and study of the particular experimental problem. It is probably right to say that the majority of problems and difficulties in handling one's experimental data are caused by haste and consequent superficiality, if not outright errors, on the part of the investigator. The researcher who rushes into an experiment, hurries the data collection and rushes the publication of his results runs the risk of wasting his entire effort just to save a little time in the beginning. It has been said that if you really want to mess things up, use a computer. Indeed, the widespread use of computerized statistical software packages leads to misapplications and misinterpretations of results.
The major areas of errors include:
1. Sample Size selection
2. Inappropriate Statistical tests
3. Inappropriate display of results
Sample size selection
Very small samples are unlikely to give good estimates of true population values. One unusual animal or subject can have a big effect on the outcome. The findings of such studies are viewed with skepticism by most scientists. Would you use a new drug if it were launched after testing it in 2 rats? Very small samples also make it tough to find statistical significance even where there is biological significance.
Very large samples also have their own problems. If the sample is sufficiently large,
any numerical difference between groups can be shown to be statistically significant,
whether there is any actual biological significance or not.
Inappropriate statistical tests
These are difficult to spot without a certain amount of statistical expertise. Most
common of these would be the use of t-tests when there are more than two groups under
study, using tests meant for continuous data on discrete data and false assumptions
about the independence of samples and homogeneity of variances.
Examples:
Example 1:
t-tests are designed for no more than 2 samples. If we have an experiment in which there are five groups of subjects (Groups A-E), we might wish to make the following comparisons:
A vs B, A vs C, A vs D, A vs E, B vs C, B vs D, B vs E, C vs D, C vs E and D vs E
That is ten separate t-tests. Running each at p = 0.05 inflates the overall probability of at least one false positive far above 0.05; an analysis of variance followed by a multiple-comparison procedure is the appropriate approach.
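A quick back-of-the-envelope calculation shows why this matters. Assuming the ten tests were independent and each run at p = 0.05, the chance of at least one false positive is about 40 %:

```python
from math import comb

k = comb(5, 2)                        # 10 pairwise comparisons among 5 groups
alpha = 0.05
familywise = 1 - (1 - alpha) ** k     # P(at least one false positive)

print(k, round(familywise, 2))        # 10 0.4
```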
Example 2:
Sometimes, literature may have findings in which t-tests were used in the analysis of
categorical or frequency data. These findings are meaningless if based on the outcome
of the wrong statistical procedure. If we use 1s to code for males and 2s to code for
females, what does it mean if we get a mean of 1.37? Absolutely nothing.
Example 3:
One of the most common errors occurs when the researcher uses a statistical procedure that wastes his data. Conversion of measurement data to ranks, categories or percentages may allow us to use a statistical procedure that is arithmetically easier to compute, but it makes a much weaker statement about our findings.
Inappropriate display of results obtained
Scientific literature and scientific meetings (more so) are filled with examples of the
misuse of data representation techniques. These are also extremely common on television
and in presentations before government committees. Many of them are conscious
attempts to mislead. There are specific ways to present data and misrepresentation
may make data seem more significant than it really is.
In the above graph, A is the mean heart rate of 30 females and B is the mean heart rate of 30 males. A and B look significantly different from each other in the graph. This is because the axis is plotted from 60 to 76. If the graph had been plotted from 0 to 100, the difference between A and B would not seem so large. In reality, the difference between A and B is not significant. But if we were unethical and wanted to convince an audience that the heart rate of males was much higher than that of females, we could easily represent the data in this manner and achieve our end. This is extremely misleading and an incorrect way of representing statistical results.
j) Quiz to test yourself
Have a look at the questions/problems below. You will be able to solve them yourself.
Our tutorial has not provided solutions. If you find it impossible to get the solution for
any particular problem, feel free to contact us anytime.
I) Name the type of association between the risk factor and disease in each of the
examples below:
A.
It has been observed that people who have low cholesterol are more likely to
be interested in travel than individuals who do not. What is the most likely
type of association to explain this finding?
B.
C.
II) Give the appropriate statistical test for each of the following research designs
A.
Five groups of mice (n = 40) were fed either a complete normal diet (Group I, controls) or diets restricted in protein as follows: Group II = 10 % less protein, Group III = 25 % less protein, Group IV = 20 % less protein, and Group V = 30 % less protein. Mean weight gains were compared after feeding these diets for 12 weeks. Investigators compared each mean to the control group.
B.
C.
D.
E.
A group of rats was split into a control group and experimental group. The
experimental group received three weeks of zinc deficient diet while the
control group received an adequate diet. At the end of the experiment, the
rats were scored as to their ability to solve a maze problem. The investigators
wished to determine if the experimental groups have lower ranking scores
than the controls.
F.
interviewed to determine their history of coffee consumption. The results are presented
below:
            Breast cancer cases   Healthy controls   Total
                   350                  375            725
                   150                  125            275
   Total           500                  500           1000
III) What is the absolute risk of having breast cancer among women drinking more than 2 cups of coffee per day?
IV) Your project for your grant proposal will examine the effect of various diets on
the prevention of protein cross linkage in rats treated with the pro-oxidant doxorubicin.
You intend to compare three different antioxidants as dietary supplements. You feed four
groups of rats a semi-purified diet which contains either no antioxidant (control group)
or one of the antioxidants under study. You need to now tell your professor how many
animals to buy. How do you determine the sample size for each group of rats?
V) For each of the following, state the type of study design:
A.
A national survey indicated that high fiber diets were associated with low
incidence of stroke.
B.
C.
D.
The association between high fiber diets and colon cancer was studied in a
group of 5000 vegetarians and a group of 4000 individuals who ate mostly
beef and did not have high fiber diets. The two groups were followed for a
period of 10 years. At the end of the period incidences of colon cancer were
compared.
VI) An investigator developed a new assay for determination of sickle cell anemia.
The table below displays the results of an experiment. (A positive test indicates a
sickle cell patient)
                         Traditional Methodology
   New test          Sickle cell    Normal    Total
   Sickle cell            91           12       103
   Normal                  9           38        47
   Total                 100           50       150
c. 103/150 = 68.7 %
d. 38/50 = 76 %
e. 47/150 = 31.3 %
As we have said before, if you get stuck on any of the above problems, do ask us
for the solution.
SECTION II
SPSS at Last
Now that we have given you a brief (was it?) introduction to the basics of statistics, we think that you will be able to understand SPSS much better and faster and will not struggle with every tiny step of the tutorial that SPSS offers with its package. In this SPSS tutorial, we shall be covering the basics of how to use SPSS. Students usually use SPSS for their classes and, more importantly, for their research analysis. Let other people talk about how tough it is to work with SPSS. We will show you how simple it is. The versions of SPSS keep changing. Do not panic and rush to the store to buy the latest version every time one is launched. The basic features have stayed the same for a very long time, just like in any other software program. (Think about it: do you see any change in MS Word in the past five years?)
Now straight to the tutorial. When you open SPSS (obviously by double-clicking on
the icon) you will see this window.
If you want to run the tutorial, you know what to do. You can access the tutorial
anytime via the Help pull down menu on the data editor.
We suggest you take the tutorial at some point of time, especially if you need
information about every teeny-weeny aspect of every icon. As for now, you can take this
tutorial and start actually using SPSS. After all that is what you bought this for, right?
In order to use SPSS for statistical analysis, you must first have a file containing data
to be analyzed.
You can create your data file using the Data Editor of SPSS. This may be the
easiest way to work with SPSS. (We will tell you how to go about this).
You can use a spreadsheet software package such as MS Excel. You might want to do this because, let's say, you already have a lot of data in Excel and want to transfer it to SPSS; you would not want to type everything all over again. You would then need to follow the instructions associated with that software package. WARNING: This is not a simple Copy and Paste procedure.
Be sure to save your data file as a tab-delimited text file. Otherwise, this
process itself will become your greatest headache. (You can specify your
variable names on the first line of your Excel spreadsheet and load them
directly into SPSS when you Open the file. You can get more instructions
regarding this in the SPSS Help section).
You can enter your data into a command file between the BEGIN DATA
and END DATA commands.
NOTE: This should be done only for very small data sets.
Regardless of how you create your data file, the first step is to determine what the data are and how you plan to organize the whole thing. It is recommended that you use fixed-field format for your data until you become an SPSS pro. You don't have to worry about this: SPSS itself uses a fixed-format system for your data by default. Free-field format may seem tempting, but before you step into that arena, remember that it is difficult to edit if your files are large, and a lot of steps and additional typing will be required if you have missing data. (Any real experiment is bound to have some missing data.)
Once you have created a data file you can start performing statistical analysis on
your data using SPSS.
Creating and Editing a Data File
CHILLIBREEZE PUBLICATIONS | http://www.chillibreeze.com
Step 1 : Open SPSS. The program will load and the Data entry Window will open.
Step 2: Get your data. How you do this depends on the method used to create your
data file.
Step 3: Generate SPSS commands. Select your commands from the pull down
menus of the toolbar at the top of the screen or load them into SPSS as an
SPSS syntax file.
Getting Started
We suggest that you begin each SPSS session in the following way (especially if you plan to print out your results for your professor or for whatever reason). This will minimize the amount of paper output you generate. (Not all universities offer free printing. At 10 cents a page, you can get a burger by saving 25 pages. Just don't blame us when you get fat.) Select Options from the Edit pull-down menu on the toolbar at the top of the screen. From the Options dialog box, select the Viewer tab at the top of the dialog box. Click on Infinite in the Text Output Page Size section of the dialog box. Also, click Display commands in the log to cause your commands to appear on your output. Then click on OK. This will cause SPSS to use the minimum amount of paper possible when you print your output at the end of your session. Left to itself, SPSS issues a lot of end-of-page commands that are pretty much gibberish to you and your professor and consume too much paper while printing.
Let us say you type Age under variable name, the other values appear as they are
set by default. You can change all the values as per your requirements.
The following have other specific meanings for SPSS and cannot be used
as variable names by themselves- ALL, AND, BY, EQ, GE, GT, LE, LT, NE,
NOT, OR, TO, WITH (Though it is possible to use ALL1/ALL2 and so on)
Our tip: Avoid using symbols in variable names. That way, you won't need to remember which ones are okay and which ones are not.
Enter a name for your variable in the 1st box. By default, the variable type is numeric
with a w.d. (width. decimal) format of 8.2. If this is incorrect, click on the button in
the Type box to specify the correct type in the Variable Type dialog box. If the column
width and number of decimal points are incorrect, fix them in their boxes.
Variable LABELS and VALUE labels can be used to make your output easier to
understand. A variable LABEL is used to descriptively label the variables. Its use makes
the output easier to read and can be very useful if the output is used over a long time
period. Each label cannot exceed 60 characters (most procedures will only print 40 at
a time and some will print even fewer)
Example of a variable LABEL:
Liver weight in Grams
Plasma Cholesterol mg/dL
A VALUE label is used to descriptively label the values of a variable. Its use makes the output simpler to read and can be very useful if the output is used over a long time period. Similar to variable labels, each label must not exceed 60 characters (most procedures will only print 20 of them).
Example of a VALUE label:
For a variable named COMMUNITY,
A value of 1 could have the label Asian, a value of 2 could have the label African
American, and a value of 3 could have the label Hispanic and so on
Missing Data
The presence of missing data is very common in any kind of research. You will always come across dead mice, sick children, non-compliant adults, unfilled forms, lost samples and so on. You can't really do anything about it, other than planning for it while creating your data file. You can enter a special code to indicate missing data or you can leave the item blank. The good thing about SPSS is that any blank or period (.) is considered missing data, unlike some other software programs which consider a blank as zero (we can't even imagine the amount of distortion that would cause).
Saving your information
Now proceed to name the remaining variables (columns) of your spreadsheet. Then click on the Data View tab at the bottom of the Data Editor to enter your data. (This is very easy for us to list as a step, but it will take a long, long time. Beware of errors while entering your data. Slow and steady is the best way to go.) Finally, save your data to your disk by using the File pull-down menu from the toolbar and selecting Save As. Name your file, select the format of the file with the Save as type option, and you have saved your file.
Exercise
The following data are total cholesterol levels in mg/dL for 8 groups of subjects.
Create a data file and use SPSS to list it back onto paper.
Group 1   Group 2   Group 3   Group 4   Group 5   Group 6   Group 7   Group 8
  181       249       260       296       334       250       223       320
  177       325       261       245       340       232       305       345
  200       425       263       306       356       220       210       340
  171       276       309       350       235       309       325       159
  272       235       374       254       341       173       217       243
  236       333
Fixed format: Data files are most often fixed format. This is simpler to read, edit and understand. Fixed format means that the data for a given variable occupy the same columns and line position for each case. (Note: If there is more than one line of data per case, you must use the RECORDS subcommand to specify the number of lines per case.)
Example:
ID #      Columns 1-3
Weight    Columns 5-9
Height    Columns 11-13

001 065.3 091
002 141.3 171
003 122.4 154
Free-field format: This is another form of data organization. The variables are in the same order for each case but do not necessarily occupy the same columns for each case. The data for each variable must be separated by one or more spaces or a comma in the data file. According to us, it is better to stick to the fixed format while preparing data files for SPSS, especially since with the free-field format there will always arise the problem of missing data (part and parcel of any experiment).
Example:
ID WEIGHT HEIGHT
001 65.3 91
002 141.3 171
003 122.4 154
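To see why free-field format is fragile, consider how such a file is read: fields are split on whitespace, so a single missing value silently shifts every later field into the wrong variable. A minimal parsing sketch in Python, using the example file above:

```python
# Free-field records: fields separated by whitespace
raw = """001 65.3 91
002 141.3 171
003 122.4 154"""

records = []
for line in raw.splitlines():
    # Unpacking raises an error if a field is missing entirely,
    # and a blank (rather than a code like '.') cannot be detected
    case_id, weight, height = line.split()
    records.append({"id": case_id,
                    "weight": float(weight),
                    "height": float(height)})

print(records[2])
```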
k) The EXPLORE command

(To open this box, you need to have some data in the Data Editor of SPSS. This is just an example file.)
You can then select the variables of interest (one or more dependent variables)
and choose one or more grouping variables (factors). By default, you will get box plots,
stem-and-leaf plots and basic descriptive statistics for each dependent variable (either
for the whole group or separated by a grouping variable). You can suppress the display of
plots or descriptive statistics. You can choose additional statistics and plots as described
below.
In the Explore dialog box, you can click on the Statistics button and choose
one or more of the following to be displayed
Descriptives
This is the default SPSS uses. It includes the mean and its confidence intervals,
median, 5 % trimmed mean, standard error, variance, standard deviation, minimum,
maximum, range, interquartile range (IQR), skewedness and its standard error, kurtosis
and its standard error. IQRs are computed by the H AVERAGE method. This is a weighted
average method and is described in detail in the base manual.
By default, 95 % confidence interval is displayed. The dialog box offers the choice
to set this to any value between 1 % and 99.99 %.
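One of these statistics deserves a note: the trimmed mean discards the extreme cases at each end before averaging, which tames outliers. A quick illustration outside SPSS with made-up numbers (scipy's proportiontocut is likewise applied per tail):

```python
import numpy as np
from scipy.stats import trim_mean

# Made-up data with one outlier
x = np.array([2, 3, 3, 4, 4, 5, 5, 6, 6, 50])

print(np.mean(x))           # 8.8 -- dragged upward by the outlier
print(trim_mean(x, 0.10))   # 4.5 -- 10 % trimmed from each tail
```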
M-estimators
divided into intervals and the number of cases in each interval is displayed.
Spread vs. level with Levene's Test: A spread-versus-level plot is produced with the slope of the regression line and Levene's test for homogeneity of variances. If no factor variable has been defined, this plot is not produced. You can choose one of the following.
None : No plots or tests are produced
Power estimation : For each group, the natural log of the median is plotted against
the log of the interquartile range. Estimated power is also displayed. Use this
to determine an appropriate transformation of your data.
Transformed : The data are transformed according to user-specified power. One of
the alternatives below needs to be selected.
1/square root: The reciprocal of the square root is calculated for each data
value.
l) The FREQUENCIES command

This command produces tables of frequency counts and percentages for the values
of individual variables. By default, a table is created that displays counts for each value
of a variable, the counts percentaged over all cases and over all cases with nonmissing
values and the cumulative percentage over all cases with nonmissing values. The values
are ordered from lowest to highest. All variable labels and value labels are printed if
they have been defined. These defaults can be altered with a number of subcommands.
Also, bar charts, histograms and statistics can be chosen.
Select the FREQUENCIES command by clicking on the Analyze pull down menu
on the toolbar and then select Descriptive Statistics and then Frequencies. The
Frequencies dialog box will open and allow you to select variables for generating
output. If all you want is the default, just click the OK button. You can also select optional
output from the dialog box.
Display frequency tables This is the default. To suppress frequency tables, click
on the little box to remove the check mark.
Statistics
This command controls the display of statistics. By default, no statistics are displayed.
One or more of the following choices may be used.
Under the Percentiles Values box, you can select one or more of the following:
- Quartiles: Displays the 25th, 50th and 75th percentiles.
- Cut points for n equal groups: Displays percentile values that divide the sample into equal-sized groups. 10 is the default. You can enter any positive integer between 2 and 100. The number of percentiles displayed is one fewer than the number of groups specified.
- Percentile(s): Displays user-specified percentiles. You can enter any percentile value between 0 and 100, and then click on Add. You can continue to add percentile values to build a list, which will be displayed. You can remove or change your entered percentiles before execution of the command by highlighting them and clicking the appropriate button.
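The percentile options are easy to mimic outside SPSS. A sketch with made-up numbers (note that numpy's default interpolation method may differ slightly from SPSS's HAVERAGE weighting):

```python
import numpy as np

# Hypothetical sample values (made up for illustration)
x = np.array([12, 15, 17, 21, 24, 28, 31, 35, 40, 46])

quartiles = np.percentile(x, [25, 50, 75])             # the Quartiles option
deciles   = np.percentile(x, np.arange(10, 100, 10))   # cut points for 10 equal groups

print(quartiles)
print(len(deciles))   # 9 cut points: one fewer than the number of groups
```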
Under the Dispersion box, you can select one or more of the following:
- Standard deviation
- Variance
- Range
- Minimum
- Maximum
- S.E. mean (standard error of the mean)

Under the Central Tendency box, you can select one or more of the following:
- Mean
- Median
- Mode
- Sum

Under the Distribution box, you can select one or more of the following:
- Skewness
- Kurtosis

Charts

This allows selection of bar charts and histograms and some control over how bar charts are labeled. You can choose only one of the following:
- Bar charts
- Histograms
Format
This controls the output of frequency tables.
Under the Order by box, you can select one of the following:
- Ascending values
- Descending values
- Ascending counts
- Descending counts
If you have selected percentiles or a histogram, you get the output for Ascending
values, regardless of your selection here.
You can also control the printing of large tables and the printing of multiple
variables.
m) The DESCRIPTIVES command
This command generates a listing containing the variable name, variable label,
mean, standard deviation, minimum and the maximum. Additionally, the standard error
of the mean, variance, kurtosis, skewness, range and sum may be requested under the
Options subcommand. To use the DESCRIPTIVES command, select Analyze from the
toolbar and then Descriptive Statistics and then Descriptives. A dialog box will appear
which will allow you to select variables, select additional descriptive statistics and set the
order that the variables are displayed in. This is a very useful tool for a quick summary
of your data.
Exercise
Using our data file diet.sav and the commands Frequencies and Descriptives, answer
the following:
How many and what percent of individuals fall into the following categories?
#
%
Males
___________ __________
Females
___________ __________
Asians
___________ __________
Orientals
___________ __________
African Americans
___________ __________
For each of the following, give the mean, standard deviation, minimum and maximum
values without using the Frequencies command:
Age
Kilocalories
Fat
Protein
Carbohydrate
Total cholesterol
Polyunsaturated fatty acids
Saturated fatty acids
Calcium
Mean
________
________
________
________
________
________
________
________
________
SD
_______
_______
_______
_______
_______
_______
_______
_______
_______
Min-Max
_______
_______
_______
_______
_______
_______
_______
_______
_______
The arithmetic operators are:
+    Addition
-    Subtraction
*    Multiplication
**   Exponentiation
/    Division
Functions are performed first, exponentiation next, then multiplication and division and finally, addition and subtraction. Within this hierarchy, expressions are evaluated left to right. The order of operations can be controlled by using parentheses.
Examples:
X = A + B * 2
X = (A + B) * 2
If A = 4 and B = 5, then the first expression would equal 14 while the second would
equal 18.
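The same two expressions evaluated in Python confirm the hierarchy:

```python
A, B = 4, 5

X1 = A + B * 2       # multiplication binds tighter: 4 + 10
X2 = (A + B) * 2     # parentheses evaluated first: 9 * 2

print(X1, X2)        # 14 18
```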
Examples:
RATIO = (A + B) * (C + D) / (E - F) ** 2
STANDARD = 142.675
X = ABS (Y)
TEST2 = TEST1 + 3
COMPUTE A = B + 2
COMPUTE IRONRDA = 18 + Y
COMPUTE AGEGROUP = 1
You can choose the Options dialog box to alter this list or to add a number of other
descriptive statistics. You can also control the use of variable and value labels and
choose an analysis of variance or/and a test for linearity.
Group Statistics (CHOL by SEX)

SEX    N    Mean     Std. Deviation   Std. Error Mean
1      16   350.00   29.781            7.445
2      16   225.00   67.382           16.845

Independent Samples Test (CHOL)

Levene's Test for Equality of Variances: F = 7.258, Sig. = .011

                              t       df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI of the Diff.
Equal variances assumed       6.787   30       .000              125.000      18.417             87.387 to 162.613
Equal variances not assumed   6.787   20.645   .000              125.000      18.417             86.659 to 163.341
Levene's test is performed to find out whether we should use the unequal-variances t value or the equal-variances t value. If p > 0.05, we use the equal-variances t value. Here p for Levene's test is 0.011, which means we must use the unequal-variances t-test p value, which is .000 (< 0.05); our test is significant. There is a significant difference in the cholesterol levels of the two groups.
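The whole decision procedure (Levene's test first, then the matching t-test) can be sketched outside SPSS as well. The data below are simulated to mimic the output above (means 350 and 225, unequal spreads); they are not the tutorial's raw values.

```python
import numpy as np
from scipy.stats import levene, ttest_ind

# Simulated cholesterol values mimicking the SPSS output above
rng = np.random.default_rng(0)
g1 = rng.normal(350, 30, 16)
g2 = rng.normal(225, 67, 16)

# Step 1: Levene's test decides which row of the t-test table to read
_, p_levene = levene(g1, g2)
equal_var = p_levene > 0.05

# Step 2: the matching t-test (Welch's t-test when variances differ)
t, p = ttest_ind(g1, g2, equal_var=equal_var)
print(round(t, 2), p < 0.05)
```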
Exercise
A few problems to test your t-test quotient
I) The data below were collected during an animal feeding experiment. One group of
rats was provided a complete diet (controls) and the other group was fed a diet low in
protein (deficient diet). Determine if the deficient diet produced a different weight gain
(in gms) as compared to the control diet.
Deficient Diet                          Control Diet
Rat   Initial Wt   Final Wt             Rat   Initial Wt   Final Wt
1       222          352                1       223          417
2       224          370                2       219          416
3       225          360                3       224          415
4       224          381                4       225          417
5       227          352                5       226          419
Control Group
Deficient Group
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
_________
before and after ingestion of a test meal containing 10.0 gms of salt. The data obtained
were:
Subject
1
2
3
4
5
6
7
8
9
10
11
12
__________
__________
__________
__________
40.5    43.9
39.3    47.4
41.5    46.7
40.5    47.9
40.4    43.7
41.8    54.3
39.3    48.7
46.4    47.8
49.2    56.5
Note: hematocrit means % Red blood cells in peripheral blood. Each data point
above represents the arithmetic mean of 3+ determinations. (Hint: Do you know the
normal mouse hematocrit? It is about 39-49 %)
Mean hematocrit for group X mice
____________
Standard deviation
____________
Mean hematocrit for group Y mice
____________
Standard deviation
____________
Which statistical test did you use?
____________
What is the level of significance?
__________
Was there a biological significance? (Note: this is different from statistical significance.
Look at the normal values and decide the biological significance).
IV) Compare means for Coenzyme Q levels in whole blood from cardiomyopathy
patients. Group 1 is placebo treated and Group 2 received 100 mg/day of CoQ. The data
are listed below. (A clinical improvement is seen in patients with CoQ concentrations
at or above 2.5 microg/ml)
CoQ concentrations (microg/ml)
Group 1   Group 2
  0.7       2.2
  0.9       3.0
  1.1       2.3
  0.5       1.4
  0.4       2.5
           Experimental                  Control
      Initial wt.   Final wt.      Initial wt.   Final wt.
          89           283             89           335
          93           287             87           342
          87           292             87           344
          90           285             90           336
          91           267             91           321
          88           295             88           348
          86           280             86           332
          95           282             95           331
          87           284             93           349
          89           296             89           336
Experimental
Control
Mean initial weight
___________
__________
Standard Deviation
___________
__________
Statistical test used was?
___________
__________
___________
___________
___________
___________
___________
__________
__________
__________
__________
__________
Oneway

Descriptives

                          N    Mean      Std. Deviation   Std. Error   95% CI Upper   Minimum   Maximum
KCAL   Asian              11   2711.36   355.507          107.189                     2195      3241
       African American   12   2865.17   421.962          121.810      3133.27        2195      3241
       Oriental           9    2641.00   375.573          125.191      2929.69        2195      3241
       Total              32   2749.25   386.605          68.343       2888.64        2195      3241
PRO    Asian              11   96.64     11.084           3.342        104.08         80        120
       African American   12   99.83     13.750           3.969        108.57         80        120
       Oriental           9    96.33     12.114           4.038        105.65         80        120
       Total              32   97.75     12.136           2.145        102.13         80        120
VITA   Asian              11   4983.45   947.277          285.615      5619.84        2973      6570
       African American   12   5228.00   1573.407         454.203      6227.69        973       6570
       Oriental           9    4298.44   1700.459         566.820      5605.53        977       5966
       Total              32   4882.50   1436.305         253.905      5400.34        973       6570
FE     Asian              11   10.336    3.7583           1.1332       12.861         4.8       16.5
       African American   12   11.017    4.3626           1.2594       13.789         4.8       16.5
       Oriental           9    9.567     4.1914           1.3971       12.788         4.8       16.5
       Total              32   10.375    4.0240           .7114        11.826         4.8       16.5
ANOVA

                         Sum of Squares   df   Mean Square   F       Sig.
KCAL   Between Groups    282491.8         2    141245.894    .941    .402
       Within Groups     4350866          29   150029.869
       Total             4633358          31
PRO    Between Groups    83.788           2    41.894        .271    .764
       Within Groups     4482.212         29   154.559
       Total             4566.000         31
VITA   Between Groups    4614641          2    2307320.525   1.128   .338
       Within Groups     59337495         29   2046120.515
       Total             63952136         31
FE     Between Groups    10.838           2    5.419         .320    .729
       Within Groups     491.142          29   16.936
       Total             501.980          31
Since none of the p values is significant, Scheffe's test has no meaning in this case.
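The F statistic in an ANOVA table is simply the ratio of the between-groups and within-groups mean squares, each of which is a sum of squares divided by its degrees of freedom. As a quick cross-check outside SPSS, the KCAL row can be reproduced by hand (a Python sketch, not SPSS syntax):

```python
# Reproduce the one-way ANOVA F statistic for KCAL from the
# sums of squares and degrees of freedom in the table above.
ss_between, df_between = 282491.8, 2
ss_within, df_within = 4350866.0, 29

ms_between = ss_between / df_between   # mean square between groups
ms_within = ss_within / df_within      # mean square within groups
f_stat = ms_between / ms_within

print(round(f_stat, 3))  # 0.941, matching the SPSS output
```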
Multiple Comparisons (Scheffe)

Dependent   (I) RACE           (J) RACE           Mean Difference (I-J)   Std. Error   Sig.
Variable
KCAL        Asian              African American   -153.803                161.684      .640
            Asian              Oriental           70.364                  174.095      .922
            African American   Asian              153.803                 161.684      .640
            African American   Oriental           224.167                 170.800      .433
            Oriental           Asian              -70.364                 174.095      .922
            Oriental           African American   -224.167                170.800      .433
PRO         Asian              African American   -3.197                  5.189        .828
            Asian              Oriental           .303                    5.588        .999
            African American   Asian              3.197                   5.189        .828
            African American   Oriental           3.500                   5.482        .817
            Oriental           Asian              -.303                   5.588        .999
            Oriental           African American   -3.500                  5.482        .817
VITA        Asian              African American   -244.545                597.094      .920
            Asian              Oriental           685.010                 642.929      .573
            African American   Asian              244.545                 597.094      .920
            African American   Oriental           929.556                 630.759      .351
            Oriental           Asian              -685.010                642.929      .573
            Oriental           African American   -929.556                630.759      .351
FE          Asian              African American   -.6803                  1.7178       .925
            Asian              Oriental           .7697                   1.8497       .917
            African American   Asian              .6803                   1.7178       .925
            African American   Oriental           1.4500                  1.8147       .729
            Oriental           Asian              -.7697                  1.8497       .917
            Oriental           African American   -1.4500                 1.8147       .729
Homogeneous Subsets

KCAL (Scheffe a,b)

RACE               N    Subset for alpha = .05
                        1
Oriental           9    2641.00
Asian              11   2711.36
African American   12   2865.17
Sig.                    .425
a. Means for groups in homogeneous subsets are displayed.
b. Uses Harmonic Mean Sample Size = 10.513. The group sizes
   are unequal. The harmonic mean of the group sizes is used.
   Type I error levels are not guaranteed.
PRO (Scheffe a,b)

RACE               N    Subset for alpha = .05
                        1
Oriental           9    96.33
Asian              11   96.64
African American   12   99.83
Sig.                    .813
a. Means for groups in homogeneous subsets are displayed.
b. Uses Harmonic Mean Sample Size = 10.513. The group sizes
   are unequal. The harmonic mean of the group sizes is used.
   Type I error levels are not guaranteed.
VITA (Scheffe a,b)

RACE               N    Subset for alpha = .05
                        1
Oriental           9    4298.44
Asian              11   4983.45
African American   12   5228.00
Sig.                    .343
a. Means for groups in homogeneous subsets are displayed.
b. Uses Harmonic Mean Sample Size = 10.513. The group sizes
   are unequal. The harmonic mean of the group sizes is used.
   Type I error levels are not guaranteed.
FE (Scheffe a,b)

RACE               N    Subset for alpha = .05
                        1
Oriental           9    9.567
Asian              11   10.336
African American   12   11.017
Sig.                    .724
a. Means for groups in homogeneous subsets are displayed.
b. Uses Harmonic Mean Sample Size = 10.513. The group sizes
   are unequal. The harmonic mean of the group sizes is used.
   Type I error levels are not guaranteed.
The Subset for alpha = .05 box is not divided for any of the variables analyzed. Had the
group means differed significantly, this box would have been divided into subsets 1, 2 and
so on; groups that fall into separate subsets are significantly different from each other.
Exercise
I) The data below were collected from an animal feeding experiment. Analyze the
data using SPSS and answer our questions.
Group   Rat number   Initial wt.   Final wt.   Brain wt.
A       1            80            335         6.7
A       2            79            342         6.6
A       3            89            353         7.1
A       4            85            337         6.5
A       5            83            334         6.8
B       1            82            335         4.3
B       2            85            345         4.7
B       3            87            357         4.3
B       4            82            335         4.5
B       5            83            343         4.6
C       1            80            398         8.1
C       2            86            405         8.2
C       3            87            398         7.3
C       4            84            398         7.6
C       5            86            395         8.2
                                   Group A   Group B   Group C
Initial weight (means +/- SD)      _______   _______   _______
Statistical significance ________
Statistical test used _________
Your interpretation of the results _______________

                                   Group A   Group B   Group C
Weight gain (means +/- SD)         _______   _______   _______
Statistical significance ________
Statistical test used _________
Which groups were different from each other? _______________
Your interpretation of the results ____________________

                                   Group A   Group B   Group C
Brain wt as a % of
body weight (means +/- SD)         _______   _______   _______
Statistical significance ________
Statistical test used _________
Which groups were different from each other? _______________
Your interpretation of the results ____________________
II) An experiment was conducted to determine the effect of various diets containing
different levels of protein on weight gain in rats. The data are presented below:
Diet A   Diet B   Diet C   Diet D
89       112      125      159
97       110      128      159
90       120      134      166
90       113      126      167
93       106      127      161
86       106      122      165
99       108      128      159
82       116      140      159
92       105      130      158
92       105      130      158
90       106      123      157

         Group A   Group B   Group C   Group D
Group A   Group B   Group C
13        169       2197
15        225       3375
16        256       4096
17        289       4913
20        400       8000
curve for lab assays. This is actually a number of different but related procedures that
can be found under the Analyze pull down menu with the selection of Regression
followed by the selection of the desired procedure. We will only try a few of them.
A scattergram can be a simple way to view your data. It is an excellent way to detect
outliers (which may even be errors). To get a quick scattergram, select the Graphs pull
down menu and choose Scatter. The Scattergram dialog box will appear. You would
normally choose Simple Scattergram and click the Define button to choose the
variables to be used in your graph.
You can then identify the Y-axis variable and the X-axis variable and control labeling
of the graph.
This is an example of a simple scatterplot generated with SPSS.
[Scatterplot: KCAL (Y axis, roughly 2000-3250) plotted against PRO (X axis, roughly 80-120).]
You can also use SPSS to generate the information needed to define a standard curve
and calculate unknowns from laboratory work. Open the absorbance.sav data file. Use
the Analyze pull down menu, select Regression and then Curve Estimation. The
following dialog box will open:
Select the dependent (Y-axis) variable and the independent (X-axis) variable. You will
use the default (linear) model. You can also control labeling. SPSS will display a plot of
the data. This is not the important output. Close the Chart Carousel window and look
at the results in the output window.
EXAMPLE:
Suppose a set of standards were measured spectroscopically to determine the
absorbances associated with the concentration of Vitamin X listed below.
Vitamin X   Absorbance
10.6        0.110
15.4        0.165
20.2        0.220
25.5        0.271
30.2        0.318
35.8        0.370
The Output window has a chart and the data we need to plot a straight line.
Y = intercept + slope * (X)
Or
Y = B0 + B1 * (X)
Curve Fit

MODEL: MOD_1.

Independent: Absorbance

Dependent   Mth   Rsq    d.f.   F         Sigf   b0       b1
VitaminX    LIN   .998   4      2052.68   .000   -.5490   96.9697
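The slope (b1) and intercept (b0) SPSS reports here are ordinary least-squares estimates, so they can be reproduced by hand from the standard-curve data. A quick cross-check in Python (illustrative only, not SPSS syntax):

```python
# Least-squares fit of Vitamin X concentration (Y) on absorbance (X),
# reproducing the b0 and b1 values in the Curve Fit output above.
absorbance = [0.110, 0.165, 0.220, 0.271, 0.318, 0.370]
vitamin_x = [10.6, 15.4, 20.2, 25.5, 30.2, 35.8]

n = len(absorbance)
mean_x = sum(absorbance) / n
mean_y = sum(vitamin_x) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(absorbance, vitamin_x))
sxx = sum((x - mean_x) ** 2 for x in absorbance)

b1 = sxy / sxx             # slope, approximately 96.97
b0 = mean_y - b1 * mean_x  # intercept, approximately -0.55

# An unknown sample's concentration then follows from its absorbance:
unknown = b0 + b1 * 0.250
print(round(b1, 2), round(b0, 2))
```

The same b0 + b1 * X equation is what you use to calculate unknowns from their measured absorbances.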
[Plot: VitaminX (Observed points and Linear fit, 10.00-40.00) against Absorbance (0.100-0.400).]
       Absorbance at 595 nm
20     0.160
30     0.192
40     0.255
50     0.296
60     0.352
70     0.357
80     0.390
90     0.451
100    0.500

Absorbance at 595 nm (unknowns):
0.37
0.43
0.31
0.37
0.24
0.58
Select your variable of interest and place it in the dependent variable box. Then
select each of the predictors that you want used in your equation and place them in the
independent(s) box. Select the appropriate method. Stepwise is the suggested method.
It first identifies the best single predictor and generates output, then the predictor that
works best in combination with the first, and so on.
Regression
Descriptive Statistics

        Mean      Std. Deviation   N
KCAL    2749.25   386.605          32
CHO     334.75    51.285           32
FAT     113.25    23.267           32
PRO     97.75     12.136           32
FIB     11.75     1.951            32
CA      818.50    173.161          32
CHOL    287.50    81.599           32
Variables Entered/Removed(a)

Model   Variables Entered   Variables Removed   Method
1       CA                                      Stepwise
2       FAT                                     Stepwise
3       CHO                                     Stepwise
4       PRO                                     Stepwise
5       FIB                                     Stepwise
6       CHOL                                    Stepwise

Stepwise criteria: Probability-of-F-to-enter <= .050,
Probability-of-F-to-remove >= .100.
a. Dependent Variable: KCAL
Model Summary

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .895(a)    .801       .795                175.198
2       .983(b)    .966       .964                73.443
3       .999(c)    .997       .997                21.887
4       1.000(d)   1.000      1.000               .000
5       1.000(e)   1.000      1.000               .000
6       1.000(f)   1.000      1.000               .000
a. Predictors: (Constant), CA
b. Predictors: (Constant), CA, FAT
c. Predictors: (Constant), CA, FAT, CHO
d. Predictors: (Constant), CA, FAT, CHO, PRO
e. Predictors: (Constant), CA, FAT, CHO, PRO, FIB
f. Predictors: (Constant), CA, FAT, CHO, PRO, FIB, CHOL
For instance, in the Model Summary table, the Adjusted R Square of .795 for Calcium
means that about 80 percent of the variation in Kcal can be predicted if we have the
Calcium intake. The value of .964 below it means that if we have both Calcium and Fat
intake, we can predict about 96 percent of the variation in Kcal, and so on.
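R Square itself is just 1 minus the ratio of the residual sum of squares to the total sum of squares. A small sketch in Python with made-up numbers (not the tutorial's data) shows the formula:

```python
# Compute R-squared from observed values and model predictions.
# The numbers here are invented purely to illustrate the formula.
observed = [2500.0, 2700.0, 2900.0, 3100.0]
predicted = [2550.0, 2650.0, 2950.0, 3050.0]

mean_obs = sum(observed) / len(observed)
ss_total = sum((y - mean_obs) ** 2 for y in observed)
ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))  # 0.95
```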
Coefficients(a)

                     Unstandardized Coefficients   Standardized
Model                B          Std. Error         Beta (Coefficients)   t        Sig.
1   (Constant)       1113.479   151.926                                  7.329    .000
    CA               1.998      .182                .895                 10.998   .000
2   (Constant)       683.316    73.224                                   9.332    .000
    CA               1.412      .091                .632                 15.563   .000
    FAT              8.038      .675                .484                 11.905   .000
3   (Constant)       218.675    34.632                                   6.314    .000
    CA               .334       .068                .150                 4.916    .000
    FAT              9.189      .212                .553                 43.352   .000
    CHO              3.634      .210                .482                 17.278   .000
4   (Constant)       .000       .000                                     .        .
    CA               .000       .000                .000                 .        .
    FAT              9.000      .000                .542                 .        .
    CHO              4.000      .000                .531                 .        .
    PRO              4.000      .000                .126                 .        .
5   (Constant)       .000       .000                                     .        .
    CA               .000       .000                .000                 .        .
    FAT              9.000      .000                .542                 .        .
    CHO              4.000      .000                .531                 .        .
    PRO              4.000      .000                .126                 .        .
    FIB              .000       .000                .000                 .        .
6   (Constant)       .000       .000                                     .        .
    CA               .000       .000                .000                 .        .
    FAT              9.000      .000                .542                 .        .
    CHO              4.000      .000                .531                 .        .
    PRO              4.000      .000                .126                 .        .
    FIB              .000       .000                .000                 .        .
    CHOL             .000       .000                .000                 .        .
a. Dependent Variable: KCAL
Excluded Variables(f)

Model          Beta In    t        Sig.   Partial Correlation   Collinearity Statistics
                                                                (Tolerance)
1   CHO        .102(a)    .474     .639   .088                  .147
    FAT        .484(a)    11.905   .000   .911                  .705
    PRO        .220(a)    1.280    .211   .231                  .219
    FIB        -.146(a)   -1.214   .235   -.220                 .450
    CHOL       .013(a)    .104     .918   .019                  .423
2   CHO        .482(b)    17.278   .000   .956                  .133
    PRO        -.009(b)   -.111    .913   -.021                 .204
    FIB        -.124(b)   -2.697   .012   -.454                 .450
    CHOL       .143(b)    3.028    .005   .497                  .406
3   PRO        .126(c)    .        .      1.000                 .184
    FIB        -.063(c)   -5.991   .000   -.755                 .422
    CHOL       .021(c)    1.205    .239   .226                  .321
4   FIB        .000(d)    .        .      .                     .181
    CHOL       .000(d)    .        .      .                     .305
5   CHOL       .000(e)    .        .      .                     .223
Additional subcommands exist if the input is a matrix or if the user wishes to write
the matrix to an external file; if the user wishes to examine residuals; or if plots of
various types are desired. These can be used under the syntax window system of
creating commands. The Regression procedure can be very complex. The order of
subcommands is very important in determining the output. It is therefore imperative
that the manual be studied carefully when using this procedure.
t) CHI-SQUARE test using CROSSTABS
When discrete data have been collected, it is often desirable to use the Chi-square
test. One way to have SPSS calculate the Chi-square for us is to use the Crosstabs
procedure. The Crosstabs command has a variety of parts, many of which are optional.
The discussion below is intended to clarify some of the information provided in the
manual.
Use the Analyze pull down menu and select Descriptive Statistics and then
Crosstabs. The following dialog box will appear:
Select your row and column variables. Select the Statistics button and choose
Chi-square.
You can also control what is printed into the cells of the table by selecting the Cells
option.
Case Processing Summary

             Cases
             Valid           Missing         Total
             N    Percent    N    Percent    N    Percent
RACE * SEX   32   100.0%     0    .0%        32   100.0%

RACE * SEX Crosstabulation

            SEX
RACE        1     2     Total
1           5     6     11
2           7     5     12
3           4     5     9
Total       16    16    32
Chi-Square Tests

                               Value   df   Asymp. Sig. (2-sided)
Pearson Chi-Square             .535    2    .765
Likelihood Ratio               .537    2    .764
Linear-by-Linear Association   .000    1    1.000
N of Valid Cases               32
Looking at the p values, we can infer that there is no significant difference in the
distribution of the sexes across the race groups.
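The Pearson chi-square above can be reproduced by hand from the crosstab counts: for each cell, square the difference between observed and expected counts and divide by the expected count, then sum. For 2 degrees of freedom the p-value even has a closed form, exp(-chi2/2). A quick sketch in Python (independent of SPSS):

```python
import math

# RACE (rows) by SEX (columns) counts from the crosstab above.
observed = [[5, 6], [7, 5], [4, 5]]

row_totals = [sum(row) for row in observed]        # 11, 12, 9
col_totals = [sum(col) for col in zip(*observed)]  # 16, 16
n = sum(row_totals)                                # 32

chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (count - expected) ** 2 / expected

p = math.exp(-chi2 / 2)  # exact only for df = 2
print(round(chi2, 3), round(p, 3))  # 0.535 0.765
```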
Exercise
I) The following data were collected from an experiment to determine the outcome of
a zinc supplementation program on children's performance on a standardized intelligence
test. Determine statistical significance by Chi-square.

                 Improved   Total
Control          17         23
Supplemented     19         27
Total            25   25    50
Group: Control              Group: Experimental
Initial wt.   Final wt.     Initial wt.   Final wt.
251           252           244           236
229           226           231           208
240           241           241           241
229           225           257           258
243           245           253           234
308           309           299           302
196           202           196           184
222           221           232           225
207           200           243           196
274           268           264           268
239           220           220           225
251           251           221           211
209           196           209           197
220           217           229           216
294           300           298           285
A successful weight loss is defined by the experimenters as being at least five pounds.
Determine if the percentages of successful weight loss were different in the two groups.
(Hint: In your calculations, divide the groups into successful weight loss and no successful
weight loss, controls and experimentals)
Controls
Experimentals
Percentage of successful wt.loss _______
_______
Chi-square value _______
p <
_______
III) We wish to evaluate the presence of breast cancer as a risk factor for subsequent
cancer in the other breast. From a group of 55-65 yr old women, you select a group of
breast cancer cases and a group of controls (not currently suffering from breast cancer).
You use a questionnaire and a thorough search of cancer registry records to determine
past histories of breast cancer. Your data are presented below.
                     Cancer    No Cancer    Total
                     (cases)   (controls)
Previous cancer      12        6            18
No previous cancer   38        44           82
Total                50        50           100
cancer at all.
Do you think this value is significant biologically?
u) Selection of a Subset of Cases for Analysis
Often you may want to run a statistical test on a selected subset of your data.
For example, you may have a huge data file and decide that you want to look only at
the males in your data. SPSS allows you to make this selection temporarily or to save
the subset permanently as a separate data file. Use the Data pull down menu and select
Select cases.
The following dialog box will appear.
Click the If condition is satisfied button and click on If. The following box will
then appear:
Set your conditions for case selection and your data file will now look like this:
You can now choose whatever statistical procedure(s) you want to perform on this
subset of cases.
NOTE: The Filtered button (the default) allows you to use Select cases again
to undo or alter your selection. The Deleted button makes this a permanent case
selection. You can then save the new data as another file and work on it.
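The distinction between Filtered and Deleted can be sketched outside SPSS as well; in Python terms (purely illustrative, with made-up case data, not SPSS syntax):

```python
# Each case is a row; filtering with a condition mimics Select Cases "If".
cases = [
    {"id": 1, "sex": "male", "age": 34},
    {"id": 2, "sex": "female", "age": 29},
    {"id": 3, "sex": "male", "age": 41},
]

# "Filtered": the original data are kept; we just work on a subset.
males = [c for c in cases if c["sex"] == "male"]
print([c["id"] for c in males])  # [1, 3]

# "Deleted": overwrite the working data, making the selection permanent.
cases = males
```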
v) The Nonparametric Tests
SPSS allows the user to perform a number of nonparametric statistical tests. The
tests available can be grouped into broad categories depending on the type of
experimental data you have, e.g. one-sample tests, related-samples tests and
independent-samples tests. These tests are found in the Analyze pull down menu by
selecting Nonparametric Tests and then choosing the appropriate category. You may
make one of the following choices:
Chi-square
Binomial                Gives a binomial test for a variable.
Runs
1-sample K-S            Gives a one-sample Kolmogorov-Smirnov test.
2 Independent Samples   By default you get a Mann-Whitney U test; you can
                        also opt for Wald-Wolfowitz runs.
K Independent Samples   By default you get a Kruskal-Wallis H statistic
                        computed. You can opt for a Median test.
2 Related Samples
K Related Samples
For each of these, you have choices as to which statistics are displayed and how labels
and missing data are handled. We will examine two of these as examples.
w) Mann-Whitney U test
A Mann-Whitney U test can be obtained by selecting the 2 Independent Samples
option under the Nonparametric Tests menu. The following dialog box will appear.
Since the Mann-Whitney U test is the default, we will select our variables to test and our
grouping variable. We must of course define our groups as in previous tests. The
output looks similar to the following:
NPar Tests
Mann-Whitney Test

Ranks

      SEX     N    Mean Rank   Sum of Ranks
AGE   1       16   17.34       277.50
      2       16   15.66       250.50
      Total   32

Test Statistics

                                 AGE
Mann-Whitney U                   114.500
Wilcoxon W                       250.500
Z                                -.510
Asymp. Sig. (2-tailed)           .610
Exact Sig. [2*(1-tailed Sig.)]   .616

Frequencies (Median Test)

                 SEX
                 1   2
AGE   > Median   8   8
      <= Median  8   8
In this data, we have an equal distribution; therefore the significance level is 1.0,
which means that there is no significant difference in the median ages of males and
females.
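For small samples, the U statistic itself is easy to compute from its definition: it counts, for each value in one group, how many values in the other group exceed it. A small sketch in Python with made-up numbers (not the tutorial's data):

```python
# Mann-Whitney U for two tiny samples, from the definition:
# U_a = number of pairs (a, b) with a < b (ties would count 0.5 each).
group_a = [1.2, 3.4, 5.6]
group_b = [2.1, 4.3]

u_a = sum(1 for a in group_a for b in group_b if a < b)
u_b = len(group_a) * len(group_b) - u_a  # no ties in this example

u_stat = min(u_a, u_b)
print(u_a, u_b, u_stat)  # 3 3 3
```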
x) Bivariate Correlations
There are multiple ways to use SPSS to get correlation coefficients. We can select
Correlate from the Analyze pull down menu and then select Bivariate. The following
dialog box will appear.
We do not get r2 by using this procedure. If we want the r2 value, we can calculate it
with a calculator or choose Regression and then Linear under the Analyze pull down
menu. Here we would choose Enter under the Method box. This would give us the r
and r2.
Model Summary

Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .583(a)   .340       .318                19.211
The r2 value means that 34 percent of the variation in fat intake can be predicted
from protein intake. (Since this is not the kind of data you would use for regression
analysis, do not try to make sense out of this statement.)
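Pearson's r (and hence r2) can also be computed directly from its formula, the covariance divided by the product of the standard deviations. A small sketch in Python with made-up numbers:

```python
import math

# Pearson correlation between two made-up variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
var_x = sum((a - mean_x) ** 2 for a in x)
var_y = sum((b - mean_y) ** 2 for b in y)

r = cov / math.sqrt(var_x * var_y)
print(round(r, 3), round(r ** 2, 3))  # 0.775 0.6
```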
y) Survival
The SURVIVAL command produces life tables, plots and related statistics for
examining the length of time between two events (let's say exposure and development
of disease). Cases can be classified into groups, and separate analyses and comparisons
obtained for the groups. The time interval between two dates can be calculated with the
SPSS date and time conversion functions (e.g. YRMODA).
Example:
If the data file contains dates of important events such as diagnosis or outcome, you
can use the Compute command and the YRMODA function to calculate elapsed time.
Use the survival.sav data file provided to you to work on this function. The outcomes
are either 1 (died) or 2 (survived). The treatments are either 1 (Vitamin A), 2 (Beta
Blocker), 3 (ACE inhibitor) or 4 (Aspirin).
Use the Compute command to calculate a variable xyz (time elapsed between
exposure and the event i.e. death)
xyz will now be created as a new variable on your data file and will provide you
the number of years elapsed between the two dates. You can then use the survival
analysis command to compute median survival time for various groups within your
experiment.
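The same elapsed-time computation can be sketched outside SPSS with ordinary date arithmetic; in Python (illustrative only — within SPSS you would use Compute with YRMODA, and the dates here are invented):

```python
from datetime import date

# Elapsed time between exposure and the event (e.g. death), in years.
exposure = date(1995, 3, 15)
event = date(2001, 9, 20)

elapsed_days = (event - exposure).days   # 2381
elapsed_years = elapsed_days / 365.25    # average year length, incl. leap years
print(round(elapsed_years, 1))  # 6.5
```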
The Survival command can be found under the Analyze pull down menu. Then
select Life Tables. The following dialog box will appear:
Place your variable defining the time elapsed in the Time box. The Display Time
Intervals boxes are for defining the number of time units that will be displayed on your
output. The Status box is for defining your outcome and the Factor box is for defining
your grouping variable. Look at the examples below.
You can also select the Options button to generate statistical tests comparing your
groups and to control the output.
40 observations

Life Table
Survival Variable: xyz, by treatmen (treatment group)

In every treatment group, no withdrawals and no terminal events occurred during
intervals .0 through 9.0: the proportion surviving and the cumulative proportion
surviving at end remain 1.0000, and the probability density, hazard rate and their
standard errors remain .0000 throughout those intervals. All deaths and withdrawals
fall in the final interval (10.0+), summarized below for each group:

                 Number   Number   Number    Number   Propn     Cumul    SE of
                 Entrng   Wdrawn   Exposd    Termnl   Termi-    Propn    Cumul
Treatment        Intrvl   Intrvl   to Risk   Events   nating    Surv     Surviving
Vitamin A        19.0     4.0      17.0      15.0     .8824     .1176
Beta Blocker     10.0     .0       10.0      10.0     1.0000    .0000    .0000
ACE inhibitor    7.0      3.0      5.5       4.0      .7273     .2727    .1899
Aspirin          4.0      1.0      3.5       3.0      .8571     .1429    .1870
Comparison by treatment group:

Label           Total N   Uncen   Cen   Pct Cen   Mean Score
Vitamin A       19        15      4     21.05     9.0526
Beta Blocker    10        10      0     .00       -19.1000
ACE inhibitor   7         4       3     42.86     -3.5714
Aspirin         4         3       1     25.00     11.0000

Overall comparison: statistic 11.350, D.F. 3, Prob. .0100
The main values to look for are the median survival times and finally the overall
comparison table for the probability value. Here the probability value is 0.01, which
means that there is a significant difference in the survival of the differently treated
groups.
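The life-table quantities themselves are simple ratios. In the actuarial method, cases withdrawn (censored) during an interval are counted as exposed for half the interval. The ACE inhibitor group's final interval can be reproduced by hand (a Python sketch):

```python
# Actuarial life-table arithmetic for one interval (ACE inhibitor, 10.0+).
entering = 7.0
withdrawn = 3.0   # censored during the interval
deaths = 4.0      # terminal events

# Withdrawn cases are assumed exposed for half the interval.
exposed_to_risk = entering - withdrawn / 2    # 5.5
prop_terminating = deaths / exposed_to_risk   # .7273
prop_surviving = 1 - prop_terminating         # .2727

print(exposed_to_risk, round(prop_terminating, 4), round(prop_surviving, 4))
```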
NUMISS
FIRST
LAST
Example:
Suppose we had a large data file named EXPERIMENT.DAT and wished to construct
a new data file named FEEDING.DAT containing mean Kilocalorie consumption for each
day rather than individual data for each mouse. We can use Aggregate to do this.
Our command might look something like this:
AGGREGATE OUTFILE = FEEDING.DAT
/BREAK = FEEDING
/AVGKCAL = MEAN (KCAL).
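Outside SPSS, the same break-and-aggregate idea looks like this in Python (illustrative only; the variable names mirror the AGGREGATE example above, and the data are made up):

```python
# Mean KCAL per FEEDING day, mimicking:
#   AGGREGATE OUTFILE=... /BREAK=FEEDING /AVGKCAL=MEAN(KCAL).
rows = [
    {"feeding": 1, "kcal": 2500},
    {"feeding": 1, "kcal": 2700},
    {"feeding": 2, "kcal": 3000},
]

groups = {}
for row in rows:
    groups.setdefault(row["feeding"], []).append(row["kcal"])

avgkcal = {day: sum(vals) / len(vals) for day, vals in groups.items()}
print(avgkcal)  # {1: 2600.0, 2: 3000.0}
```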
The Autorecode command
The Autorecode command recodes the values of both string and numeric variables
to consecutive integers and puts the new values into a different variable called the target
variable.
Example:
If we have a category called Race and our variable labels were
1 = White
2 = Hispanic
3 = African American
4 = Asian
We can recode this as 1 = White and 2 = Non-White
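What Autorecode does — map each distinct value to a consecutive integer in a new target variable — can be sketched in Python (illustrative only; the race values are made up):

```python
# Autorecode-style mapping: sorted unique values -> consecutive integers.
race = ["White", "Hispanic", "White", "Asian", "African American"]

mapping = {value: i + 1 for i, value in enumerate(sorted(set(race)))}
race_recoded = [mapping[v] for v in race]  # the "target variable"

print(mapping)        # African American=1, Asian=2, Hispanic=3, White=4
print(race_recoded)   # [4, 3, 4, 2, 1]
```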
The ANOVA command
This command does not exist in the toolbar; we need to type it in the Syntax
window. It performs analysis of variance for data from experiments with factorial
designs. A factorial design is used when the researcher wishes to study the effects of
several factors simultaneously. The ANOVA command also allows the user to perform
analysis of covariance procedures. Other SPSS commands which also perform ANOVA
are ONEWAY and MEANS.
Example:
Suppose a researcher wants to study the relationships between total kilocalorie
consumption, sex and race. The following command may be used:
ANOVA VARIABLES = KCAL BY SEX (1,2) RACE (1,4)
/STATISTICS = 3.
The Correlation command
This command produces Pearson product-moment correlations with one-tailed
probabilities. We can also opt for additional output including univariate statistics,
covariances and cross-product deviations.
Example:
CORRELATION VARIABLES = HEIGHT WEIGHT AGE
/VARIABLES = HEIGHT WEIGHT WITH AGE
/OPTIONS = 2 3
/STATISTICS = 1
The first VARIABLES subcommand causes a square matrix of correlation coefficients
to be created among the variables HEIGHT, WEIGHT AND AGE. The second VARIABLES
built.
The Save command
See the Get command
The Sort Cases command
This command reorders the sequence of cases in the active file based on the values
of one or more variables. This command is again useful for printing out business reports.
Its syntax is as follows:
Sort Cases By variable (A or D) variable (A or D) etc.
The cases are sorted for each variable listed. The default is ascending order (A). To
sort in descending order specify (D). After the initial sort with the first variable, additional
variables cause successive sorts to be performed within categories as determined by the
preceding sort(s). Up to 10 variables may be used as sort keys. This command uses a
large amount of disk space for scratch files.
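A multi-key sort with mixed ascending and descending order — what Sort Cases By VAR1 (A) VAR2 (D) does — can be sketched in Python (illustrative only, with made-up cases):

```python
# Sort cases by "group" ascending, then by "score" descending,
# mimicking: SORT CASES BY GROUP (A) SCORE (D).
cases = [
    {"group": 2, "score": 10},
    {"group": 1, "score": 5},
    {"group": 1, "score": 9},
    {"group": 2, "score": 7},
]

# Negating the numeric secondary key reverses its order within each group.
cases.sort(key=lambda c: (c["group"], -c["score"]))
print(cases)
```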
The Subtitle command
This command is now rarely used. It inserts a left-justified subtitle on the second
line of each page of output. The default is a blank line. The subtitle can be up to 64
characters long. If the subtitle is enclosed in quotes or apostrophes, it will be printed
exactly. If they are omitted, it will be printed in upper-case. The syntax is:
Subtitle any string 64 or fewer characters
The Title command
This prints a centered title on the first line of each page. The date and page number
are also printed.
The Translate command
This command either creates an active file from a spreadsheet program's file (e.g. an
Excel file) or translates the active file into such a file.
The Begin Data and End Data commands
(Under data definition)
These commands are used when we do not wish to use a separate data file. Normally
this is used only when we have a small amount of data. This allows us to include our
data in our command file. This used to be useful when punch cards were used to feed
commands.
Example:
BEGIN DATA.
1 3.4 158
2 2.9 166
3 3.0 178
END DATA.
The Set command
(Under information and settings)
This command changes how SPSS runs. When SPSS is started the defaults of the
Set command are in force. Subcommands include:
NULLINE
been provided for each case. This is usually used in research designs having complex
sampling plans or in situations when one or more groups have been over- or undersampled.
The Write command
This command is used to write cases from the active file to an ASCII file on the
disk.
We believe in learning by doing. SPSS comes with a great Help section.
We compiled this tutorial with a lot of quizzes and exercise problems so that
we could help you learn and understand SPSS better. Test yourself each
time you learn an SPSS command and soon you will master all the essential
commands.
Here we come to the end of our tutorial. We hope you gained as much from
it as we hoped you would. Our 100+ page association with you has been a
pleasure; we bid you farewell and wish you luck in your ventures using SPSS.