
IS workshop

SPSS
The data view
The data view displays your actual data and any new variables you have created.
The variable view
The variable view window contains the definitions of each variable in your data set, including
its name, type, label, size, alignment, and other information.
Note: While the variables are listed as columns in the Data View, they are listed as
rows in the Variable View. In the Variable View, each column is a kind of variable
itself, containing a specific type of information.
The output view
The output window is where you see the results of your various queries, such as frequency
distributions, crosstabs, statistical tests, and charts.
The draft view
The draft view is where you can look at output as it is generated for printing. The draft view
does not contain the contents pane or some of the notations present in the output pane.

Ex: From the menu, select Analyse > Descriptive Statistics > Crosstabs. Click once on
Employment, and then click the small right arrow next to Rows to move the variable to the
Rows pane. Click Gender, and then click the small right arrow next to Columns to move the
Gender variable to the Columns pane.

What the heck is a crosstab?


A crosstab (short for cross tabulation) is a summary table, with the emphasis on
summary.
Employment Category * Gender Crosstabulation (Count)

                                    Gender
                            Female    Male    Total
Employment    Clerical         206     157      363
Category      Custodial          0      27       27
              Manager           10      74       84
Total                          216     258      474

Notice that the rows contain one set of categories (employment category) while the columns
contain another (gender). In this crosstab, the cells contain counts.
Crosstabs are used only for categorical (discrete) data, that is, groups like employment
categories or gender. You can’t use a crosstab for continuous data like temperature, dosage,
or income. BUT, you can turn data like temperature, dosage, or income into categories by
creating groups, such as income less than $25,000, income between $25,000 and $49,999, and
income of $50,000 or higher. Crosstabs deal with groups or categories.
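
For readers working outside SPSS, a minimal sketch of the same idea in Python using pandas
(the data frame and its column names "employment" and "gender" are assumptions for
illustration, not part of the workshop data set):

    import pandas as pd

    # Hypothetical data: one row per employee.
    df = pd.DataFrame({
        "employment": ["Clerical", "Manager", "Clerical", "Custodial"],
        "gender": ["Female", "Male", "Male", "Male"],
    })

    # pd.crosstab counts how many cases fall into each pair of categories,
    # producing the same kind of summary table shown above.
    table = pd.crosstab(df["employment"], df["gender"], margins=True)
    print(table)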

Types of Variable
All experiments examine some kind of variable(s). A variable is not only something that we
measure, but also something that we can manipulate and something we can control for.
Dependent and Independent Variables
An independent variable, sometimes called an experimental or predictor variable, is a
variable that is being manipulated in an experiment in order to observe the effect on
a dependent variable, sometimes called an outcome variable.
Imagine that a tutor asks 100 students to complete a maths test. The tutor wants to know why
some students perform better than others. Whilst the tutor does not know the answer to this,
she thinks that it might be because of two reasons: (1) some students spend more time
revising for their test; and (2) some students are naturally more intelligent than others. As
such, the tutor decides to investigate the effect of revision time and intelligence on the test
performance of the 100 students. The dependent and independent variables for the study are:
Dependent Variable: Test Mark (measured from 0 to 100)
Independent Variables: Revision time (measured in hours) and Intelligence (measured using IQ
score)
The dependent variable is simply that, a variable that is dependent on an independent
variable(s). For example, in our case the test mark that a student achieves is dependent on
revision time and intelligence. Whilst revision time and intelligence (the independent
variables) may (or may not) cause a change in the test mark (the dependent variable), the
reverse is implausible; in other words, whilst the number of hours a student spends revising
and a student's IQ score may (or may not) change the test mark that a student achieves, a
change in a student's test mark has no bearing on whether a student revises more or is more
intelligent (this simply doesn't make sense).
Therefore, the aim of the tutor's investigation is to examine whether these independent
variables - revision time and IQ - result in a change in the dependent variable, the students'
test scores. However, it is also worth noting that whilst this is the main aim of the experiment,
the tutor may also be interested to know if the independent variables - revision time and IQ -
are also connected in some way.
Experimental and Non-Experimental Research
Experimental research: In experimental research, the aim is to manipulate an independent
variable(s) and then examine the effect that this change has on a dependent variable(s). Since
it is possible to manipulate the independent variable(s), experimental research has the
advantage of enabling a researcher to identify a cause and effect between variables. For
example, take our example of 100 students completing a maths exam where the dependent
variable was the exam mark (measured from 0 to 100), and the independent variables were
revision time (measured in hours) and intelligence (measured using IQ score). Here, it would
be possible to use an experimental design and manipulate the revision time of the students.
The tutor could divide the students into two groups, each made up of 50 students. In "group
one", the tutor could ask the students not to do any revision. Alternately, "group two" could
be asked to do 20 hours of revision in the two weeks prior to the test. The tutor could then
compare the marks that the students achieved.
Non-experimental research: In non-experimental research, the researcher does not
manipulate the independent variable(s). This is not to say that it is impossible to do so, but
rather that it would be impractical or unethical. For example, a researcher may be interested in
the effect of illegal, recreational drug use (the independent variable(s)) on certain types of
behaviour (the dependent variable(s)). However, whilst possible, it would be unethical to ask
individuals to take illegal drugs in order to study what effect this had on certain behaviours.
As such, a researcher could ask both drug and non-drug users to complete a questionnaire that
had been constructed to indicate the extent to which they exhibited certain behaviours. Whilst
it is not possible to identify the cause and effect between the variables, we can still examine
the association or relationship between them.
Standard Deviation
Standard deviation is a statistical measurement that looks at how far a group of numbers is
from the mean. Put simply, standard deviation measures how far apart numbers are in a data
set. This metric is calculated as the square root of the variance. This means you have to
figure out the variation between each data point relative to the mean.
Standard deviation is a measure of how much the data in a set varies from the mean. The
larger the value of standard deviation, the more the data in the set varies from the mean. The
smaller the value of standard deviation, the less the data in the set varies from the mean.
Variance
A variance is the average of the squared differences from the mean. To figure out the
variance, calculate the difference between each point within the data set and the mean. Once
you figure that out, square and average the results.
For example, if a group of numbers consists of the integers 1 to 10, it will have a mean of 5.5. If you
square the differences between each number and the mean and find their sum, the result is
82.5. To figure out the variance:
• Divide the sum, 82.5, by N - 1, which is the sample size (in this case 10) minus 1.
• The result is a variance of 82.5/9 = 9.17.
Note that the standard deviation is the square root of the variance so that the standard
deviation would be about 3.03.
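
As a quick check, here is a minimal Python sketch of the same calculation, using the standard
library statistics module (which divides by N - 1, matching the sample variance above):

    import statistics

    data = list(range(1, 11))           # the numbers 1 to 10
    mean = statistics.mean(data)        # 5.5
    var = statistics.variance(data)     # sample variance (divides by N - 1): about 9.17
    sd = statistics.stdev(data)         # square root of the variance: about 3.03

    print(mean, var, sd)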
Their Key Differences
Other than how they're calculated, there are a few other key differences between standard
deviation and variance. For one thing, the standard deviation is a statistical measure that
people can use to determine how spread out numbers are in a data set. Variance, on the other
hand, gives an actual value to how much the numbers in a data set vary from the mean.
Standard deviation is the square root of variance, and in finance the variance of returns is
often expressed in percentage terms. As such, the standard deviation can actually be greater
than the variance, since the square root of a number between 0 and 1 is larger (not smaller)
than the number itself; this happens whenever the variance is less than one (1.0, or 100%).
Likewise, the standard deviation will be smaller than the variance whenever the variance is
greater than one (e.g., 1.2, or 120%).
Example of Standard Deviation vs. Variance
To demonstrate how both principles work, let's look at an example of standard deviation and
variance.
Suppose you have a series of numbers and you want to figure out the standard deviation for
the group. The numbers are 4, 34, 18, 12, 2, and 26. We need to determine the mean or the
average of the numbers. In this case, we determine the mean by adding the numbers up and
dividing it by the total count in the group:
(4 + 34 + 18 + 12 + 2 + 26) ÷ 6 = 16
So the mean is 16. Now subtract the mean from each number then square the result:
 (4 - 16)2 = 144
 (34 - 16)2 = 324
 (18 - 16)2 = 4
 (12 - 16)2 = 16
 (2 - 16)2 = 196
 (26 - 16)2 = 100
Now we have to figure out the average or mean of these squared values to get the variance.
This is done by adding up the squared results from above, then dividing it by the total count
in the group:
(144 + 324 + 4 + 16 + 196 + 100) ÷ 6 = 130.67
This means we end up with a variance of 130.67. To figure out the standard deviation, we
have to take the square root of the variance, which is about 11.43. (Note that this example
divides by the full count N, giving the population variance, whereas the earlier example
divided by N - 1 to give the sample variance.)
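
A minimal Python sketch of this worked example, this time using the population forms
(dividing by N) so the numbers match the arithmetic above:

    import statistics

    data = [4, 34, 18, 12, 2, 26]
    mean = statistics.mean(data)        # 16
    var = statistics.pvariance(data)    # population variance (divides by N): about 130.67
    sd = statistics.pstdev(data)        # square root of the variance: about 11.43

    print(mean, var, sd)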
What Does Variance Mean?
The simple definition of the term variance is the spread between numbers in a data set.
Variance is a statistical measurement used to determine how far the numbers in a data set are
spread out from their mean. You can calculate the variance by taking the
difference between each point and the mean. Then square and average the results.
What Does Standard Deviation Mean?
Standard deviation measures how data is dispersed relative to its mean and is calculated as
the square root of its variance. The further the data points are from the mean, the higher the deviation.
Closer data points mean a lower deviation. In finance, standard deviation calculates risk so
riskier assets have a higher deviation while safer bets come with a lower standard deviation.
Cross tabulation
Cross tabulation is a basic technique for examining the relationship between two
categorical variables. For example, using Age category as a row variable and Gender as a
column variable, you can create a two-dimensional cross tabulation that shows the number of
males and females in each age category. In SPSS, it is just another name for contingency
tables, which summarize the relationship between different variables of categorical
data. Crosstabs can help you show the proportion of cases in subgroups. It is most often used
to analyse categorical data (nominal measurement scale: nominal data is not measured, it is
categorized). Researchers use cross-tabulation to examine relationships within the data that
are not readily evident. It is quite useful in market research studies and surveys. A cross-
tab report shows the connection between two or more questions asked in the study.
Chi-square - used to discover if there is a relationship between two categorical variables.
The Chi-square statistic is the primary statistic used for testing the statistical significance of
the cross-tabulation table. Chi-square tests determine whether or not the two variables are
independent. If the variables are independent (have no relationship), then the results of the
statistical test will be “non-significant” and we are not able to reject the null hypothesis,
meaning that we believe there is no relationship between the variables. If the variables are
related, then the results of the statistical test will be “statistically significant” and we are able
to reject the null hypothesis, meaning that we can state that there is some relationship
between the variables.
Computation
The chi-square statistic, along with the associated probability of chance observation, may be
computed for any table. If the variables are related (i.e., the observed table relationships
would occur with very low probability, say only 5%, if the variables were truly independent),
then we say that the results are “statistically significant” at the .05 or 5% level. This means
that the observed pattern is unlikely to have arisen by chance alone. Students of statistics
will recall that the probability values (.05
or .01) reflect the researcher’s willingness to accept a type I error, or the probability of
rejecting a true null hypothesis (meaning that we thought there was a relationship between the
variables when there really wasn’t).
A chi-square statistic is one way to show a relationship between two categorical variables. In
statistics, there are two types of variables: numerical (countable) variables and non-numerical
(categorical) variables. The chi-squared statistic is a single number that tells you how much
difference exists between your observed counts and the counts you would expect if there
were no relationship at all in the population.
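
In symbols, the standard Pearson formula compares the observed count (O) with the expected
count (E) in every cell of the table:

    chi-square = sum over all cells of (O - E)² / E

where the expected count for a cell is (row total × column total) / overall total.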
Chi-Square, or Pearson's chi-square test, is a statistical hypothesis test which researchers use to
determine whether there is a significant difference between the expected frequencies and the
observed frequencies in one or more categories. It helps us determine if two discrete
variables are associated.
The Chi-Square Test of Independence determines whether there is an association between
categorical variables (i.e., whether the variables are independent or related). It is a
nonparametric test.
This test is also known as:
• Chi-Square Test of Association.
This test utilizes a contingency table to analyze the data. A contingency table (also known as
a cross-tabulation, crosstab, or two-way table) is an arrangement in which data is classified
according to two categorical variables. The categories for one variable appear in the rows,
and the categories for the other variable appear in columns. Each variable must have two or
more categories. Each cell reflects the total count of cases for a specific pair of categories.

When you choose to analyse your data using a chi-square test for independence, you need to
make sure that the data you want to analyse "passes" two assumptions.
These two assumptions are:
Assumption #1: Your two variables should be measured at an ordinal or nominal
level (i.e., categorical data).
Assumption #2: Your two variables should consist of two or more
categorical, independent groups. Example independent variables that meet this criterion
include gender (2 groups: Males and Females), ethnicity (e.g., 3 groups: Caucasian, African
American and Hispanic), physical activity level (e.g., 4 groups: sedentary, low, moderate and
high), profession (e.g., 5 groups: surgeon, doctor, nurse, dentist, therapist), and so forth.
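
As a rough sketch of how the same test could be reproduced outside SPSS, here is a Python
example using scipy and the employment-by-gender counts from the crosstab shown earlier
(the variable names are chosen for illustration):

    from scipy.stats import chi2_contingency

    # Observed counts from the Employment Category * Gender crosstab
    # (rows: Clerical, Custodial, Manager; columns: Female, Male).
    observed = [
        [206, 157],
        [0,   27],
        [10,  74],
    ]

    # chi2_contingency returns the chi-square statistic, the p-value,
    # the degrees of freedom, and the table of expected counts.
    chi2, p, dof, expected = chi2_contingency(observed)
    print(chi2, p, dof)

    # If p falls below the chosen alpha (e.g., 0.05), we reject the null
    # hypothesis that employment category and gender are independent.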
What is Statistical Significance?
Statistical significance is a term used by researchers to state that it is unlikely their
observations could have occurred under the null hypothesis of a statistical test. Significance is
usually denoted by a p-value, or probability value.
Statistical significance is arbitrary – it depends on the threshold, or alpha value, chosen by the
researcher. The most common threshold is p < 0.05, which means that data as extreme as the
observed would occur less than 5% of the time if the null hypothesis were true.
When the p-value falls below the chosen alpha value, then we say the result of the test is
statistically significant.

P Values
The P value, or calculated probability, is the probability of finding the observed, or more
extreme, results when the null hypothesis (H0) of a study question is true – the definition of
‘extreme’ depends on how the hypothesis is being tested. P is also described in terms of
rejecting H0 when it is actually true; however, it is not a direct probability of this state.
 
The null hypothesis is usually a hypothesis of "no difference" e.g. no difference between
blood pressures in group A and group B. Define a null hypothesis for each study question
clearly before the start of your study.
 
The only situation in which you should use a one sided P value is when a large change in an
unexpected direction would have absolutely no relevance to your study. This situation is
unusual; if you are in any doubt then use a two sided P value.
 
The term significance level (alpha) is used to refer to a pre-chosen probability and the term
"P value" is used to indicate a probability that you calculate after a given study.
 
The alternative hypothesis (H1) is the opposite of the null hypothesis; in plain language
terms this is usually the hypothesis you set out to investigate. For example, the question is "is
there a significant (not due to chance) difference in blood pressures between groups A and B
if we give group A the test drug and group B a sugar pill?", and the alternative hypothesis is
"there is a difference in blood pressures between groups A and B if we give group A the test
drug and group B a sugar pill".
 
If your P value is less than the chosen significance level then you reject the null hypothesis
i.e. accept that your sample gives reasonable evidence to support the alternative hypothesis. It
does NOT imply a "meaningful" or "important" difference; that is for you to decide when
considering the real-world relevance of your result.
 
The choice of significance level at which you reject H0 is arbitrary. Conventionally the 5%
(less than 1 in 20 chance of being wrong), 1% and 0.1% (P < 0.05, 0.01 and 0.001) levels
have been used. These numbers can give a false sense of security.
 
Notes about Type I error:
• is the incorrect rejection of the null hypothesis
• maximum probability is set in advance as alpha
• is not affected by sample size as it is set in advance
• increases with the number of tests or end points (i.e. if you perform 20 tests of H0 at
alpha = 0.05, one is likely to be wrongly significant by chance)
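
To make the last point concrete, a quick back-of-the-envelope sketch: if each of 20
independent tests has a 5% chance of a false positive, the chance of at least one false
positive is roughly 64%.

    # Probability of at least one Type I error across 20 independent tests at alpha = 0.05
    alpha, n_tests = 0.05, 20
    p_at_least_one = 1 - (1 - alpha) ** n_tests
    print(p_at_least_one)   # about 0.64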
 
Notes about Type II error:
• is the incorrect acceptance of the null hypothesis
• probability is beta
• beta depends upon sample size and alpha
• can't be estimated except as a function of the true population effect
• beta gets smaller as the sample size gets larger
• beta gets smaller as the number of tests or end points increases
One-tailed and Two-tailed test

A one-tailed test refers to a significance test in which the region of rejection appears on only
one end of the sampling distribution. It tests whether the estimated parameter is greater than
or less than the critical value in one specified direction. When the sample statistic falls in the
region of rejection (either the left or the right side, as the case may be), the alternative
hypothesis is accepted rather than the null hypothesis.
A two-tailed test is a hypothesis test in which the region of rejection (the critical area) lies on
both ends of the sampling distribution. It determines whether the sample statistic falls within
or outside a certain range of values, so the alternative hypothesis is accepted in place of the
null hypothesis if the calculated value falls in either of the two tails of the probability
distribution.
Key Differences Between One-tailed and Two-tailed Test
The fundamental differences between one-tailed and two-tailed test, is explained below in
points:
1. A one-tailed test, as the name suggests, is a statistical hypothesis test in which the
alternative hypothesis has a single end. A two-tailed test, on the other hand, is a
hypothesis test in which the alternative hypothesis has dual ends.
2. In the one-tailed test, the alternative hypothesis is represented directionally.
Conversely, the two-tailed test is a non-directional hypothesis test.
3. In a one-tailed test, the region of rejection is either on the left or the right of the
sampling distribution. In a two-tailed test, on the contrary, the region of rejection is
on both sides of the sampling distribution.
4. A one-tailed test is used to ascertain if there is any relationship between variables in a
single direction, i.e. left or right. As against this, the two-tailed test is used to identify
whether or not there is any relationship between variables in either direction.
5. In a one-tailed test, we ask whether the calculated test parameter is greater than or
less than the critical value. In a two-tailed test, we ask whether the result falls within
or outside the critical values.
6. When the alternative hypothesis has a ‘≠’ sign, a two-tailed test is performed. In
contrast, when the alternative hypothesis has a ‘>’ or ‘<’ sign, a one-tailed test is
carried out.
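
As a small illustration of how the two relate, here is a Python sketch using scipy's normal
distribution; the test statistic z is hypothetical and assumed to be approximately standard
normal under the null hypothesis:

    from scipy.stats import norm

    z = 1.8  # hypothetical test statistic

    # One-tailed p-value: probability of a result this large or larger in one direction.
    p_one_tailed = 1 - norm.cdf(z)

    # Two-tailed p-value: probability of a result this extreme in either direction.
    p_two_tailed = 2 * (1 - norm.cdf(abs(z)))

    print(p_one_tailed, p_two_tailed)   # about 0.036 and 0.072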
