

© Sanjay Singh

SPSS For Uninitiated:


A Visual Odyssey for Mortals

"The road to every heaven goes through a hell. Bear this in mind."
- Swami Vivekanand, Complete Works

Sanjay Singh

© Sanjay Singh.
Email: sanjay.singh3210@gmail.com

Images and screenshots of IBM SPSS Statistics software are used in this book for
learning purposes only. The source/credit of any other illustration, image or
resource is duly acknowledged wherever applicable. This book is a "work in
progress" and I will keep updating it. Please do not feel offended if you find
typos or errors anywhere, and kindly do not rush to assess the IQ and EQ of the
author based on unintentional mistakes that he, as a mortal, may commit. I am yet
to organize content for many chapters, and proofreading the book is a distant
dream. This book is an experimental release, and it will take time to give it a
final shape. Kindly do not redistribute the content in an unauthorized manner.
Any positive comment or suggestion to improve the book is welcome.

2020 © Sanjay Singh



Chapter 6: t-test – Independent and Paired

54. INDEPENDENT SAMPLE T-TEST – DEFINING INPUT OPTIONS

In this chapter, we will learn how to test the difference between two group means. When you have
two groups to compare and you want to find out whether they differ significantly, you can opt for
a mean comparison between the two groups. The independent sample t-test is a powerful test for
detecting a difference between two group means. To run it, go to Analyze > Compare Means >
Independent Samples T Test (Figure 1). Look at the symbol of the independent sample t-test. It
reads ‘t A-B’, which means that the groups you are comparing, A and B, are independent of each
other. For example, suppose you want to compare the salaries of males and females, or the
populations of two cities like Delhi and Mumbai. In these cases, the groups are not related to
each other, so we can opt for an independent sample t-test.

Figure 1 : Independent Samples T Test option in SPSS

To calculate an independent sample t-test, open the ‘Employee data set’ from the SPSS Samples
folder. This data set contains the employees' IDs, gender, birth date, education, job category,
salary, beginning salary, job timing, previous experience, and whether they belong to a minority
or majority group. Suppose we want to test the hypothesis ‘There is a significant difference
between the salaries of males and females’, or suppose the bank wants to find out whether males
are drawing a much higher salary than females. To test that, we can conduct an independent sample
t-test. Similarly, the bank may want to find out whether people from the minority category are
drawing a significantly lower salary than people from the majority community. In all these
situations, an independent sample t-test is appropriate.

We want to compare people across genders. In this data set, Gender has been defined as a string
variable. To run a quantitative test on it, we need to define it as a numeric variable. Go to
Variable View and you will find Gender defined with the String variable type. Since it is a
string, we first need to convert its values. To convert males to 1, select the gender column in
Data View, press Ctrl+F and click on the Replace tab. Write 'm' (for males) in ‘Find’ and 1 in
‘Replace with’ (Figure 2). Click Replace All.

Figure 2 : Converting string variables to numeric (m to 1, in this case)

You will get a notification, ‘258 replacements were made.’, implying that there are 258 males in
this study (Figure 3). Similarly, write 'f' in ‘Find’ and 2 in ‘Replace with’ to recode the
gender response ‘f’ as 2, and then click Replace All. The notification after making the
replacements shows that there are 216 females. Now, we need to redefine this variable.

Figure 3 : Notification after replacing values

Click on the Values cell against the gender variable. Select Female, change its value to 2 and
click Change. Select Male, change its value to 1 and click Change, and then press OK (Figure 4).

Figure 4 : Changing string values (gender) to numeric values

Now, we can change the variable type for gender from String to Numeric (Figure 5). SPSS gives us
a small warning: ‘Some value labels or missing value specifications will be discarded for
gender’. It only means that any value labels that are still undefined will be discarded. Since we
are now using only numeric values, discarding the alphabetic ones is fine. The gender variable is
now defined as numeric.

Figure 5 : Changing variable type to numeric
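The recoding done above through Find and Replace can be pictured as a simple mapping from string codes to numbers. The sketch below is a minimal Python illustration, not SPSS syntax. The short `gender` list is made-up data, and the mapping `codes` simply mirrors the coding chosen in this chapter (m to 1, f to 2).

```python
from collections import Counter

# Hypothetical gender column (the real data set has 258 'm' and 216 'f').
gender = ["m", "f", "f", "m", "m"]

# Same coding as in the chapter: m -> 1 (male), f -> 2 (female).
codes = {"m": 1, "f": 2}
gender_numeric = [codes[g] for g in gender]

# Counting each code plays the role of the 'replacements were made' notice.
counts = Counter(gender_numeric)
print(gender_numeric)
print(counts[1], "males,", counts[2], "females")
```

A lookup table like this fails loudly (with a KeyError) if an unexpected code appears, which is a useful safeguard that a blind find-and-replace does not give you.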

Again, go to Analyze > Compare Means > Independent Samples T Test. We want to compare salaries
based on the gender of the employees. In the Independent Samples T Test dialog box (Figure 6),
move Current Salary to the Test Variable(s) box (indicating salary as the dependent variable) and
Gender to Grouping Variable (indicating gender as the independent variable). Once you define
Gender as the Grouping Variable, you will see two question marks next to it, which imply that you
need to define your groups.

Figure 6 : Independent Samples T Test dialog box

Click on Define Groups and write 1 for Group 1 and 2 for Group 2 (as shown in Figure 7; this
implies that 1 = male, 2 = female). We can also define a cut point instead of specifying group
values. A cut point applies to a numeric grouping variable: for example, with a cut point of
10,000 on a salary variable, SPSS would compare two groups, cases below 10,000 and cases at or
above 10,000, and test the difference between them. For now, we will use our group definition.
Click Continue. Then, in Options, you can set the confidence interval, which is 95% by default.
You can change it to 99% or whichever value you prefer; in this case, we will stick with the
default. Under Missing Values, we will keep the default, Exclude cases analysis by analysis,
unchanged. This leads to less data loss than the other option, Exclude cases listwise. Click
Continue. We are not doing bootstrapping right now, since we are looking for a simple result, but
if you want bias-corrected confidence intervals, you can opt for bootstrapping. Click OK. Now,
all input options for the Independent Samples T Test have been defined.

Figure 7 : Defining Groups for Grouping Variable
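The cut point option mentioned above can be pictured with a few lines of Python. This is only an illustrative sketch: the function name `split_by_cut_point` and the sample values are hypothetical, and it assumes the usual SPSS behaviour in which cases below the cut point form one group and cases at or above it form the other.

```python
def split_by_cut_point(values, cut_point):
    """Return (below, at_or_above) groups for a numeric grouping variable."""
    below = [v for v in values if v < cut_point]
    at_or_above = [v for v in values if v >= cut_point]
    return below, at_or_above

# Hypothetical values of a numeric grouping variable.
salaries = [8000, 9500, 10000, 12000, 15000]
low, high = split_by_cut_point(salaries, 10000)
print(low)
print(high)
```

Note that a value exactly equal to the cut point lands in the upper group under this assumption.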

55. INDEPENDENT SAMPLE T-TEST – INTERPRETING DESCRIPTIVE OUTPUT (MEAN, SD, SE)

This is the Group Statistics output table of the independent sample t-test on the employee data
set from the previous chapter (Figure 8). This table provides the descriptive statistics: the
number of people in each category along with the mean, standard deviation and standard error of
the mean of their current salary. Current salary, the dependent variable, has been split by the
two Gender groups (Male and Female); Gender is the independent variable. The sample size is 258
males and 216 females. The average salary drawn by males is $41,441.78, which is much higher than
the average salary of $26,031.92 drawn by females. So, even before any significance testing, a
glance at the means gives the impression of a large difference between the salaries of males and
females, and so we expect a significant effect of gender on salary. The standard deviation is
$19,499.214 for male employees and $7,558.021 for female employees. Although males draw a higher
salary on average, the variability of salary is much higher in the male group than in the female
group. In fact, the standard deviation for males is more than twice that for females.

Figure 8 : Group Statistics output for Independent Sample T-Test

The independent sample t-test is an inferential test: on the basis of the sample characteristics,
we try to draw inferences about the population. The standard error of the mean refers to the
standard deviation of the sample means that you would obtain by drawing repeated samples from the
population. Imagine a study in which we draw many samples from a population of 10,000 employees.
We randomly draw a sample of 100 employees, and this sample will have a mean salary and a
standard deviation. We then draw another sample of 100, which will have another mean and standard
deviation. As you keep drawing samples from the population, you will get different means for
different samples. The more these sample means vary, the larger the error you commit in
estimating the population mean. This is where the standard error of the mean becomes relevant: it
describes the typical amount of error in the sample mean across the samples drawn from the
population. Here it is quite small compared to the average salary, but it is higher for the male
group than for the female group. Similarly, the standard deviation is higher for the male group
than for the female group. This might be due to outliers, since we did not eliminate them;
removing outliers would most likely reduce the standard deviation. As a rule of thumb for a
strictly positive variable such as salary, a standard deviation larger than the mean signals
strong skew or outliers. The standard error of the mean equals the standard deviation divided by
the square root of the sample size, so it is always smaller than the standard deviation when
there is more than one observation. So, this is the interpretation of the descriptive statistics.
Generally, journals do not expect you to report the entire descriptive table, but it is good to
know the concepts for statistical understanding.
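The link between standard deviation and standard error of the mean can be checked by hand. The short Python sketch below uses the standard formula SE = SD / √N with the standard deviations and sample sizes quoted above for the two gender groups; the function name `standard_error` is just an illustrative label.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)

# Standard deviations and sample sizes from the Group Statistics table.
se_male = standard_error(19499.214, 258)
se_female = standard_error(7558.021, 216)
print(round(se_male, 2), round(se_female, 2))
```

Because the male group has the larger standard deviation and a broadly similar sample size, its standard error also comes out larger, which matches the pattern described in the text.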

56. INDEPENDENT SAMPLE T-TEST – INTERPRETING LEVENE’S TEST, T, P, SE & 95% CI

Figure 9 : Independent Sample T Test Output

Figure 9 shows the Independent Samples Test output table for the independent sample t-test. The
dependent variable is Current Salary. Against current salary, there are two rows: ‘Equal
variances assumed’ and ‘Equal variances not assumed’. You may know that the t-test is a
parametric test, and parametric tests assume homogeneity of variance, that is, that the groups
have equal variances. To check whether the groups are homogeneous, SPSS provides Levene's test of
homogeneity or equality of variance. Levene's test is basically a kind of F test or ANOVA. The
Levene's test value is denoted by F in the output, and the F value here is 119.669, which is
highly significant. A significant Levene's test means that our groups are non-homogeneous,
because the null hypothesis of Levene's test is that there is no significant difference between
the variances of the two groups. Since the significance value is less than .05, we reject the
null hypothesis and accept the alternate hypothesis that the variances of the two groups differ
significantly. The result for ‘Equal variances assumed’ is applicable only when the groups are
homogeneous. In our case, the groups are non-homogeneous, so it is better to look at the second
row, ‘Equal variances not assumed’. The t value is 11.688 and the degrees of freedom are 344.26.
The degrees of freedom are calculated differently when equal variances are not assumed. When
equal variances are assumed, the degrees of freedom are the number of observations minus one in
each group, added together (df = (N1 – 1) + (N2 – 1)).


The sample sizes (N) for males and females are 258 and 216. Deduct 1 from the sample size of each
group and add both values to get the degrees of freedom for the ‘Equal variances assumed’ row,
i.e. 472. The p-value is less than .001, which means that the effect is significant at α = .001,
implying a significant difference between the salaries of males and females. The mean difference,
that is, the difference in the average salary, is 15,409.862; this is the difference between
41,441.78 and 26,031.92. The standard error of the difference describes how much this mean
difference would vary across different samples drawn from the populations. Here, the standard
error of the difference is 1318.40 and the 95% confidence interval of the difference is $12,816
to $18,002. Check whether the 95% confidence interval of the difference contains 0. If it does, a
difference of zero is plausible and the result is not significant at the 5% level. Since our
interval does not contain 0, it supports the conclusion of a significant difference. To
summarize, the independent sample t-test showed a significant difference between the salaries of
males and females. On average, males draw a higher salary ($41,441.78) than females ($26,031.92).
So, that's how we conduct and interpret an independent sample t-test.
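The values in the ‘Equal variances not assumed’ row can be reproduced from the group means, standard deviations and sample sizes alone. The Python sketch below assumes that this row corresponds to the Welch t-test with the Welch-Satterthwaite degrees of freedom; the function name `welch_t_from_summary` is hypothetical, and the inputs are the figures quoted in this chapter.

```python
import math

def welch_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Welch t statistic, standard error and df from group summaries."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors
    se_diff = math.sqrt(v1 + v2)             # standard error of the difference
    t = (m1 - m2) / se_diff
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, se_diff, df

# Means, SDs and Ns as reported in the Group Statistics table.
t, se_diff, df = welch_t_from_summary(41441.78, 19499.214, 258,
                                      26031.92, 7558.021, 216)
pooled_df = (258 - 1) + (216 - 1)            # 'Equal variances assumed' row
print(round(t, 3), round(se_diff, 2), round(df, 2), pooled_df)
```

The fractional degrees of freedom come from the approximation formula, which is why they are not a whole number like the 472 of the equal-variances row.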

57. APA STYLE WRITE-UP FOR INDEPENDENT SAMPLE T-TEST

In this chapter, we will learn how to report the results of the t-test in APA style. APA style is
the American Psychological Association's prescribed style for writing up results in Ph.D. theses
or any publication where you want to communicate your results in a standard manner. To write
independent sample t-test results in APA style, we need the t value, degrees of freedom, p value
and 95% confidence interval, along with the means and standard deviations of the two comparison
groups. Start by mentioning which test was used and its purpose. Write ‘t’ followed by the
degrees of freedom in brackets, then an equals sign and the t value. After this, mention the p
value. SPSS displays p as .000 here, which does not mean that p is exactly zero; it means that p
is smaller than .0005 and therefore below .05, .01 and even .001. So, we report it as p < .001.
Then, report the lower and upper limits of the 95% confidence interval. Report the means and
standard deviations of salaries for male and female employees. Ensure that all text is in Times
New Roman, font size 12, and that statistical symbols are written in italics (in this case, t, p,
M and SD).

The results for the independent sample t-test can be reported like this –

‘An independent sample t-test reported a significant difference in the salary drawn by male and
female employees, t(344.26) = 11.69, p < .001, 95% CI [$12,816.73, $18,002.99]. Male employees
drew, on average, a higher salary (M = $41,441.78, SD = $19,499.21) than female employees
(M = $26,031.92, SD = $7,558.02).’

While writing your results, be sure to report the test results in an appropriate format and to
describe the outcome of your test clearly. In the above write-up, for instance, the second
sentence explains the actual outcome in a crisp, comprehensive manner.
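The reporting pattern described in this chapter is mechanical enough to sketch as a small helper. The Python functions below (`apa_number`, `apa_p` and `apa_t`) are hypothetical names written for this illustration, not part of SPSS or any APA tool; they round t to two decimals, drop the leading zero from p, and report any p below .001 as ‘p < .001’.

```python
def apa_number(value, decimals=2):
    """Format a statistic that cannot exceed 1 without a leading zero."""
    text = f"{value:.{decimals}f}"
    return text[1:] if text.startswith("0.") else text

def apa_p(p):
    """Report p as '< .001' when it is below .001, else to three decimals."""
    return "p < .001" if p < 0.001 else f"p = {apa_number(p, 3)}"

def apa_t(t, df, p):
    """Build a 't(df) = value, p ...' string."""
    df_text = f"{df:g}" if float(df).is_integer() else f"{df:.2f}"
    return f"t({df_text}) = {t:.2f}, {apa_p(p)}"

# p is passed as 0.0004 only as a stand-in for SPSS's displayed '.000'.
print(apa_t(11.688, 344.26, 0.0004))
```

Italics, font and the surrounding sentence still have to be handled in the word processor; the helper only assembles the statistics string.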


58. WHEN TO USE PAIRED SAMPLE T-TEST

In this chapter, we will learn about the paired sample t-test, also known as the correlated
sample t-test or dependent sample t-test. It presumes either that a repeated measures design has
been used in the study or that the same subjects are measured in two different situations. It is
a very useful test for repeated measures inference. Suppose a researcher wants to understand
whether training students by a particular method leads to a significant improvement in their
performance. To study this, the researcher takes 30 students (30 is a commonly recommended
minimum sample size for parametric tests) and measures their performance at the beginning, before
any training. He then administers a training program and, following the training, measures their
performance again. Finally, he compares the two sets of scores using a paired sample t-test.

Similarly, governments can use the paired sample t-test to measure the effect of tourism
promotion on tourists. Tourists' attitudes toward the country they visit can be measured on
arrival: they can be asked questions such as whether they like the country and whether they are
familiar with its culture. Once they have visited the country and are about to leave, we can ask
them to fill in the survey again to find out whether their attitude changed significantly between
arrival and departure. Likewise, suppose you want to assess whether visiting a hotel or going to
a party improves your mood. You can rate your mood when you arrive and again when you leave, and
find out whether there has been a significant improvement. In general, paired sample t-tests can
be used whenever you measure the change in the same individuals after they have undergone some
training or experience.

59. CALCULATING PAIRED SAMPLE T-TEST IN SPSS

In the employee data set (Figure 10), the variables include employee ID, education and job
category, and we have both the employees' current salary and their salary at the beginning, that
is, when they joined the company. Suppose a manager wants to know whether there has been a
significant improvement in the salary of the employees since they joined the company. The purpose
is to check that the employees have not been stagnating in the organization, that is, that their
current salary has improved significantly compared to their baseline salary, the salary they
drew at the time of joining. Here we have a repeated measures situation where, for the same ID,
we have two observations: current salary and beginning salary.

Figure 10 : Employee data set (from SPSS Samples folder)

To conduct a paired sample t-test, go to Analyze > Compare Means and look at the Paired Samples
T Test option (Figure 11). Now, look at its symbol. It reads ‘t A1-A2’. Here, A1 and A2 are part
of the same group: they are the same individuals, but they have been measured in two different
situations. The symbol of the independent sample t-test, in contrast, is written as ‘t A-B’,
implying different individuals who belong to, for instance, Delhi and Mumbai. Select Paired
Samples T Test and then define your pairs in the Paired Samples T Test dialog box.

Figure 11 : Paired Sample T-Test option in SPSS


The pair consists of the current salary and the beginning salary of the employees. Move Current
Salary to Variable1 and Beginning Salary to Variable2 to form Pair 1 (Figure 12). This tests the
difference between current salary and beginning salary. If you have more pairs, you can compare
multiple pairs by defining them here. The rest of the options are the same as for the independent
sample t-test; if you want to know more about them, look at the chapters on the independent
sample t-test. Bootstrapping can be used to obtain bias-corrected confidence intervals, but we
are not doing that right now. Bootstrapping repeatedly resamples the observed data with
replacement, which is helpful when the sample is very small and you still want a bias-corrected
confidence interval.

Figure 12 : Choosing variables to be paired

The arrow highlighted in Figure 12 can be used to change the positions of the paired variables.
For example, in this case, Variable1 is current salary and Variable2 is beginning salary, but if
you want to interchange their positions, you can select the pair and click on the arrow button.
Beginning salary then becomes the first variable and current salary the second (Figure 13). The
only difference this makes to your output is the sign of the t value. For example, we know that
employees are most likely drawing a higher salary now than at the beginning. So, if you take
beginning salary as the first variable, you will get a negative t value. Remember that the
formula for t is the difference between the means divided by the standard error. The difference
will be negative because beginning salary was lower than current salary. If you want to avoid a
negative t value, take current salary as the first variable. The order has no implication for
your result or the conclusion you draw; it only changes the sign of the t value. Click OK.

Figure 13 : Interchanging positioning of variables
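The effect of swapping the two variables can be seen in a tiny worked example. The Python sketch below computes the paired t value as the mean of the pairwise differences divided by its standard error. The five salary pairs are made-up numbers, not values from the SPSS data set, and `paired_t` is a hypothetical helper name.

```python
import math
import statistics

def paired_t(first, second):
    """Paired t: mean of the differences divided by its standard error."""
    diffs = [a - b for a, b in zip(first, second)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

# Made-up salaries for five employees (not taken from the SPSS data set).
current = [30000, 42000, 27500, 51000, 36000]
beginning = [15000, 20000, 13500, 27000, 16500]

t_current_first = paired_t(current, beginning)
t_beginning_first = paired_t(beginning, current)
print(round(t_current_first, 3), round(t_beginning_first, 3))
```

The two t values have the same magnitude and opposite signs, so the choice of order affects only how the result reads, not the conclusion.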


60. INTERPRETING PAIRED SAMPLE T-TEST OUTPUT

Let us learn to interpret the output of a paired sample t-test. The first output table, Paired
Samples Statistics, provides the descriptive output (Figure 14). The average current salary is
$34,419.57, while at the beginning it was $17,016.09. Salary has approximately doubled since the
employees joined the company. The sample size is the same for Current Salary and Beginning
Salary, as the same individuals' salaries are observed at two different points in time. The
standard deviation of current salary is much higher than that of beginning salary. This means
that although salaries have increased, they now vary widely. The standard error is $784.311 for
Current Salary and $361.51 for Beginning Salary.

Figure 14 : Paired Sample Statistics table

When you calculate the paired sample t-test, you also get a correlation coefficient: since the
same subjects are measured in two different situations, the two sets of scores are expected to be
correlated. The correlation is reported in the second table, Paired Samples Correlations
(Figure 15). In this case, the correlation is .88, significant at the α = .001 level. So, there
is quite a high correlation between the two salary scores. It implies that the individuals who
drew a higher salary earlier are also drawing a higher salary currently. So, salaries have
changed in a consistent pattern across employees.

Figure 15 : Paired Samples Correlations

Figure 16 : Paired Samples Test output

The mean for the pair Current Salary - Beginning Salary is $17,403 (Figure 16). This is the mean
difference between the salaries, calculated by subtracting beginning salary from current salary,
i.e. $34,419 - $17,016 = $17,403. The standard deviation of the difference is $10,814, which is
large but well below the mean difference. The standard error of the mean is $496.732. The 95%
confidence interval is $16,427 to $18,379. Both confidence limits are positive, so the interval
does not contain zero, which means the difference is significant at the 5% level. Finally, the t
value (35.036) is very high, the degrees of freedom are 473, and the result is significant at
the α = .001 level. Hence, there has been a significant improvement in the salary of employees
since they joined the company.
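The figures in the Paired Samples Test table hang together arithmetically, and a few lines of Python make this visible. The sketch below uses only numbers quoted in this chapter; note that the standard deviation of the differences is given in the text as a rounded figure ($10,814), so the standard error recomputed from it matches the reported $496.732 only approximately.

```python
import math

n = 474                                  # employees (pairs)
mean_diff = 34419.57 - 17016.09          # current minus beginning salary
sd_diff = 10814                          # SD of differences, rounded in text
se_reported = 496.732                    # standard error from the output

se_from_sd = sd_diff / math.sqrt(n)      # should be close to se_reported
t = mean_diff / se_reported
df = n - 1
print(round(se_from_sd, 1), round(t, 2), df)
```

The t value is simply the mean difference divided by its standard error, and the degrees of freedom are the number of pairs minus one.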

61. APA STYLE WRITE-UP FOR PAIRED SAMPLE T-TEST

We have learnt how to write the output and its interpretation for an independent sample t-test in
APA format. The APA writing format of the output for a paired sample t-test is quite similar to
writing the output for an independent sample t-test.

Keep these pointers in mind when writing results for a paired sample t-test –

• Always write the results in the past tense.
• While writing the t value, write the degrees of freedom (df) in brackets.
• The formula for degrees of freedom for the paired sample t-test is df = N – 1. For the
employee data set, df = 474 – 1 = 473, the t value is 35.04 and it is significant at the
α = .001 level.
• All text should be in Times New Roman font, font size 12.
• Italicize all statistical symbols.

Following these rules, we can report the output of the paired sample t-test calculated for the
employee data set like this –

“A paired sample t-test suggested that there had been a significant increase in the salary of
employees (M = $34,419.57, SD = $17,075.66) since they joined the company (M = $17,016.09,
SD = $7,870.64), t(473) = 35.04, p < .001.”
