© Sanjay Singh
"The road to every heaven goes through a hell. Bear this in mind."
- Swami Vivekanand, Complete Works
Sanjay Singh
© Sanjay Singh.
Email: sanjay.singh3210@gmail.com
Images and screenshots of IBM SPSS Statistics software are used in this book for
learning purposes only. The source/credit of any other illustration, image or
resource is duly acknowledged wherever applicable. This book is a "work in
progress" and I will keep it continuously updated. Please do not feel offended if
you find typos or errors anywhere, and kindly do not rush to assess the
IQ & EQ of the author based on some unintentional mistakes that he, as a mortal,
may commit. I am yet to organize content for many chapters, and proofreading
the book is a distant dream. This book is like an experimental release, and it will
take time to give it a final shape. Kindly behave and do not redistribute the content
in an unauthorized manner. Any positive comments and suggestions to improve the book
are welcome.
To calculate an independent sample t-test, open the ‘Employee data set’ from the SPSS Samples
folder. In this data set, we have the IDs of employees, their gender, birth date, education, job
category, salary, beginning salary, job timing, previous experience and whether they belong to a
minority or majority group. Suppose we want to test a hypothesis – ‘There is a significant
difference between the salary of males and females’, or suppose the bank wants to find out whether
males are drawing a significantly higher salary than females. To test that, we can
conduct an independent sample t-test. Similarly, the bank may want to find out whether people from
the minority category are drawing a significantly lower salary than people from
the majority community. In all these situations, it is apt to calculate an independent sample t-test.
need to convert the values. To convert males into 1, select the gender column in Data View,
press Ctrl+F and click on the Replace tab. Write 'm' (for males) in ‘Find’ and 1 in ‘Replace with’ to
replace males by 1 (Figure 2). Click Replace All.
Figure 3: Notification after replacing values
Similarly, write 'f' in ‘Find’ and 2 in ‘Replace with’ to rename the gender responses ‘f’
as 2, and then click Replace All. The
notification after making the replacements shows that there are 216 females. Now, we need to
redefine this variable.
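The recoding described above can also be sketched outside SPSS. The snippet below is a minimal Python illustration, not part of the SPSS workflow. The `gender_column` values are made-up stand-ins for the gender column; only the mapping of 'm' to 1 and 'f' to 2 comes from the steps above.

```python
from collections import Counter

# Mapping described in the text: 'm' becomes 1 and 'f' becomes 2.
GENDER_CODES = {"m": 1, "f": 2}

# Hypothetical stand-in for the gender column (not the real data set).
gender_column = ["m", "f", "f", "m", "m"]

# Recode every response; an unexpected value raises KeyError instead of
# being silently left behind as text.
recoded = [GENDER_CODES[value] for value in gender_column]

# Counting the codes plays the role of the replacement notification.
counts = Counter(recoded)
print(recoded)
print(counts[1], "males,", counts[2], "females")
```

Using an explicit mapping makes any stray response other than 'm' or 'f' fail loudly, which the Find and Replace route would not do.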
‘Some value labels or missing value specifications will be discarded for gender’. It only means
that any value labels that are no longer defined will be discarded. Since we are using only
numeric values now, discarding the alphabetic ones is fine. Now, the variables have been
defined in Data View.
Figure 6: Independent Samples T Test dialog box
independent variable). Once you
define the Grouping Variable as
Gender, you will see two question marks here, which imply that you need to define your groups.
This leads to less data loss than the other option, Exclude cases listwise.
Click Continue. We are not doing bootstrapping right now, since we are looking for a simple result,
but if you want the bias-corrected confidence interval, you can opt for bootstrapping.
Click OK. Now, all input options for the Independent Samples T Test have been defined.
The independent sample t-test is an inferential test: on the basis of the sample characteristics, we
try to draw inferences about the population. The standard error of the mean is the standard deviation
of the sample means that you would obtain by drawing repeated samples from the population. Imagine a study in which we draw
many samples from the population. From a total of 10,000 employees,
we randomly draw a sample of 100 employees. This sample
of 100 will have a mean salary and a standard deviation. You then draw another sample of 100,
which will have another mean and standard deviation. As you keep on drawing
samples from the population, you will get different means for different samples. If the
sample means vary more, it means that you are committing a larger
error in estimating the population mean. This is where the
standard error of the mean becomes relevant. The standard error of the mean describes the typical
amount of error in the sample mean when you draw a number of samples from the population. It is quite small as
compared to the average salary, but it is higher for the male group than for the female group.
Similarly, the standard deviation is higher for the male group than for the female group. This
might be due to outliers, which we did not eliminate; removing them
would most likely reduce the standard deviation. As a rule of thumb for a positive quantity such as salary, a standard
deviation larger than the mean signals strong skew or outliers. The standard error, being the standard
deviation divided by the square root of the sample size, can never exceed the standard deviation. So, this is the interpretation of the descriptive statistics.
Generally, in journals, you are not expected to report the entire descriptive table, but it is good to
know the concepts for statistical understanding.
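The idea of drawing repeated samples of 100 employees from a population of 10,000 can be tried out directly. The following is a minimal Python sketch under stated assumptions: the population is simulated (log-normal salaries with arbitrary parameters), it is not the SPSS Employee data, and the number of repeated samples (1,000) is chosen only for illustration.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 salaries (simulated, not the SPSS data).
population = [random.lognormvariate(10.3, 0.4) for _ in range(10_000)]

SAMPLE_SIZE = 100
NUM_SAMPLES = 1_000

# Draw many samples of 100 employees and record each sample's mean salary.
sample_means = [
    statistics.fmean(random.sample(population, SAMPLE_SIZE))
    for _ in range(NUM_SAMPLES)
]

# Standard error seen empirically: the standard deviation of the sample means.
empirical_se = statistics.stdev(sample_means)

# Standard error from the formula: population SD / sqrt(sample size).
formula_se = statistics.pstdev(population) / math.sqrt(SAMPLE_SIZE)

print(f"SD of sample means: {empirical_se:,.0f}")
print(f"SD / sqrt(n):       {formula_se:,.0f}")
```

The two printed numbers should be close to each other, and both are far smaller than the standard deviation of individual salaries, which is why the standard error is small compared to the mean.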
Figure 1 shows the Independent Samples Test output table for the independent sample t-test. The
dependent variable is Current Salary. Next to Current Salary, there are two rows –
‘Equal variances assumed’ and ‘Equal variances not assumed’. You may know that the t-test is a
parametric test, and parametric tests assume homogeneity of variance, that is, that
the groups have equal variances. To check whether the groups are homogeneous or not, SPSS provides
Levene's test of homogeneity or equality of variance. Levene's test is basically a kind of F test or
ANOVA. The Levene's test value is denoted by F in the output and the F value, here, is
119.669, which is highly significant. If Levene's test is significant, it means that our groups are
non-homogeneous, because the null hypothesis of Levene's test is that there is no significant
difference between the variances of the two groups. Since the significance value is less than .05, we
reject the null hypothesis and accept the alternate hypothesis that there is a significant difference
between the variances of the two groups. The result for ‘Equal variances assumed’ is applicable only when
the groups are homogeneous. In our case, the groups are non-homogeneous. So, it is better to look at
the second row of values, ‘Equal variances not assumed’. The t-test value is 11.688 and the degrees of
freedom are 344.26. The calculation of degrees of freedom is different for ‘Equal variances not
assumed’ than for ‘Equal variances assumed’. When equal variances are assumed,
the degrees of freedom are the number of observations minus one in each group, added across the two groups (df = (N1 - 1) + (N2 - 1)).
The sample size (N) for males and females is 258 and 216. Deduct 1 from the sample size of both
groups and add both values to get the degrees of freedom for ‘Equal variances assumed’, i.e. 257 + 215 = 472. The p-value is less than .001,
which means that the effect is significant at α = .001, implying a significant difference between
the salary of males and females. The mean difference, the difference in the average salary, is
15,409.862. That is the difference between 41,441.78 and 26,031.92. The standard error of the difference
describes how much the mean difference would vary across different samples drawn from the populations. Here, the standard error of
the difference is 1318.40 and the 95% confidence interval of the difference is $12,816 to $18,002. In the
95% confidence interval of the difference, check whether 0 lies within the interval.
If it does, a zero difference is plausible and the result is not significant at the .05 level. Since the interval does not contain 0 here,
it is consistent with the significant t-test. After calculation of an independent sample t-test, we found that there
was a significant difference between the salary of males and females. On average,
males are drawing a higher salary, i.e. $41,441.78, as compared to the salary of females, which is
$26,031.92. So, that's how we conduct and interpret an independent sample t-test.
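The arithmetic behind these output values can be checked by hand. The short Python sketch below uses only the numbers quoted in the text above; the variable names are illustrative. It does not reproduce the ‘Equal variances not assumed’ degrees of freedom (344.26), because that calculation needs the group standard deviations, which are not quoted here.

```python
# Values reported in the SPSS output discussed above.
n_male, n_female = 258, 216
mean_male, mean_female = 41_441.78, 26_031.92
se_difference = 1_318.40
ci_lower, ci_upper = 12_816, 18_002  # 95% confidence interval of the difference

# Mean difference: male average salary minus female average salary.
mean_difference = mean_male - mean_female

# The t value is the mean difference divided by its standard error.
t_value = mean_difference / se_difference

# Degrees of freedom when equal variances are assumed.
df_equal_variances = (n_male - 1) + (n_female - 1)

# A 95% interval that excludes 0 agrees with significance at the .05 level.
ci_contains_zero = ci_lower <= 0 <= ci_upper

print(f"Mean difference: {mean_difference:,.2f}")
print(f"t value: {t_value:.3f}")
print(f"df (equal variances assumed): {df_equal_variances}")
print(f"Interval contains 0: {ci_contains_zero}")
```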
In this chapter, we will learn how to report the results obtained after t-test calculation in APA
style. APA style is the American Psychological Association's prescribed style for writing standard
output in Ph.D. theses or any publication where you want to communicate your results in a
standard manner. To write independent sample t-test results in APA style, we need to know the t
value, p value and 95% confidence interval, along with the average values for the two comparison
groups. To report the output in APA style, start by mentioning which test was used and its
purpose. Write ‘t’ followed by the degrees of freedom in brackets, then an equal sign and
the t value. After this, mention the p value. The p value, in this case, is less than .001. SPSS displays
p as .000, which means it is smaller than .05, .01 and even .001. We report
the smallest of these conventional levels of Type 1 error, i.e. p < .001, rather than p = .000. Then,
report the lower and upper limits of the 95% confidence interval. Report the means and standard deviations of
salaries for male and female employees. Ensure that all text is in Times New Roman, font size 12,
and all symbols are written in italics (in this case, these are t, p, C.I., M and S.D.).
The results for the independent sample t-test can be reported like this –
'An independent sample t-test reported a significant difference in the salary drawn by male and
female employees, t(344.26) = 11.69, p < .001, 95% C.I. [$12,816.73, $18,002.99]. The male
While writing your results, be sure that you are reporting the test results in an appropriate format
and clearly describing the outcome of your test. In the above write-up, for example, the second
sentence explains the actual outcome in a crisp, comprehensive manner.
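The reporting pattern described above (t, degrees of freedom in brackets, the t value, the p value and the 95% confidence interval) can be assembled mechanically. Here is a minimal Python sketch; `format_p` and `apa_t_report` are hypothetical helper names, the bracket layout of the interval is one common choice, and italics cannot be shown in plain text, so they still have to be applied in the word processor.

```python
def format_p(p: float) -> str:
    """Format a p value without a leading zero, flooring at p < .001."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


def apa_t_report(t: float, df: float, p: float,
                 ci_low: float, ci_high: float) -> str:
    """Build the statistics part of a t-test sentence (hypothetical helper)."""
    # Whole-number df are shown as integers, fractional df with two decimals.
    df_text = f"{df:.2f}" if df % 1 else f"{int(df)}"
    return (f"t({df_text}) = {t:.2f}, {format_p(p)}, "
            f"95% CI [${ci_low:,.2f}, ${ci_high:,.2f}]")


# Values from the independent sample t-test above; SPSS displays p as .000.
report = apa_t_report(11.688, 344.26, 0.000, 12_816.73, 18_002.99)
print(report)
# → t(344.26) = 11.69, p < .001, 95% CI [$12,816.73, $18,002.99]
```

The string still needs a lead sentence naming the test and its purpose, followed by the group means and standard deviations.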
In this chapter, we will learn about the paired sample t-test. The paired sample t-test is also known as
the correlated sample t-test or dependent sample t-test. It presumes that either a repeated measures
design has been used in the study or the same subjects are measured across two different
situations. It is a very useful test for inferential analysis of repeated measures. Suppose a
researcher wants to understand whether training students by a particular method leads to
a significant improvement in their performance. To study this, the researcher takes 30
students (30 is the sample size commonly recommended for parametric tests) and measures their
performance at the beginning, before any training is administered. He then
administers a training program and, following the training, measures their performance again.
Now, he makes a comparison using a paired sample t-test.
Similarly, governments can use the paired sample t-test to measure the effect of tourism promotion
on tourists. Tourists’ attitudes toward the country they visit can be measured on arrival. They can be asked
questions such as whether they like the country and whether they are familiar with its
culture. Once they have visited
the country and are about to leave, we can ask them to fill in the survey again to find out
whether there has been a significant difference in their attitude between the time they arrived in the
country and the time they were leaving. Similarly, suppose you visit a hotel or go to a party
and want to assess whether doing so leads to any improvement in
your mood. You can rate your mood when you arrive and again when you
are leaving, and find out whether there has
been a significant improvement. In general, paired sample t-tests can be used
wherever you measure a change in the condition of the same individuals after they have undergone
some training or experience.
Figure 10: Employee data set (from SPSS Samples folder)
In the employee data set (Figure 1), the variables include employee data such as education, job
category and salary; we have their current salary and
their salary at the beginning, that is, when they joined the company. Suppose a manager wants to
know whether there has been a significant improvement in the salary of the employees since they
joined this company. The purpose is to ensure that the employees have not been stagnating while
in the organization, that is, that their current salary is significantly higher than
their baseline salary, the salary they drew at the time of joining. In this case,
we have a repeated measures situation where, for the same ID, we have two observations:
current salary and beginning salary. This holds for every ID in the data set.
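With two observations per ID, the paired sample t-test works on the difference within each pair. The Python sketch below shows the calculation on five made-up employees; the salary figures are hypothetical and chosen only to keep the arithmetic visible, and SPSS performs the same steps on the full data set.

```python
import math
import statistics

# Hypothetical beginning and current salaries for five employees
# (illustrative values, not taken from the SPSS Employee data).
beginning = [12_000, 15_000, 13_500, 18_000, 21_000]
current = [21_000, 24_000, 20_500, 30_000, 45_000]

# One difference per employee: current salary minus beginning salary.
differences = [c - b for c, b in zip(current, beginning)]

n_pairs = len(differences)
mean_difference = statistics.fmean(differences)
sd_difference = statistics.stdev(differences)  # sample SD of the differences

# Standard error of the mean difference, then the t value and its df.
se_difference = sd_difference / math.sqrt(n_pairs)
t_value = mean_difference / se_difference
df = n_pairs - 1

print(f"Mean difference: {mean_difference:,.0f}")
print(f"t({df}) = {t_value:.2f}")
```

Because each employee acts as his or her own comparison, only the spread of the differences enters the standard error, not the spread of the salaries themselves.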
The mean for the pair Current Salary - Beginning Salary is $17,403. This is the mean
difference between the salaries, calculated by subtracting the beginning salary from the current
salary, i.e. $34,419 - $17,016 = $17,403. The standard deviation is $10,814, which is
fairly large but well below the average salary, so it is acceptable. The standard error of the mean is
$496.732. The 95% confidence interval is $16,427 to $18,379. Both confidence limits are
positive. The absence of zero in this 95% confidence interval means that a zero difference is
implausible, which is consistent with a significant result. Finally, the t value (35.036) is very high, the degrees of freedom are 473 and the result is
significant at the α = .001 level. Hence, there has been a significant improvement in the salary of
employees since they joined the company.
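The reported paired-sample values hang together arithmetically, which is a useful check when reading the output. The Python sketch below uses only figures quoted in this chapter; the number of pairs (474) is inferred from the 473 degrees of freedom, and the standard deviation is the rounded $10,814, so the recomputed standard error matches the reported one only approximately.

```python
import math

# Figures quoted in the chapter for the paired sample t-test.
mean_current, mean_beginning = 34_419.57, 17_016.09
sd_difference = 10_814        # rounded value from the text
se_reported = 496.732
df_reported = 473

n_pairs = df_reported + 1     # df = number of pairs minus one

# Mean difference between current and beginning salary.
mean_difference = mean_current - mean_beginning

# Standard error recomputed from the rounded SD: SD / sqrt(number of pairs).
se_from_sd = sd_difference / math.sqrt(n_pairs)

# t value: mean difference divided by the reported standard error.
t_value = mean_difference / se_reported

print(f"Mean difference: {mean_difference:,.2f}")
print(f"SE from rounded SD: {se_from_sd:.2f} (reported {se_reported})")
print(f"t({df_reported}) = {t_value:.3f}")
```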
We have learnt how to write the output and its interpretation for an independent sample t-test in
APA format. The APA writing format of the output for a paired sample t-test is quite similar.
Keep these pointers in mind when writing results for a paired sample t-test –
“A paired sample t-test suggested that there had been a significant increase in the salary of
employees (M = $34,419.57, SD = $17,075.66) since they joined the company (M = $17,016.09,
SD = $7,870.64), t(473) = 35.04, p < .001.”