CS1703(1805): Data and Information (2015/16)

Dr Timothy Cribbin, Brunel University London

Laboratory Tutorial 1-2: Further data processing tasks in SPSS v20
In this laboratory tutorial you will:

1. Learn how to compute a new variable as a function of existing variables

2. Practice the processing techniques you have learnt over the previous two sessions

3. Answer questions that require a combination of data processing operations

This is a mandatory tutorial. In order to pass the coursework, you must achieve a score of 75% or
higher on the associated Blackboard quiz (Lab Quiz 1-2).

Preamble
In the first lecture we identified five basic types of data processing tasks:

1. Classification

2. Rearrangement/sorting

3. Summarising/aggregating

4. Performing calculations on data

5. Selection of data

In the first laboratory tutorial (1-1), you learned some techniques which enabled you to perform four
out of these five tasks using SPSS. If you have not successfully completed this tutorial (did you score
100% on the quiz?) then please repeat it until you are confident that you have achieved the learning
outcomes.

So far we have not yet looked at performing calculations on data. In this tutorial you will begin by
finding out how to create a new variable using two or more existing variables. Once you have done
this you will apply all of the techniques that you have learnt to answer some more challenging
questions about the “sleep3ED” dataset.

Exercise 1: Computing a new variable


It is often useful to transform one or more existing variables into a new variable or measure that tells
us something that isn’t so easy to see by examining the original variable(s) alone. For instance,
suppose we want to compare the performance of a number of companies in order to decide where
to make an investment. Comparing on profit alone doesn’t give us the information we need to make
our decision. For instance, company X might generate larger profits than company Y simply because
it is a larger operation backed by more investment. Therefore to find out which company would
provide more return for our pound, a derived measure based on a function such as Return on capital
employed (ROCE) might be more enlightening. ROCE is calculated using the following ratio:

ROCE = (Profit / Capital) * 100 = X%

Computing this ratio helps us to make our decision by putting profit scores into the same context.
Hence, if the profit of a company is £20000 and the total original investment was £10000 then the
ROCE is 200% (i.e. the company achieved a return of twice the original investment). Ranking a larger sample of
companies by ROCE would make our decision much easier than simply comparing companies by
profit (or capital) alone.
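
Outside SPSS, the same ranking idea can be sketched in a few lines of Python. This is only an illustration: company "A" uses the figures from the example above, while companies "B" and "C" and their figures are hypothetical.

```python
def roce(profit, capital):
    """Return on capital employed, as a percentage."""
    return profit / capital * 100

# (name, profit in pounds, capital in pounds)
# "A" is the worked example from the text; "B" and "C" are made up.
companies = [
    ("A", 20000, 10000),
    ("B", 50000, 100000),
    ("C", 30000, 20000),
]

# Rank from highest to lowest ROCE rather than by raw profit.
ranked = sorted(companies, key=lambda c: roce(c[1], c[2]), reverse=True)

for name, profit, capital in ranked:
    print(f"{name}: profit {profit}, ROCE {roce(profit, capital):.0f}%")
```

In this made-up sample, company "B" has the largest profit but comes last once profit is put in the context of capital.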

There are several opportunities to derive useful variables in the Sleep3ED dataset. For instance, we
might form a hypothesis (an educated guess) that people who don’t sleep enough during the week
feel less refreshed on week days (see the variable “refreshd” and its description in the codebook)
than those who do get enough sleep.

Look at the codebook: Which variable could we use to divide people into those two groups?

Classifying people simply by the hours they sleep on week nights (hourwnit), for example, might give
us a misleading result because people have different sleep requirements (Margaret Thatcher
famously only needed 4 hours a night). However, we could argue that people who aren’t getting
enough sleep make up for it by sleeping more at weekends. Hence, we could combine hourwnit with
hourwend into a function that outputs a measure of sleep deficit built up during the week:

sleepdef = (hourwend - hourwnit) / hourwend

You can see that people who sleep the same amount of time on weeknights as on weekends will
derive a score of exactly zero from this calculation, whilst those who get less sleep during the week
will get a positive score, increasing with greater sleep deficit to a maximum of one (if they don’t
sleep at all during the week!). We call this a normalised scale – as long as hourwend is greater than
hourwnit, we always get a score between 0 and 1. Of course, there may be people who sleep more
during the week (if that’s possible). What would happen to their score? Does this function work well
for people like this?
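
If it helps to see the formula as code, here is a minimal Python sketch of it. The function name sleep_deficit and the choice to return None when hourwend is zero are assumptions made for this illustration, not part of the tutorial or of SPSS.

```python
def sleep_deficit(hourwend, hourwnit):
    """sleepdef = (hourwend - hourwnit) / hourwend"""
    if hourwend == 0:
        # The ratio is undefined when weekend sleep is zero.
        return None
    return (hourwend - hourwnit) / hourwend

# The two boundary cases described in the text:
print(sleep_deficit(8, 8))  # same sleep on weeknights and weekends -> 0.0
print(sleep_deficit(8, 0))  # no sleep at all during the week -> 1.0
```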

Your answers to the following questions can be checked by taking Lab Quiz 1-2 on Blackboard:

Q1: If a participant reported sleeping 5 hours a night during the week and 8 hours a night at the
weekend, what would their “sleepdef” score be (enter answer to 2 d.p.)?

It is possible that some participants in the study sleep longer during the week, perhaps because they
work late shifts at the weekend or they just like to party hard!

Q2: What would be the “sleepdef” score of a participant who reported sleeping 8 hours a night
during the week but only 6 hours a night at weekends (2 d.p.)?

Computing “sleepdef” in SPSS


SPSS makes it possible to transform one or more variables into a new variable using a variety of
arithmetic, statistical and other kinds of functions. Computing “sleepdef” is a relatively
straightforward procedure, once you know what is required:

• Go to the menu and follow the path Transform → Compute Variable



• In the Compute Variable dialogue, type the expression “(hourwend - hourwnit) / hourwend”
into the Numeric Expression text box. Note that you can also select and paste (or drag) variable
names from the variable list if required. Basic operators can be found in the key-pad
beneath the expression box and more sophisticated functions (e.g. logarithms, trig
functions) can be found by browsing the Function lists on the right-hand side.

• To enter the name for the new variable, type “sleepdef” into the text box called Target
Variable.

• A label and data type can be specified by clicking on the button below the target variable. As
SPSS variables are numeric (not string) by default and so is our variable, we do not need to
set the data type. Feel free to add a more descriptive label to “sleepdef” if you find it easier
to work with.

• Click OK. You should find the new variable in the right-hand column of the table in Data
View.

Most cases should be between zero and one. If you have a different range of values then revisit the
previous steps and repeat the procedure, checking your work carefully.
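
Conceptually, the Compute operation evaluates the expression once per case and stores the result in a new column. The Python sketch below mimics that under stated assumptions: the three cases are hypothetical, and None stands in for a missing answer. In SPSS, a case with a missing input would typically end up with a missing value for the new variable too.

```python
# Hypothetical cases; None stands in for a missing answer.
cases = [
    {"id": 1, "hourwnit": 7.0, "hourwend": 8.0},
    {"id": 2, "hourwnit": 6.0, "hourwend": 6.0},
    {"id": 3, "hourwnit": None, "hourwend": 9.0},
]

for case in cases:
    wnit, wend = case["hourwnit"], case["hourwend"]
    if wnit is None or wend is None or wend == 0:
        case["sleepdef"] = None  # cannot be computed for this case
    else:
        case["sleepdef"] = (wend - wnit) / wend

# Quick range check on the cases that could be computed.
computed = [c["sleepdef"] for c in cases if c["sleepdef"] is not None]
print(min(computed), max(computed))
```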

Exercise 2: Revision of techniques


In this exercise, you will answer a number of questions using the new “sleepdef” variable. Each
question will require you to use one of the processing or analysis techniques we have already
covered. In each case, think carefully about how you might operationalize the question using an SPSS
function, and select accordingly. You may find it helpful to refer to the original questionnaire and
SPSS codebook that you downloaded last week.

Q3: So going back to the earlier question, did participants who said they woke up feeling refreshed
on weekdays have a lower sleep deficit score than those that said they did not?

Q4: What were the mean sleep deficit scores for the “yes” and “no” response groups respectively
(answer to 2 d.p.)?

Q5: What is the median average sleep deficit score for the whole sample (answer to 2 d.p.)?

Hint: The median is the middle value in the sequence of ranked cases (i.e. the 50th percentile). You
could also use the Statistics option in the Frequencies function (menu path: Analyze → Descriptive
Statistics).
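
To make the "middle value of the ranked cases" idea concrete, here is a small Python sketch. The five scores are hypothetical and are not taken from the Sleep3ED data.

```python
import statistics

scores = [0.40, 0.00, 0.25, 0.10, 0.30]  # hypothetical values

# Rank the cases, then pick the one in the middle position.
ranked = sorted(scores)
middle = ranked[len(ranked) // 2]

# The standard library function gives the same answer for an odd count.
print(middle, statistics.median(scores))
```

With an even number of cases there is no single middle value, and the median is usually taken as the mean of the two middle values.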

Q6: What is the mean sleep deficit in the 51+ years group (answer to 2 d.p.)?

Q7. Does sleep deficit increase as people get older?

Exercise 3: Putting it all together


In this exercise, you will answer a general question relating to the Sleep3ED dataset. Related to this
are a series of more specific questions, which require you to apply several of the processing
techniques (i.e. sorting, classification, summarisation, calculation, selection) that you have learnt
over the last two sessions, in order to answer.

General question: Does being overweight have an effect on quality and quantity of sleep?

We have a “weight” variable (measured in kg) already. However, if person X is heavier than person Y,
it does not necessarily mean they are more over-weight. The ideal weight for a person depends to a
great extent on their height. The extent to which a person is under or over-weight is therefore often
measured using the body-mass index (BMI):

BMI = weight / height² (kg / m²)

You will need to compute the BMI score for all participants in the data sample. You’ll see we have
data for participants’ height, but it is in the wrong format for the formula shown above. You will
need to convert it from centimetres to metres. Can you create an expression that will compute the
BMI in one “Compute” operation or do you need to create a new variable describing height in
metres first?

Hint: You can use brackets to embed intermediate transformations. For instance: Minutes =
(Milliseconds / 1000) / 60. Also, the double asterisk operator allows you to compute the square of a
variable directly - i.e. X² = X**2.

[COMPUTE BMI=weight / ((height / 100)**2)]
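
The same single-expression approach can be sketched in Python, which also uses ** for powers. The function name bmi and the example values (70 kg, 175 cm) are assumptions chosen for illustration.

```python
def bmi(weight_kg, height_cm):
    """BMI in kg/m^2 from weight in kg and height in cm."""
    # The inner brackets convert centimetres to metres before squaring.
    return weight_kg / ((height_cm / 100) ** 2)

# One-step calculation.
one_step = bmi(70, 175)

# Equivalent two-step calculation via an intermediate height in metres.
height_m = 175 / 100
two_step = 70 / height_m ** 2

print(round(one_step, 2), one_step == two_step)
```

Both routes give the same result, so an intermediate height-in-metres variable is a matter of readability rather than necessity.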

A BMI score means nothing by itself. What is a healthy BMI? What constitutes being overweight? Now try to recode
the scores into a new variable called “BMI_gp” using the following categories:

• 0 = Underweight – Less than 18.5

• 1 = Normal – 18.5 to less than 25

• 2 = Overweight – 25 to less than 30

• 3 = Obese – 30 or higher

Make sure you add appropriate value labels to make the group assignments more readable.
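
The recoding logic can be sketched in Python as follows. The names bmi_group and BMI_LABELS are hypothetical, and the sketch assumes that a score of exactly 25 or 30 falls into the higher of the two neighbouring groups.

```python
# Value labels for the group codes.
BMI_LABELS = {0: "Underweight", 1: "Normal", 2: "Overweight", 3: "Obese"}

def bmi_group(bmi):
    """Map a BMI score to a group code (assumed boundary handling)."""
    if bmi < 18.5:
        return 0
    if bmi < 25:
        return 1
    if bmi < 30:
        return 2
    return 3

code = bmi_group(22.0)
print(code, BMI_LABELS[code])
```

Checking the ranges from lowest to highest means each score is assigned to exactly one group.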

Q8: How many of the study participants are underweight (according to the BMI classification)?

Q9: Does satisfaction with sleep (as measured by the “satsleep” variable) increase or decrease as
participants progress from normal weight to obese?

Q10: What is the mean sleep satisfaction rating reported by obese participants (answer to 2 d.p.)?

Q11: Does quantity of sleep reported (both weekday and weekend) generally increase or decrease
as participants become more overweight?

Q12: How many hours, on average, does a person of normal weight sleep on a week night
compared to the weekend (answer to 2 d.p.)?

Summary
In this tutorial, you have learnt how to compute new variables from existing variables and you
have applied all of the basic data processing techniques learnt so far to answer new questions about the
Sleep3ED dataset.

Before you move on to the next lab tutorial, make sure you have successfully answered all the
exercise questions (up to Blackboard Lab Quiz 1-2) and feel generally confident about all aspects of
the material covered. If you have any queries please ask a tutor during your laboratory session.

Also ask yourself: what other insights can you make about the sample through further analysis of the
data? You should try to explore the dataset beyond the exercises presented here, with the aim of
applying the data processing and statistical methods that you have learnt to answer different kinds
of questions using different variables.

Don’t forget to save your data file before you close SPSS.

Further Reading
Pallant, J. (2007) SPSS survival manual : a step by step guide to data analysis using SPSS for Windows
(Version 15), Chapter 3.
