
Analysis of cross sectional data using SPSS

You have collected data from 157 adults at a worksite, measuring their
physical activity with an accelerometer for 7 days. You want to find out if there
is an association between BMI and physical activity.
Download an SPSS data file called msc_eco2001_data.sav from the
PRM Week 1 afternoon workshop section of Blackboard. You will
see the following variable names:
id
height
weight
sex
age
mcph to sucph = average accelerometer counts per hour for each day, Monday to Sunday
Descriptives
Firstly you will want to describe your population, e.g. what is the mean age of
the subjects in the study? To do this use Analyze/Descriptive
statistics/Descriptives. Click on age in the left hand box and then click the blue
arrow to move it to the right hand box. Click OK to carry out the analysis. You
should get a table giving the sample size (N), minimum, maximum, mean and
standard deviation values for age. You can repeat this for all the other variables
of interest. They can all be done together by moving them all into the box at
the same time.
What are the mean values of age, weight and height? Write these
values into table 1.
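If you want to check what the Descriptives table is reporting, the same summary can be sketched by hand. This is a minimal Python sketch, not part of the SPSS workflow; the ages are made up for illustration and the function name describe is hypothetical.

```python
import statistics

# Made-up ages for illustration only; not values from msc_eco2001_data.sav.
ages = [23, 35, 41, 29, 52, 38]


def describe(values):
    """Return the summary columns shown in the SPSS Descriptives table."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        # Sample standard deviation (n - 1 in the denominator).
        "Std. Deviation": statistics.stdev(values),
    }


summary = describe(ages)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that the standard deviation here uses n - 1 in the denominator, which is the sample standard deviation.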
Define value labels
Numbers are used to code categorical variables. For example, in this dataset
sex is coded 1 or 2. From the drop down menu at the top left of the screen,
select Analyze/Descriptive statistics/Frequencies. Then select sex in the left
hand box and click the blue arrow to move it to the right hand box. Click OK.
You will see the results open in a new window, showing that 49% of the sample
are 1. It is helpful to know what these numbers mean, so we add a label to
them.
To do this for the variable sex, click the Variable view tab at the bottom left
of the screen and in the row sex select "values", then click the grey box. A
new form will appear - type 1 in the box "value" and male in the box "value
label". Click "Add". Repeat, putting the number 2 and female in the respective
boxes. Click "OK".
Click the Data view tab and you will see the data again. To see the labels that
you have made, click the Value labels icon on the bar at the top of the
screen. Now repeat the frequency analysis from the section above and you should
see the categories labelled instead of the numeric codes. Write the number of
participants of each sex into Table 1 (n= ).
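The idea behind value labels (a lookup from numeric code to a readable name) can be sketched in Python as follows. The codes below are made up for illustration; only the coding 1 = male, 2 = female comes from the worksheet.

```python
from collections import Counter

# Made-up codes for illustration; 1 = male, 2 = female as in the worksheet.
sex_codes = [1, 2, 2, 1, 2, 1, 2, 2]
value_labels = {1: "male", 2: "female"}

# Count each category by its label rather than its numeric code.
counts = Counter(value_labels[code] for code in sex_codes)
total = sum(counts.values())

for label, n in counts.items():
    print(f"{label}: n={n} ({100 * n / total:.1f}%)")
```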

What is the mean age of male and female participants?


There are often several ways in SPSS to summarise data. Here are two ways to
find out the mean age of participants separately by sex.
Method 1
Separate the file into groups using: Data/Split file
Click on Organise output by groups. Now click on sex and the arrow button to
move that variable into the box. Click OK.
As before use Analyze/Descriptive statistics/Descriptives. Click on age and the
arrow button. Click OK. Write into Table 1 the mean age and standard deviation
for male and female participants. This is usually presented as either
37.2 ± 11.5 or 37.2 (11.5).
Method 2
Remove the split file command that you used in Method 1. Use Data/Split file
and click on Analyze all cases. Click OK.
Use Analyze/Compare means/Means. Now you have to enter two pieces of
data. Move age into the top box (Dependent List). Move sex into the bottom
box (Independent List). Take a look at the options available and select Minimum
and Maximum. Click Continue then OK.
Repeat either of these methods to find out the mean height and weight of the
participants. Enter these into table 1.
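Both methods do the same thing underneath: they group the cases by sex and summarise each group separately. A minimal Python sketch of that idea is shown here, using made-up (sex, age) pairs; the variable names are hypothetical.

```python
import statistics
from collections import defaultdict

# Made-up (sex code, age) pairs for illustration only.
records = [(1, 30), (2, 42), (1, 36), (2, 38), (1, 45), (2, 34)]
labels = {1: "male", 2: "female"}

# Collect the ages for each sex, much as Split File separates the cases.
ages_by_sex = defaultdict(list)
for sex, age in records:
    ages_by_sex[labels[sex]].append(age)

# Mean and sample standard deviation for each group.
group_summary = {
    group: (statistics.mean(ages), statistics.stdev(ages))
    for group, ages in ages_by_sex.items()
}

for group, (mean, sd) in group_summary.items():
    print(f"{group}: {mean:.1f} ({sd:.1f})")
```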
Visual checking
The mean values can be strongly influenced by outliers. Also, data should be
reasonably Normally distributed for parametric analysis. One way of checking is
to graph your data. There are many graph options! For now, click
Graphs/Legacy dialogs/Histogram and move age into the top box. Check
the box Display normal curve. Click OK. You should see that the data for age
are reasonably normal and there are no outliers.
Define values for missing data
You will often need to code missing data so that they are not included in
calculations. To do this, go to Variable view and, in the row for the relevant
variable, click the cell in the Missing column and then the grey box. Usually a
discrete missing value, commonly 0, is used. Compute the mean of one of the
daily physical activity variables before and after defining the missing value
to see the effect. Defining missing values is particularly important when
calculating means of several variables (see below).
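The effect of a missing-value code on a mean can be sketched in Python. This is only an illustration with made-up counts; it assumes 0 is the missing-value code, as suggested above.

```python
import statistics

MISSING = 0  # assumed missing-value code, as suggested in the worksheet

# Made-up counts per hour for one day across six participants.
day_counts = [1200, 0, 900, 1500, 0, 1400]

# Keep only the values that are not coded as missing.
valid = [value for value in day_counts if value != MISSING]

mean_all = statistics.mean(day_counts)  # zeros wrongly treated as real data
mean_valid = statistics.mean(valid)     # zeros excluded as missing

print(f"Mean with zeros included: {mean_all:.1f}")
print(f"Mean with zeros excluded: {mean_valid:.1f}")
```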
Other variable parameters
You can define the number of decimal places data is displayed to (default is 2).
It is often neater to use one or no dp, but this does not affect calculations.
Have a play and see how the spreadsheet looks. The majority of other variable
options are not immediately necessary.

What is the mean age, height, and weight (± standard deviation) of male and
female participants? Start to complete the table below.

Table 1. Demographic characteristics of study participants

                                   All (n= )     Men (n= )     Women (n= )
Age
Height
Weight
BMI
Physical activity (weekday)
Physical activity (weekend day)
Creating new variables
Often you will need to compute new variables and then create categories for
them e.g. BMI. To create the variable BMI, use Transform/Compute Variable.
The target variable name should be BMI. Work out how to put the appropriate
calculation in the Numeric Expression box. BMI is weight (kg) divided by the
square of height (m). CLUE: height is recorded in cm, so first use
Transform/Compute Variable to make a new variable which is height
in metres.
Check the distribution of BMI as we did before for age. Is BMI normally
distributed?
Now categorise BMI into 1=normal, 2=overweight, 3=obese. To do this use
Transform/Recode into Different Variables. Enter the input variable name
(BMI), the output name (e.g. BMICAT) and then set the old and new values such
that BMI <25 is normal, 25-30 is overweight and >30 is obese. Don't forget to
create value labels.
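Once you have worked out the calculation in SPSS, you can check a few cases by hand. The two steps (compute BMI, then recode it into categories) can be sketched in Python as follows. The function names are hypothetical, and counting exactly 25 and exactly 30 as overweight is an assumption based on the cut points given above.

```python
def bmi(weight_kg, height_cm):
    """BMI = weight (kg) divided by the square of height (m)."""
    height_m = height_cm / 100  # the dataset records height in cm
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Recode BMI: 1 = normal (<25), 2 = overweight (25-30), 3 = obese (>30)."""
    if value < 25:
        return 1
    if value <= 30:
        return 2
    return 3


category_labels = {1: "normal", 2: "overweight", 3: "obese"}

# Made-up example: 70 kg and 175 cm.
example = bmi(70, 175)
print(f"BMI = {example:.1f}, category = {category_labels[bmi_category(example)]}")
```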
Create summary variables for physical activity
How would you look at the accelerometer data? You have daily average values,
so what would a meaningful analysis time frame be? Looking at weekdays and the
weekend separately might be a good start. To create a summary variable for
the week, use the mean counts per hour for Monday to Friday. Create a new
variable called xwkcph using Transform/Compute Variable. Calculate this
variable by moving variables that already exist across and using either the
computer keyboard or the on-screen icons. The calculation should be
MEAN(mcph,tcph,wcph,thcph,fcph)
It is important that where there is a 0 value in the data, this is coded as
missing - can you explain why?
Now create a variable for the weekend and for the whole week (all days).
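The SPSS MEAN function averages only the valid (non-missing) values it is given. A minimal Python sketch of that behaviour is shown here; the function name mean_valid is hypothetical, the participant values are made up, and None stands in for a value that has been coded as missing.

```python
import statistics


def mean_valid(*values):
    """Mean of the non-missing values; None if every value is missing."""
    valid = [v for v in values if v is not None]
    return statistics.mean(valid) if valid else None


# Made-up participant whose Wednesday value (a recorded 0) is coded as missing.
row = {"mcph": 1000, "tcph": 1200, "wcph": None, "thcph": 1100, "fcph": 900}

xwkcph = mean_valid(row["mcph"], row["tcph"], row["wcph"], row["thcph"], row["fcph"])

# For comparison: the same row with the 0 left in as if it were real data.
naive = statistics.mean([1000, 1200, 0, 1100, 900])

print(f"Zero coded as missing: {xwkcph}")
print(f"Zero left in as data:  {naive}")
```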
Distribution of physical activity variables
We are nearly ready to investigate the physical activity variables. However, we
need to check that the variables we have created are normally distributed and
that there are no outliers. First make a histogram with normal curve (as in the
section Visual checking that you did earlier) for xwkcph. You can see that the
distribution looks a bit messy: it appears slightly positively skewed and there
are a couple of high values on their own. Are these outliers? To check, use a
boxplot. Use Graphs/Chart Builder. In the Gallery select Boxplot and then
move the icon for the 1-D Boxplot into the chart preview space above. Then
select xwkcph from the Variables list and drag it into the dotted blue box. Click
OK and you will have a boxplot in the output document. Are there any outliers?
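A boxplot flags points that lie more than 1.5 times the interquartile range beyond the edges of the box. That rule can be sketched in Python as follows. The data are made up, the function name is hypothetical, and the quartile method used by Python's statistics.quantiles may differ slightly from the one SPSS uses, so borderline points could be flagged differently.

```python
import statistics


def boxplot_outliers(values):
    """Return values more than 1.5 x IQR below Q1 or above Q3."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


# Made-up weekday counts per hour, with one suspiciously high value.
xwkcph = [950, 1010, 1080, 1120, 1150, 1200, 1230, 1300, 2900]

print(boxplot_outliers(xwkcph))
```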
Analysis at last!
Is there an association between BMI and physical activity?
In a cross sectional study you often want to look for associations first, and then
investigate these further by looking for differences between groups. To
investigate association you use correlation.
Use Analyze/Correlate/Bivariate and move the variables you want to test
against each other into the box (BMI, xwkcph, xwecph). You have continuous
outcome data so you use Pearson's correlation. If you want to know more
about this, why not see what Help can tell you?
Look at the correlation matrix in the output. How strongly are these variables
correlated, and are the correlations significant? How could you best show this
relationship? Try using graphs (scatter plots).
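For reference, Pearson's correlation coefficient and the t statistic used to test it against zero can be sketched in Python. This is a minimal sketch with made-up data and hypothetical function names; turning the t statistic into a p value needs the t distribution with n - 2 degrees of freedom, which SPSS handles for you.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic for testing r against zero (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Made-up BMI and weekday activity values for five participants.
bmi_values = [21.0, 24.5, 27.0, 30.5, 33.0]
activity = [1400, 1250, 1300, 1000, 950]

r = pearson_r(bmi_values, activity)
print(f"r = {r:.3f}, t = {correlation_t(r, len(bmi_values)):.3f}")
```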
Is there a difference between weekday and weekend physical activity?
Use a paired samples t-test to investigate this. Is there a statistically
significant difference? Is this the case in men and women when analysed
separately?
            Week        Weekend
All
Men
Women
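A paired samples t-test works on the difference between each participant's two measurements. The calculation of the t statistic can be sketched in Python as follows. The data are made up and the function name is hypothetical; the p value itself comes from the t distribution with n - 1 degrees of freedom, which SPSS reports in its output.

```python
import math
import statistics


def paired_t(first, second):
    """Return (t, degrees of freedom) for a paired samples t-test."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference.
    se_diff = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se_diff, n - 1


# Made-up weekday and weekend activity for four participants.
week = [10, 12, 14, 16]
weekend = [8, 11, 11, 14]

t, df = paired_t(week, weekend)
print(f"t = {t:.3f}, df = {df}")
```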

Is there a significant difference in physical activity between BMI categories?
Now investigate whether there is a difference in physical activity during
the week and/or at the weekend between the three BMI groups (i.e. compare
means). Use ANOVA to do this (Analyze/Compare Means/One-Way ANOVA). The
factor is BMICAT, the dependents are the physical activity variables.
            <25       25-30     >30       P
Week
Weekend
If there is a difference between groups, is it statistically significant on either
weekdays or the weekend?
What is the result if analysed separately by gender?
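One-way ANOVA compares the variation between group means with the variation within groups. A minimal Python sketch of the F statistic is shown here, using made-up groups and a hypothetical function name. The p value comes from the F distribution with the two degrees of freedom returned, which SPSS reports in the ANOVA table.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df between groups, df within groups)."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2 for group in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (v - statistics.mean(group)) ** 2 for group in groups for v in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Made-up activity values for three BMI categories.
f, df1, df2 = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F({df1}, {df2}) = {f:.2f}")
```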
Creating categories for data
We have created categories for BMI using defined cut points between
categories. If these do not exist, it is often useful to create categories, e.g.
quartiles, and look for differences between the upper and lower quartiles.
The latest version of SPSS gives you huge flexibility in how you make different
categories of a variable. To start exploring this use Transform/Visual Binning
and move the variable BMI into the right hand box. Click Continue and then click
BMI to move it into the Current Variable box. Give the variable a new name in
the Binned Variable box (e.g. qBMI). Try to create quartiles (4 categories) of BMI.
CLUE: use the Make Cutpoints button then Equal Percentiles.
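Creating quartiles means finding three cut points that split the data into four equal-sized groups and then assigning each case to a group. That idea can be sketched in Python as follows. The values are made up and the function name is hypothetical; placing a value that equals a cut point in the lower group is an assumption, and the percentile method may differ slightly from the one SPSS uses.

```python
import statistics
from bisect import bisect_left


def quartile_bins(values):
    """Return (bin number 1-4 for each value, the three cut points)."""
    cutpoints = statistics.quantiles(values, n=4)
    # bisect_left places a value equal to a cut point in the lower bin.
    bins = [bisect_left(cutpoints, v) + 1 for v in values]
    return bins, cutpoints


# Made-up BMI values for illustration.
bmi_values = [20, 22, 24, 26, 28, 30, 32, 34]

q_bmi, cuts = quartile_bins(bmi_values)
print("Cut points:", cuts)
print("Quartile of each value:", q_bmi)
```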
Look at the mean values of xwkcph in each quartile of BMI. What do you see?
To test whether the value of xwkcph is significantly different between the four
quartiles of BMI we use analysis of variance (ANOVA). To do this use
Analyze/Compare Means/One-Way ANOVA. Move xwkcph into the Dependent
List and qBMI into the Factor box. Use the options button to select Descriptive
statistics. Click OK.
In the output you will see two tables. The first is the descriptives, some of
which you will have seen before. The second table (called ANOVA) tells you
whether the ANOVA model is statistically significant overall. What is the p
value? Is the model significant?
P=.

However, you can see that some quartiles (e.g. 2 and 3) have very similar
values. You may want to find out which groups differ significantly from each
other. To do this, repeat the ANOVA but this time click the Post Hoc button and
select Bonferroni. In the output you can now see a third table called Post Hoc
Tests. This shows each pair of the four quartiles tested against
each other. Can you identify which groups differ significantly from each other?
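The Bonferroni correction allows for the number of comparisons being made: each p value is multiplied by the number of comparisons (and capped at 1). A minimal Python sketch of that adjustment is shown here. The raw p values are made up for illustration and the function name is hypothetical.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Every pair of the four quartiles: six comparisons in total.
pairs = list(combinations([1, 2, 3, 4], 2))

# Made-up unadjusted p values, one per pair.
raw_p = [0.004, 0.03, 0.001, 0.40, 0.02, 0.25]

for pair, p_adj in zip(pairs, bonferroni(raw_p)):
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"Quartiles {pair}: adjusted p = {p_adj:.3f} ({verdict})")
```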
