
Lesson 2:

Using Graphs to Describe and Explore Data

Time Allotment: One hour and 30 minutes


Introduction
Quite a lot of statistical information is
expressed visually. Let us experiment with
various graphs and plots and explore their
possibilities.
Objectives
At the end of this lesson, you shall be able to:

1. differentiate between categorical and continuous variables
2. explain normality of variables
3. identify different graphs and plots
4. follow the procedures for creating and editing graphs and plots
5. interpret output from different graphs and plots
Putting your mind into action / Putting your ideas into work

While the numerical values obtained in the previous chapter
provide useful information concerning your sample and your
variables, some aspects are better explored visually. SPSS for
Windows provides a number of different types of graphs (referred
to by SPSS as charts). This lesson covers the basic procedures to
obtain the following graphs:
Histograms;
Bar graphs;
Scatterplots;
Boxplots; and
Line graphs.

Spend some time experimenting with each of the different graphs
and exploring the possibilities. A brief overview is given here to
get you started. To illustrate the various graphs, let us use the
survey.sav data file.
Histograms

Histograms are used to display the distribution of a single
continuous variable (e.g., age, perceived stress scores).

Procedures in Creating a Histogram

1. From the menu at the top of the screen click on: Graphs, then click
on Histogram.
2. Click on your variable of interest and move it into the Variable box.
This should be a continuous variable (e.g., total perceived stress).
3. Click on Display normal curve. This option will give you the distribution
of your variable and, superimposed over the top, how a normal curve
for this distribution would look.
4. If you wish to give your graph a title, click on the Titles button and type
the desired title in the box (e.g., Histogram of Perceived Stress scores).
Click on Continue, and then OK.
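
What the Histogram procedure does with the variable can be sketched outside SPSS. The Python snippet below is only an illustration, not the SPSS implementation: it groups scores into equal-width bins and counts the cases in each. The scores, the bin width of 5, the starting value of 10 and the function name `histogram_counts` are all assumptions made for the example.

```python
from collections import Counter

BIN_WIDTH = 5   # assumed bin width
BIN_START = 10  # assumed left edge of the first bin

def histogram_counts(scores, bin_width, start):
    """Count how many scores fall into each equal-width bin.

    Bin k covers the half-open interval
    [start + k * bin_width, start + (k + 1) * bin_width).
    The result maps each bin's left edge to its count.
    """
    counts = Counter()
    for score in scores:
        k = int((score - start) // bin_width)
        counts[start + k * bin_width] += 1
    return dict(sorted(counts.items()))

# Hypothetical perceived stress totals (not taken from survey.sav).
stress = [12, 15, 18, 21, 22, 24, 25, 26, 27, 28, 29, 31, 33, 36, 41]
counts = histogram_counts(stress, BIN_WIDTH, BIN_START)

# Text histogram: one '#' per case in each bin.
for left_edge, n in counts.items():
    print(f"{left_edge:>3} to <{left_edge + BIN_WIDTH:<3} {'#' * n}")
```

With these made-up scores, the tallest row of marks sits in the middle bins and the counts taper off on either side, which is the pattern the normal curve overlay helps you judge.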
Interpretation of Output from Histogram

Inspection of the shape of the histogram provides
information about the distribution of scores on the
continuous variable. Many of the statistics discussed in this
manual assume that the scores on each of the variables are
normally distributed (i.e., follow the shape of the normal
curve). In this example, the scores are reasonably
normally distributed, with most scores occurring in the
center, tapering out towards the extremes. It is quite
common in the social sciences, however, to find that
variables are not normally distributed. Scores may be
skewed to the left or right or, alternatively, arranged in a
rectangular shape.
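
A visual impression of skew can be backed up with a number. The sketch below computes a simple population skewness (the mean of the cubed z-scores) in Python. It is an illustration under assumptions: the three small data sets are invented, and SPSS reports a sample-adjusted skewness, so its value will differ somewhat from this simple form.

```python
from statistics import mean, pstdev

def skewness(scores):
    """Population skewness: the mean of the cubed z-scores.

    Values near 0 suggest a symmetric distribution, positive values a
    long tail of high scores, negative values a long tail of low scores.
    """
    m = mean(scores)
    sd = pstdev(scores)
    return sum(((x - m) / sd) ** 3 for x in scores) / len(scores)

# Hypothetical score sets, chosen only to show the three patterns.
symmetric = [1, 2, 3, 4, 5]
tail_of_high_scores = [1, 1, 2, 2, 3, 10]
tail_of_low_scores = [1, 8, 9, 9, 10, 10]

for name, data in [("symmetric", symmetric),
                   ("tail of high scores", tail_of_high_scores),
                   ("tail of low scores", tail_of_low_scores)]:
    print(f"{name}: skewness = {skewness(data):.2f}")
```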
Bar Graphs

Bar graphs can be simple or very complex,


depending on how many variables you wish to
include. The bar graph can show the number of
cases in particular categories, or it can show the
score in some continuous variable for different
categories. Basically, you need two main variables:
one categorical and one continuous. You can also
break this down further with another categorical
variable if you wish.
Procedures for Creating a Bar Graph
1. From the menu at the top of the screen click on: Graphs, then Bar.
2. Click on Clustered.
3. In the Data in Chart Are section, click on Summaries for groups of cases. Click
on Define.
4. In the Bars represent box, click on Other summary function.
5. Click on the continuous variable you are interested in (e.g., total perceived
stress). This should appear in the box listed as Mean (Total Perceived Stress).
This indicates that the mean on the Perceived Stress Scale for the different
groups will be displayed.
6. Click on your first categorical variable (e.g., agegp3). Click on the arrow button
to move it into the Category axis box. This variable will appear across the
bottom of your bar graph (X axis).
7. Click on another categorical variable (e.g., sex) and move it into the Define
Clusters by: box. This variable will be represented in the legend. Click on the
Options button. Remove the tick from Display groups defined by missing values. To do
this, click once on the box. Click on Continue, and then OK.
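
The summary that a clustered bar graph displays, the mean of a continuous variable for each combination of two categorical variables, can be sketched in Python as follows. This is a minimal illustration, not the SPSS procedure. The records are invented, and the field name `tpstress` for total perceived stress is an assumption; `agegp3` and `sex` follow the variable names used in the steps above.

```python
from collections import defaultdict
from statistics import mean

def group_means(records, category, cluster, value):
    """Mean of `value` for every (category, cluster) combination."""
    groups = defaultdict(list)
    for row in records:
        # Skip cases with a missing group or score. This loosely mirrors
        # removing the tick from "Display groups defined by missing values".
        if None in (row[category], row[cluster], row[value]):
            continue
        groups[(row[category], row[cluster])].append(row[value])
    return {key: mean(vals) for key, vals in groups.items()}

# Hypothetical cases (not taken from survey.sav).
cases = [
    {"agegp3": "18-29", "sex": "male", "tpstress": 26},
    {"agegp3": "18-29", "sex": "male", "tpstress": 28},
    {"agegp3": "18-29", "sex": "female", "tpstress": 27},
    {"agegp3": "18-29", "sex": "female", "tpstress": 29},
    {"agegp3": "45+", "sex": "male", "tpstress": 24},
    {"agegp3": "45+", "sex": "female", "tpstress": 27},
    {"agegp3": "45+", "sex": None, "tpstress": 30},  # missing sex: skipped
]

means = group_means(cases, "agegp3", "sex", "tpstress")
for (age_group, sex), m in means.items():
    print(f"{age_group:>6} {sex:<7} mean = {m}")
```

Each printed mean corresponds to the height of one bar: the first key is the position on the category axis and the second is the cluster shown in the legend.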
Interpretation of output from Bar Graph
The output from this procedure gives you a quick summary of the
distribution of scores for the groups that you have requested (in this case,
males and females from the different age groups). The graph presented above
suggests that the difference between males and females is more pronounced
among the two older age groups. Among the 18 to 29 age group, the difference
in scores between males and females is very small.
Care should be taken when interpreting the output from Bar Graph.
Always look at the scale used on the Y (vertical) axis. Sometimes what
looks like a dramatic difference is really only a few scale points and is therefore
probably of little importance. This is clearly evident in the bar graph displayed
above. You will see that the difference between the groups is quite small when
you consider the scale used to display the graph. The difference between the
smallest score (males aged 45 or more) and the highest score (females aged
18-29) is only about three points.
To assess the significance of any difference you might find between groups, it is
necessary to conduct further statistical analyses. In this case, a two-way
between-groups analysis of variance would be conducted to find out if the
differences are statistically significant.
Scatterplots
Scatterplots are typically used to explore the relationship
between two continuous variables (e.g., age and
self-esteem).

The scatterplot will also indicate whether your variables are
positively related (high scores on one variable are
associated with high scores on the other) or
negatively related (high scores on one are
associated with low scores on the other).

The scatterplot also provides a general indication of the
strength of the relationship between your two variables. If the
relationship is weak, the points will be all over the place, in a
blob-type arrangement.
Procedures for Creating a Scatterplot

1. From the menu at the top of the screen click on Graphs, then on
Scatter.
2. Click on Simple and then Define.
3. Click on your first variable, usually the one you consider to be the
dependent variable (e.g., total perceived stress).
4. Click on the arrow to move it into the box labeled Y axis. This
variable will appear on the vertical axis.
5. Move your other variable (e.g., total PCOISS) into the box labeled
X axis. This variable will appear on the horizontal axis.
6. You can also have SPSS mark each of the points according to
some other categorical variable (e.g., sex). Move this variable into the
Set Markers by box. This will display males and females
using different markers.
7. If you wish to attach a title to the graph, click on the Titles button.
Type in the desired title and click on Continue.
8. Click on OK.
Interpretation of output from Scatterplot

From the output above, there appears to be a moderate,
negative correlation between the two variables (Perceived
Stress and PCOISS) for the sample as a whole.
Respondents with high levels of perceived control (shown on
the X, or horizontal, axis) experience lower levels of perceived
stress, while those with lower levels of perceived control have much
greater perceived stress. There is no indication of a
curvilinear relationship, so it would be appropriate to
calculate a Pearson product-moment correlation for these
two variables.

Remember that the scatterplot does not give you definite
answers; you need to follow it up with the calculation of
the appropriate statistic (in this case, the Pearson
product-moment correlation coefficient).
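
As a sketch of that follow-up calculation, the Pearson product-moment correlation can be computed directly from its definition. The Python below is illustrative only: the paired scores are invented, and the function name `pearson_r` is an assumption. In practice you would obtain the coefficient from SPSS itself.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 cases")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired totals (not taken from survey.sav):
# perceived control (X axis) and perceived stress (Y axis).
control = [30, 40, 50, 60, 70]
stress = [35, 30, 28, 22, 20]

r = pearson_r(control, stress)
print(f"r = {r:.2f}")  # negative: higher control goes with lower stress
```

The sign of r matches the direction seen in the scatterplot, and its size (between -1 and 1) puts a number on how tightly the points cluster around a straight line.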
Boxplots
Procedures for Creating a Boxplot
1. From the menu at the top of the screen click on: Graphs, then click
on Boxplot.
2. Click on Simple. In the Data in Chart Are section, click on
Summaries for groups of cases. Click on the Define button.
3. Click on your continuous variable (e.g., total positive affect). Click
the arrow button to move it into the Variable box.
4. Click on your categorical variable (e.g., sex). Click on the arrow
button to move it into the Category axis box.
5. Click on ID and move it into the Label cases box. This will allow
you to identify the ID numbers of any cases with extreme
values.
6. Click on the Options button. Remove the tick from the Display groups
defined by missing values box by clicking once in the box.
7. Click on Continue, and then OK.
Interpretation of output from Boxplot
The output from Boxplot gives you a lot of information about the distribution
of your continuous variable and the possible influence of your other
categorical variable (and any additional variable used).

Each distribution of scores is represented by a box and protruding lines (called
whiskers). The length of the box is the variable's interquartile range and contains 50
percent of cases. The line across the inside of the box represents the median. The
whiskers go out to the variable's smallest and largest values that are not outliers.
Any scores that SPSS considers to be outliers appear as little circles with a number
attached (this is the ID number of the case). Outliers are cases with scores that are
quite different from the remainder of the sample, either much higher or much lower. SPSS
defines points as outliers if they extend more than 1.5 box-lengths from the edge of the
box. Extreme points (indicated with an asterisk, *) are those that extend more than 3
box-lengths from the edge of the box. In the example above there are a number of
outliers at the low values for Positive Affect for both males and females.
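
The 1.5 and 3 box-length rules described above can be sketched in a few lines of Python. This is an illustration under assumptions: the scores are invented, the function name `flag_boxplot_points` is hypothetical, and the quartile method used here ("inclusive") is one common definition that may not match the one SPSS uses for boxplots, so borderline cases could be classified differently.

```python
from statistics import quantiles

def flag_boxplot_points(scores):
    """Return (q1, q3, flagged), where flagged lists the unusual scores."""
    # Quartile definitions vary; "inclusive" is an assumption here and
    # is not necessarily the method SPSS applies.
    q1, _median, q3 = quantiles(scores, n=4, method="inclusive")
    box_length = q3 - q1  # the interquartile range
    flagged = []
    for score in sorted(scores):
        # Distance beyond the nearer edge of the box (0 if inside the box).
        distance = max(q1 - score, score - q3, 0)
        if distance > 3 * box_length:
            flagged.append((score, "extreme"))
        elif distance > 1.5 * box_length:
            flagged.append((score, "outlier"))
    return q1, q3, flagged

# Hypothetical Positive Affect totals (not taken from survey.sav).
positive_affect = [10, 22, 30, 31, 32, 33, 34, 35, 36, 37, 38]
q1, q3, flagged = flag_boxplot_points(positive_affect)
print("box edges:", q1, q3)
print("flagged:", flagged)
```

In this made-up sample both flagged scores lie below the box, similar to the low-value outliers described for Positive Affect.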
In addition to identifying outliers, a boxplot also allows you to inspect the pattern of
scores for your various groups. It provides an indication of the variability in scores within
each group and allows a visual inspection of the differences between groups. In the
example presented above, the distributions of scores on Positive Affect for males and
females are very similar.
Line Graphs
Procedures for Creating a Line Graph

1. From the menu at the top of the screen click on: Graphs, then click on Line.
2. Click on Multiple. In the Data in Chart Are section, click on Summaries for
groups of cases. Click on Define.
3. In the Lines represent box, click on Other summary function. Click on the
continuous variable you are interested in (e.g., total perceived stress). Click
on the arrow button. The variable should appear in the box listed as Mean
(Total Perceived Stress). This indicates that the mean on the Perceived Stress
Scale for the different groups will be displayed.
4. Click on your first categorical variable (e.g., agegp3). Click on the arrow
button to move it into the Category Axis box. This variable will appear across
the bottom of your line graph (X axis).
5. Click on another categorical variable (e.g., sex) and move it into the Define
Lines by: box. This variable will be represented in the legend.
6. Click on the Options button. Remove the tick from Display groups defined by
missing values. To do this, click once on the box.
7. Click on Continue and then OK.
Interpretation of output from Line Graph
The line graph displayed above contains a good deal of
information.
First, you can look at the impact of age on perceived stress for
each of the sexes separately. Younger males appear to have
higher levels of perceived stress than either middle-aged or older
males. Older females are only slightly less stressed than the
younger group.
You can also consider the difference between males and
females. Overall, males appear to have lower levels of perceived
stress than females. Although the difference for the younger
group is only small, there appears to be a discrepancy for the
older age groups. Whether these differences reach statistical
significance can only be determined by performing a two-way
analysis of variance.
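
Reading the gap between two lines at each point on the category axis can also be done numerically. The Python sketch below takes a table of group means and reports the difference between the sexes within each age group. All numbers are invented and are not read from the graph; the "30-44" label for the middle age group and the function name `gap_by_category` are assumptions.

```python
def gap_by_category(means, first, second):
    """For each category, return mean(first) minus mean(second).

    `means` maps (category, line) pairs to a group mean, e.g.
    ("18-29", "male") -> 27.0.
    """
    # Keep the categories in the order they first appear.
    categories = list(dict.fromkeys(category for category, _ in means))
    return {c: means[(c, first)] - means[(c, second)] for c in categories}

# Hypothetical mean perceived stress scores (illustration only).
mean_stress = {
    ("18-29", "male"): 27.0, ("18-29", "female"): 27.5,
    ("30-44", "male"): 25.0, ("30-44", "female"): 27.0,
    ("45+", "male"): 24.0, ("45+", "female"): 26.5,
}

gaps = gap_by_category(mean_stress, "female", "male")
for age_group, gap in gaps.items():
    print(f"{age_group}: female minus male = {gap:+.1f}")
```

These gaps are descriptive only. They show where the lines sit apart, not whether the difference is statistically significant.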
