## SOME NOTES ON STATISTICAL INTERPRETATION

Below I provide some basic notes on statistical interpretation for some selected procedures.
The information provided here is not exhaustive. There is more to learn about
assumptions, applications, and interpretation of these procedures. Further information
can be obtained in statistics textbooks and statistics courses.
Crosstabs:
Crosstab is short for cross-tabulation or cross-classification table. In its basic form it is a
bivariate table. Usually the independent variable is represented by the columns and the
dependent variable is represented by the rows.
One can use any variables with any level of measurement in a crosstab but usually they are
constructed using nominal or ordinal variables. Because interval/ratio variables tend to have
many potential values, crosstabs are usually impractical for these levels of measurement.
More complex multivariate crosstabs can also be constructed (e.g., where a third variable is
controlled).
The data in crosstabs are usually presented as either percentages or frequencies. Percentages can
pertain to the cell as a function of: 1) the column, 2) the row, or 3) the total. In constructing a
crosstabulation for a report you should make clear which of these types of percentages is being
calculated. (This can often be done easily by providing a total percentage at the end of the row or
column.)
In providing descriptive interpretation of results one can discuss the relative frequency or
percentage of cases falling in particular cells. Usually this is done in reference to the column
variable. E.g., 35% of women strongly agreed with statement X, while only 15% of men strongly
agreed with statement X.
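As a minimal sketch of how column percentages are calculated, the following uses made-up survey records (they are not data from any table in these notes). The function name `column_percentages` is only illustrative.

```python
from collections import Counter

# Hypothetical records: (column variable, row variable). Not data from the notes.
records = [
    ("women", "strongly agree"), ("women", "strongly agree"),
    ("women", "agree"), ("women", "disagree"),
    ("men", "strongly agree"), ("men", "agree"),
    ("men", "disagree"), ("men", "disagree"),
]

def column_percentages(pairs):
    """Return {column: {row: percent}}; the percentages in each column sum to 100."""
    cell_counts = Counter(pairs)                      # (column, row) -> frequency
    column_totals = Counter(col for col, _ in pairs)  # column -> frequency
    table = {}
    for (col, row), n in cell_counts.items():
        table.setdefault(col, {})[row] = 100.0 * n / column_totals[col]
    return table

table = column_percentages(records)
for col, rows in table.items():
    for row, pct in rows.items():
        print(f"{col:6s} {row:15s} {pct:5.1f}%")
```

Because each cell is divided by its column total, the groups can be compared directly even when they differ in size.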

Chi-Square:
Technically this is a test of statistical independence. That is, if two variables are unrelated then
they are independent of one another. If not, they are dependent. Another way of thinking about
this is that they are associated. Chi-square can be used with nominal and ordinal variables. If
the significance value corresponding to the chi-square test is less than or equal to .05, then the
test is deemed to be statistically significant and you can interpret the two variables in the test as
being dependent or associated.

There are several limitations to the chi-square test. Two of these are: 1) the test does not tell you
about the direction of an association (e.g., positive or negative), 2) the test does not tell you about
the strength of an association.
From the chi-square statistic (and its related level of significance) all you can say is that the
variables are statistically associated or not.
You can, however, try to interpret the percentages in the related crosstabulation.
In Table 1, the chi-square is significant. This means that employment status and gender are
statistically associated. The results in the crosstabulation suggest that men are more likely to be
employed full-time.
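To make the logic of the test concrete, here is a rough sketch that computes the chi-square statistic for a 2x2 table by comparing observed counts with the counts expected under independence. The counts are hypothetical (they are not the values in Table 1), no continuity correction is applied, and the p-value shortcut used here is valid only for 1 degree of freedom.

```python
import math

def chi_square_2x2(observed):
    """Pearson chi-square for a 2x2 table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: rows = full-time / not full-time, columns = men / women.
observed = [[30, 15],
            [10, 25]]
chi2, p = chi_square_2x2(observed)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
print("significant at .05" if p <= 0.05 else "not significant at .05")
```

Note that the statistic itself says nothing about direction or strength; for that you still have to read the percentages in the crosstab.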

Pearson's Correlation:
Pearson's correlation is a bivariate measure of association for interval/ratio level variables.
Pearson's correlation ranges from -1 to 1; its absolute value ranges from 0 to 1.
A correlation of 0 means that there is no linear statistical association between two variables. A
correlation of 1 means that there is a perfect positive correlation (or linear association) between
two variables. A correlation of -1 means that there is a perfect negative correlation between two
variables. A correlation of .50 means that there is a moderately strong positive correlation
between two variables.
There is also an associated test of significance. If the significance value (p.) is ≤ .05, then the
correlation is deemed to be statistically significant.
In Table 2 the correlation between years of education and personal income is .42, and p. is < .01.
Thus there is a significant, moderately strong positive correlation between education and income.
(Another way of saying this is that there is a significant, moderately strong positive linear
association between education and income.)
In other words, people with higher levels of education tend to earn higher levels of income, while
people with lower levels of education tend to earn lower levels of income.
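For readers who want to see where the coefficient comes from, the sketch below computes Pearson's correlation from its definition (the co-variation of the two variables divided by the square root of the product of their separate variations). The education and income values are invented for illustration and will not reproduce the .42 in Table 2.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)   # sum of squares for x
    ss_y = sum((b - mean_y) ** 2 for b in y)   # sum of squares for y
    return co_variation / math.sqrt(ss_x * ss_y)

# Hypothetical data: years of education and income in thousands of dollars.
years_education = [10, 12, 12, 14, 16, 16, 18, 20]
income_thousands = [22, 30, 25, 41, 38, 52, 47, 60]

r = pearson_r(years_education, income_thousands)
print(f"r = {r:.2f}")
```

A perfectly linear increasing relationship gives 1, a perfectly linear decreasing relationship gives -1, and the sign of the result gives the direction of the association.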

Multiple Regression Analysis:
Multiple regression analysis examines the strength of the linear relationship between a set of
independent variables and a single dependent variable (measured at the interval/ratio level).
The R² provides the proportion of variation in the dependent variable that is explained by the
independent variables in the model. For example, the independent variables in Model 5 of Table
7 explain .20 of the variation in environmentally friendly behaviour, or, converted into a
percentage, they explain 20% of the variation in environmentally friendly behaviour.
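The idea of "proportion of variation explained" can be sketched as one minus the ratio of unexplained (residual) variation to total variation. The observed and fitted values below are hypothetical; the fitted values are the group means (3 and 7) of an imagined two-category predictor, which is what a least-squares model with an intercept would predict in that case.

```python
def r_squared(observed, predicted):
    """Proportion of variation in `observed` accounted for by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total

# Hypothetical scores and the values fitted by a hypothetical regression model.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 3.0, 7.0, 7.0]

r2 = r_squared(observed, predicted)
print(f"R-squared = {r2:.2f} ({100 * r2:.0f}% of the variation explained)")
```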
There are two types of coefficients that are typically displayed in a multiple regression table:
unstandardized coefficients, and standardized coefficients.
To interpret an unstandardized regression coefficient: for every one-unit increase in the
independent variable, the dependent variable changes by the value of the coefficient (expressed in
the dependent variable's own units). For instance, if income is the
dependent variable, and years of education is one of the independent variables, and the
unstandardized regression coefficient for education is 3,000, then this would mean that for every
additional year of education a respondent has, their predicted income increases by \$3,000 (controlling
for the other independent variables in the equation).
In multiple regression, the effects of the independent variables are always net effects
controlling simultaneously for the effects of the other variables in the equation.
One advantage of using unstandardized coefficients is that they have readily interpretable
substantive meaning (such as in the example of education and income given above).
One disadvantage is that the independent variables usually have different metrics (e.g. income in
dollars, age in years, attitudes on a rating scale, etc.). This makes it difficult to compare the
relative influence of different independent variables upon the dependent variable.

Standardized regression coefficients are based on changes in standard deviation units. For
example, in Model 5 of Table 7, for every standard deviation unit increase in activism, the
respondent's score on the environmentally friendly behaviour index increases by .18 standard
deviation units.
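The link between the two types of coefficients can be sketched with the usual rescaling: the standardized coefficient is the unstandardized coefficient multiplied by the standard deviation of the independent variable and divided by the standard deviation of the dependent variable. The coefficient of 3,000 echoes the education and income example above; the two standard deviations are assumed values chosen only for illustration.

```python
def standardized_coefficient(b, sd_x, sd_y):
    """Rescale an unstandardized coefficient into standard deviation units."""
    return b * sd_x / sd_y

b_education = 3000.0    # dollars of income per year of education (example above)
sd_education = 2.0      # assumed standard deviation of years of education
sd_income = 20000.0     # assumed standard deviation of income in dollars

beta = standardized_coefficient(b_education, sd_education, sd_income)
print(f"standardized coefficient = {beta:.2f}")
```

Under these assumed values, a one standard deviation increase in education goes with a .30 standard deviation increase in income.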

One advantage of using standardized regression coefficients is that you can compare the relative
strength of the coefficients. Generally, the closer to the absolute value of 1 the coefficient is, the
stronger the effect of that independent variable on the dependent variable (controlling for other
variables in the equation). The closer the coefficient is to 0, the weaker the effect of that
independent variable.
For example, in Model 1 of Table 7, age has the strongest effect on environmentally friendly
behaviour (-.23), while income (log) has the smallest effect (-.08).
(0 means no net effect; under unusual circumstances in multiple regression, standardized
regression coefficients can exceed 1 in absolute value; in bivariate regression the
standardized regression coefficient, which equals Pearson's correlation coefficient, has a
maximum absolute value of 1.)

Usually independent variables are measured at the interval/ratio level.

While it is technically not supposed to be done, sometimes ordinal variables (measured on Likert-type scales) are treated as interval/ratio level variables and used as independent variables.
It is also possible to include categorical variables as independent variables but they have to be
binarized, and coded as 0 or 1. Also, at least one category has to be left out to serve as a
reference category. Variables coded in this way are referred to as dummy variables.
For example, in Table 7 gender is coded as male = 1, and female = 0.
If one had income as a dependent variable in a multiple regression, and the unstandardized
regression coefficient for gender was 10,000, then (assuming the previous coding scheme) men
would make \$10,000 more than women on average, controlling for other variables in the equation.
Another example in Table 7 is Gendpar where female parents are coded as 1, and everyone else
is coded as 0.
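A small sketch of dummy coding follows. It uses invented incomes and a bivariate regression (so there are no other variables being controlled, unlike in Table 7). In this simple case the unstandardized coefficient for the dummy variable is exactly the difference between the two group means, and the intercept is the mean of the reference category.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for one independent variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical respondents (not from Table 7).
gender = ["male", "female", "male", "female", "female", "male"]
income = [50000, 38000, 46000, 42000, 40000, 54000]

# Dummy coding: male = 1, female = 0 (female is the reference category).
male = [1 if g == "male" else 0 for g in gender]

intercept, slope = simple_ols(male, income)
print(f"mean income for women (intercept): {intercept:,.0f}")
print(f"difference for men (coefficient):  {slope:,.0f}")
```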
It is somewhat more difficult to interpret standardized regression coefficients for dummy
variables because standard deviation unit changes are somewhat meaningless when there are only
two categories. In Model 1 of Table 7, it can be said that there is a significant effect for gender:
females have higher scores for environmentally friendly behaviour.
In multiple regression analysis, significance levels are usually reported for
the individual regression coefficients, and a separate significance level is reported for
the equation as a whole (associated with the R²).

Usually .05 is the minimum criterion for indicating that a result is significant (though in Table 7, the
level of .10 is also reported).
For example, in Model 2 of Table 7, the following independent variables are significant at the .05
level: gender, age, and education (squared).
The following variables are not significant at the .05 level: income (log), parent.
In Model 2 of Table 7 the equation as a whole is significant. (See the asterisk next to the R².)

There are a variety of different ways of displaying information in a multiple regression table.
Sometimes a series of models is presented (such as in Table 7) where conceptually similar
variables are grouped together and added in a block, and then different blocks are added in
sequence, usually associated with theoretical arguments. This is often referred to as hierarchical
regression analysis.
Sometimes only the results associated with a single model are presented.
Sometimes only the unstandardized coefficients are provided.
Sometimes only the standardized coefficients are provided (this is the case in Table 7).
Sometimes the standard error associated with the coefficient is provided.
Sometimes R² changes are provided in association with different models. (This could have been
done in Table 7.)
Also, the number of cases used to create the regression model is usually indicated (N).
These are just some of the basics. There is a good deal of additional information to know
associated with assumptions underlying the variables, regression diagnostics, and interpreting
regression equations.
There are also a variety of specialized types of regression equations (e.g. for non-linear effects,
for interaction effects, etc.)

Difference in Means and t-test:
When you wish to examine the relationship between a nominal (or ordinal) independent variable with two
categories and a dependent variable that is measured at the
interval/ratio level, then an appropriate procedure and test is to examine the
difference in means, and calculate a t-test.

To see the direction of the difference in means just examine the respective means for the two
groups. For the t-test there is an associated significance level. If the significance level is ≤ .05,
then the difference in means is statistically significant.
For example, examine the third row of Table 3. This displays the mean personal income for
women and men. Men made an average of \$46,968 while women made an average of \$24,268.
This difference is statistically significant (p. ≤ .01). Thus you can conclude that (for this sample)
men make more than women.
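The calculation behind the test can be sketched as follows. This is the equal-variance (pooled) form of the two-sample t-test, which is an assumption on my part since the notes do not say which form was used for Table 3. The incomes (in thousands) are invented. Only the t statistic and its degrees of freedom are computed; the significance level would normally be read from statistical software or a t table.

```python
import math
import statistics

def pooled_t(group_a, group_b):
    """t statistic and degrees of freedom for a difference in two means."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df

# Hypothetical personal incomes in thousands of dollars.
men = [52, 47, 61, 39, 45]
women = [28, 31, 22, 35, 24]

t, df = pooled_t(men, women)
print(f"difference in means = {statistics.mean(men) - statistics.mean(women):.1f}")
print(f"t = {t:.2f} with {df} degrees of freedom")
```

With 8 degrees of freedom the two-tailed critical value at the .05 level is 2.306, so a t statistic larger than that in absolute value would be deemed significant.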

Univariate Statistics: Frequencies and Percentages:
Often it is useful to provide basic univariate statistics describing key variables. For nominal and
ordinal variables this can be done by providing frequencies and percentages. (There are also a
variety of other useful statistics that will not be discussed here.) Technically, you can also
provide frequencies and percentages for interval/ratio variables but it is usually not practical to
do so because there are so many potential values. (Instead, such data are sometimes portrayed in
graphs.)
When you provide tables of frequencies and percentages you should provide totals.
Also, if there is missing data you should indicate this in the table.
In Table 4, the response category with the largest number of cases is "strongly agree". Seven out of
20 people, or 35% of the sample, selected this response.
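A frequency and percentage table of this kind can be sketched in a few lines. Only the 7 "strongly agree" responses out of 20 come from the description of Table 4; the split of the remaining 13 responses is made up for illustration.

```python
from collections import Counter

# 7 of 20 "strongly agree" as in Table 4; the other counts are hypothetical.
responses = (["strongly agree"] * 7 + ["agree"] * 5 + ["no opinion"] * 2
             + ["disagree"] * 4 + ["strongly disagree"] * 2)

counts = Counter(responses)
total = sum(counts.values())
percentages = {category: 100.0 * n / total for category, n in counts.items()}

for category, n in counts.most_common():
    print(f"{category:18s} {n:3d} {percentages[category]:6.1f}%")
print(f"{'Total':18s} {total:3d} {sum(percentages.values()):6.1f}%")
```

Printing the total row makes it easy for a reader to check that the percentages are based on all valid cases.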

Univariate Statistics: Means, Standard Deviations, and N:
For interval/ratio level variables, one way of summarizing data is to provide means, standard
deviations, and N.
The mean is the arithmetic average of the data. The standard deviation is a measure of how
dispersed the data are. The N is the number of (valid) cases that were used to calculate these
statistics.
In row 2 of Table 5 we see that for this sample the mean years of education was 15.36, and the
standard deviation was 2.17. These statistics were calculated from 183 cases.
If the data are approximately normally distributed, this standard deviation means that about 68% of the cases fell between 13.19 and 17.53 (the mean plus or minus one standard deviation), and
about 95% of all the cases fell between 11.02 and 19.70 (the mean plus or minus two standard deviations).
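The ranges quoted above are simply the mean plus or minus one and two standard deviations. As a quick check of the arithmetic, using the mean and standard deviation reported for years of education (the 68% and 95% figures are a rule of thumb that assumes a roughly normal distribution):

```python
def sd_interval(mean, sd, k):
    """Return (low, high) for the mean plus or minus k standard deviations."""
    return mean - k * sd, mean + k * sd

mean_years, sd_years = 15.36, 2.17   # values quoted from row 2 of Table 5

one_sd = sd_interval(mean_years, sd_years, 1)
two_sd = sd_interval(mean_years, sd_years, 2)

print(f"about 68% of cases: {one_sd[0]:.2f} to {one_sd[1]:.2f}")
print(f"about 95% of cases: {two_sd[0]:.2f} to {two_sd[1]:.2f}")
```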

Percentage Tables for Multiple Items:
Sometimes it is useful to provide tables that summarize multiple variables at the same time.
Table 2 does this for some correlations. Table 5 does this for means, standard deviations, and
Ns.
When you have Likert-type scales it is sometimes useful to present data in the form of a matrix
with the categories across the top (or columns) and the different questionnaire items down the
side (or rows).
Table 6 does this for the political efficacy items.
For example, for item #4, 35% strongly disagreed, 15% disagreed, 0% had no opinion, 20%
agreed, and 30% strongly agreed.
When the data are displayed this way we can try to discern patterns by comparing across the
items.
In this particular instance the responses look similar across items, with many responses
in the extreme categories and fewer responses in the middle of the scale (especially for "no
opinion").
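A matrix of this kind can be sketched by converting each item's counts into a row of percentages. The counts for item 4 below reproduce the percentages quoted above if one assumes 20 respondents (an assumption; the N for Table 6 is not stated here), and item 1 is entirely hypothetical.

```python
CATEGORIES = ["strongly disagree", "disagree", "no opinion",
              "agree", "strongly agree"]

# Counts per category, in the order of CATEGORIES.
item_counts = {
    "item 1": [6, 4, 1, 3, 6],   # hypothetical
    "item 4": [7, 3, 0, 4, 6],   # matches the quoted percentages if N = 20
}

def percentage_row(counts):
    """Convert one item's counts into percentages of that item's total."""
    total = sum(counts)
    return [100.0 * n / total for n in counts]

matrix = {item: percentage_row(counts) for item, counts in item_counts.items()}

print(" " * 8 + "".join(f"{c:>19s}" for c in CATEGORIES))
for item, row in matrix.items():
    print(f"{item:8s}" + "".join(f"{pct:18.0f}%" for pct in row))
```

Laying the items out as rows makes it easier to scan down a column and compare how often each category was chosen across items.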