Degrees of Freedom
As we were teaching a multivariate statistics course for doctoral students, one of the students in the class asked, "What are degrees of freedom? I know it is not good to lose degrees of freedom, but what are they?" Other students in the class waited for a clear-cut response. As we tried to give a textbook answer, we were not satisfied and we did not get the sense that our students understood. We looked through our statistics books to determine whether we could find a more clear way to explain this term to social work students. The wide variety of language used to define degrees of freedom is enough to confuse any social worker! Definitions range from the broad, "Degrees of freedom are the number of values in a distribution that are free to vary for any particular statistic" (Healey, 1990, p. 214), to the technical:

    Statisticians start with the number of terms in the sum [of squares], then subtract the number of mean values that were calculated along the way. The result is called the degrees of freedom, for reasons that reside, believe it or not, in the theory of thermodynamics. (Norman & Streiner, 2003, p. 43)

Authors who have tried to be more specific have defined degrees of freedom in relation to sample size (Trochim, 2005; Weinbach & Grinnell, 2004), cell size (Salkind, 2004), the number of relationships in the data (Walker, 1940), and the difference in dimensionalities of the parameter spaces (Good, 1973). The most common definition includes the number or pieces of information that are free to vary (Healey, 1990; Jaccard & Becker, 1990; Pagano, 2004; Warner, 2008; Wonnacott & Wonnacott, 1990). These specifications do not seem to augment students' understanding of this term. Hence, degrees of freedom are conceptually difficult but are important to report to understand statistical analysis. For example, without degrees of freedom, we are unable to calculate or to understand any [...] and multivariate analysis, degrees of freedom are a function of sample size, number of variables, and number of parameters to be estimated; therefore, degrees of freedom are also associated with statistical power. This research note is intended to comprehensively define degrees of freedom, to explain how they are calculated, and to give examples of the different types of degrees of freedom in some commonly used analyses.

DEGREES OF FREEDOM DEFINED
In any statistical analysis the goal is to understand how the variables (or parameters to be estimated) and observations are linked. Hence, degrees of freedom are a function of both sample size (N) (Trochim, 2005) and the number of independent variables (k) in one's model (Toothaker & Miller, 1996; Walker, 1940; Yu, 1997). The degrees of freedom are equal to the number of independent observations (N), or the number of subjects in the data, minus the number of parameters (k) estimated (Toothaker & Miller, 1996; Walker, 1940). A parameter (for example, slope) to be estimated is related to the value of an independent variable and included in a statistical equation (an additional parameter is estimated for an intercept in a general linear model). A researcher may estimate parameters using different amounts or pieces of information, and the number of independent pieces of information he or she uses to estimate a statistic or a parameter are called the degrees of freedom (df) (HyperStat Online, n.d.). For example, a researcher records income of N number of individuals from a community. Here he or she has N independent pieces of information (that is, N points of incomes) and one variable called income (k); in subsequent analysis of this data set, degrees of freedom are associated with both N and k. For instance, if this researcher wants to calculate sample variance to understand the extent to which incomes vary in this community, the degrees of freedom equal N - k. The relationship between sample size and degrees of freedom is [...]
[Table 1: t-test output for LITERACY. Recoverable entries: group 1 (rich), 88.974 and 18.0712; Levene's Test for Equality of Variances with equal variances assumed, 14.266 (p = .000); a row for equal variances not assumed. The other cell values did not survive.]
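As a small illustration of the definition given under DEGREES OF FREEDOM DEFINED (df = N - k), the following Python sketch computes a sample variance for an income example. The income figures and the function name `sample_variance` are hypothetical and are not taken from the article's data; k is set to 1 on the assumption that the single variable (income) contributes one estimated quantity, its mean.

```python
from statistics import mean, variance


def sample_variance(values):
    """Return (sample variance, degrees of freedom) using df = N - k."""
    n = len(values)  # N independent pieces of information
    k = 1            # one variable (income), whose mean is estimated first
    df = n - k
    m = mean(values)
    sum_of_squares = sum((x - m) ** 2 for x in values)
    return sum_of_squares / df, df


# Hypothetical incomes (in thousands) for five individuals
incomes = [32.0, 41.5, 28.0, 55.0, 47.5]
var_by_hand, df = sample_variance(incomes)

# The standard library's variance() also divides by N - 1
print(df, round(var_by_hand, 4), round(variance(incomes), 4))
```

With five observations the divisor is 4 rather than 5, which is the sense in which one degree of freedom is "lost" to the estimated mean.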
[...] are the same (that is, k - 1 = 2 - 1 = 1, N - k = 102 - 2 = 100) as the degrees of freedom associated with "equal" variance test discussed earlier, and therefore SPSS does not report it separately.

If the assumption of equal variance is violated and the two groups have different variances as is the case in this example, where the folded F test or Levene's F weighted statistic is significant, indicating that the two groups have significantly different variances, the value for degrees of freedom (100) is no longer accurate. Therefore, we need to estimate the correct degrees of freedom (SAS Institute, 1985; also see Satterthwaite, 1946, for the computations involved in this estimation).

We can estimate the degrees of freedom according to Satterthwaite's (1946) method by using the following formula:

\[
df_{\text{Satterthwaite}} = \frac{(n_1 - 1)(n_2 - 1)}{(n_1 - 1)\left(1 - \dfrac{S_1^2 \times n_2}{S_1^2 \times n_2 + S_2^2 \times n_1}\right)^2 + (n_2 - 1)\left(\dfrac{S_1^2 \times n_2}{S_1^2 \times n_2 + S_2^2 \times n_1}\right)^2}
\]

where $n_1$ = sample size of group 1, $n_2$ = sample size of group 2, and $S_1$ and $S_2$ are the standard deviations of groups 1 and 2, respectively. By inserting subgroup data from Table 1, we arrive at the more accurate degrees of freedom as follows:

\[
\begin{aligned}
df &= \frac{(64 - 1)(38 - 1)}{(64 - 1)\left(1 - \dfrac{25.65^2 \times 38}{25.65^2 \times 38 + 18.07^2 \times 64}\right)^2 + (38 - 1)\left(\dfrac{25.65^2 \times 38}{25.65^2 \times 38 + 18.07^2 \times 64}\right)^2} \\
&= \frac{2331}{63\left[1 - \dfrac{25000.96}{25000.96 + 20897.59}\right]^2 + 37\left[\dfrac{25000.96}{25000.96 + 20897.59}\right]^2} \\
&= \frac{2331}{63\left[1 - \dfrac{25000.96}{45898.55}\right]^2 + 37\left[\dfrac{25000.96}{45898.55}\right]^2} \\
&= \frac{2331}{63[1 - .5447]^2 + 37[.5447]^2} \\
&= \frac{2331}{63 \times .207 + 37 \times .2967} = \frac{2331}{24.0379} = 96.97
\end{aligned}
\]

Because the assumption of equality of variances is violated, in the previous analysis the Satterthwaite's value for degrees of freedom, 96.97 (SAS rounds it to 97), is accurate, and our earlier value, 100, is not. Fortunately, it is no longer necessary to hand calculate this as major statistical packages such as SAS and SPSS provide the correct value for degrees of freedom when the assumption of equal variance is violated and equal variances are not assumed. This is the fourth value for degrees of freedom in our example, which appears in Table 1 as 97 in SAS and 96.967 in SPSS. Again, this value is the correct number to report, as the assumption of equal variances is violated in our example.

Comparing the Means of g Groups with One Parameter (Analysis of Variance)
What if we have more than two groups to compare? Let us assume that we have $n_1 + \ldots + n_g$ groups of observations or countries grouped by political freedom (FREEDOMX) and that we are interested in differences in their literacy rates (LITERACY, the dependent variable). We can test the variability of g means by using the analysis of variance (ANOVA). The ANOVA procedure produces three different types of degrees of freedom, calculated as follows:

• The first type of degrees of freedom is called the between-groups degrees of freedom or model degrees of freedom and can be determined by using the number of group means we want to compare. The ANOVA procedure tests the assumption that the g groups have equal means and that the population mean is not statistically different from the individual group means. This assumption reflects the null hypothesis, which is that there is no statistically significant difference between literacy rates in g groups of countries ($\mu_1 = \mu_2 = \mu_3$). The alternative hypothesis is that the g sample means are significantly different from one another. There are g - 1 model degrees of freedom for testing the null hypothesis and for assessing variability among the g means. This value of model degrees of freedom is used in the numerator for calculating the F ratio in ANOVA.
• The second type of degrees of freedom, called the within-groups degrees of freedom or error degrees of freedom, is derived from subtracting the model degrees of freedom from the corrected total degrees of freedom. The within-groups [...]
[Table 2: ANOVA output for LITERACY by FREEDOMX, recoverable values]

                  Sum of Squares    df     Mean Square    F         Sig.
Between Groups    24253.483         2      12126.74       16.638    .000
Within Groups     72156.096         99     728.849
Total             96409.578         101

Homogeneous subsets for FREEDOMX (subset for α = .05): subset 1 contains the group means 47.417 and 57.529 (Sig. = .126); subset 2 contains the group mean 84.313 (Sig. = 1.000).
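The degrees-of-freedom arithmetic for the two-group and g-group comparisons can be checked with a short Python sketch. The function names are illustrative and are not from the article; the inputs (n = 64 and 38, standard deviations 25.65 and 18.07, N = 102, and three FREEDOMX groups) are the figures used in the text. The Satterthwaite function uses the usual form of the approximation, in which c is the share of the first group's squared standard error in the total.

```python
def satterthwaite_df(n1, s1, n2, s2):
    """Approximate df for a two-group t test when variances are unequal."""
    w1 = s1 ** 2 / n1    # squared standard error of the mean, group 1
    w2 = s2 ** 2 / n2    # squared standard error of the mean, group 2
    c = w1 / (w1 + w2)   # equals S1^2 * n2 / (S1^2 * n2 + S2^2 * n1)
    return (n1 - 1) * (n2 - 1) / ((n1 - 1) * (1 - c) ** 2 + (n2 - 1) * c ** 2)


def anova_df(n_total, n_groups):
    """Return (model, error, corrected total) df for a one-way ANOVA."""
    model = n_groups - 1   # between-groups df: g - 1
    total = n_total - 1    # corrected total df: N - 1
    error = total - model  # within-groups df: total minus model
    return model, error, total


# Two-group comparison with the values quoted in the text
print(round(satterthwaite_df(64, 25.65, 38, 18.07), 2))

# One-way ANOVA with N = 102 countries in g = 3 political-freedom groups
print(anova_df(102, 3))
```

The Satterthwaite value falls between the smaller group's n - 1 and the pooled N - 2, which is why it comes out lower than the equal-variance value of 100.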
[...] there are (k + 1) - 1 or k degrees of freedom for testing the null hypothesis (Dallal, 2003). In other words, the model degrees of freedom equal the number of useful pieces of information available for estimation of variability in the dependent variable.
• The second type is the residual, or error, degrees of freedom. Residual degrees of freedom in multiple regression involve information of both sample size and predictor variables. In addition, we also need to account for the intercept. For example, if our sample size equals N, we need to estimate k + 1 parameters, or one regression coefficient for each of the predictor variables (k) plus one for the intercept. The residual degrees of freedom are calculated N - (k + 1). This is the same as the formula for the error, or within-groups, degrees of freedom in the ANOVA. It is important to note that increasing the number of predictor variables has implications for the residual degrees of freedom. Each additional parameter to be estimated costs one residual degree of freedom (Dallal, 2003). The remaining residual degrees of freedom are used to estimate variability in the dependent variable.
• The third type of degrees of freedom is the total, or corrected total, degrees of freedom. As in ANOVA, this is calculated N - 1.
• Finally, the fourth type of degrees of freedom that SAS (and not SPSS) reports under the parameter estimate in multiple regression is worth mentioning. Here, the null hypothesis is that there is no relationship between each independent variable and the dependent variable. The degree of freedom is always 1 for each relationship and therefore, some statistical software, such as SPSS, do not bother to report it.

In the example of multiple regression analysis (see Table 3), there are four different values of degrees of freedom. The first is the regression degrees of freedom. This is estimated as (k + 1) - 1 or (6 + 1) - 1 = 6, where k is the number of independent variables in the model. Second, the residual degrees of freedom are estimated as N - (k + 1). Its value here is 99 - (6 + 1) = 92. Third, the total degrees of freedom are calculated N - 1 (or 99 - 1 = 98). Finally, the degrees of freedom shown under parameter estimates for each parameter always equal 1, as explained above. F values and the respective degrees of freedom from the current regression output should be reported as follows: The regression model is statistically significant with F(6, 92) = 44.86, p < .0001.

Degrees of Freedom in a Nonparametric Test
Pearson's chi square, or simply the chi-square statistic, is an example of a nonparametric test that is widely used to examine the association between two nominal level variables. According to Weiss (1968) "the number of degrees of freedom to be associated with a chi-square statistic is equal to the number of independent components that entered into its calculation" (p. 262). He further explained that each cell in a chi-square statistic represents a single component and that an independent component is one where neither observed nor expected values are determined by the frequencies in other cells. In other words, in a contingency table, one row and one column are fixed and the remaining cells are independent and are free to vary. Therefore, the chi-square distribution has (r - 1) × (c - 1) degrees of freedom, where r is the number of rows and c is the number of columns in the analysis (Cohen, 1988; Walker, 1940; Weiss, 1968). We subtract one from both the number of rows and columns simply because by knowing the values in other cells we can tell the values in the last cells for both rows and columns; therefore, these last cells are not independent.

As an example, we ran a chi-square test to examine whether gross national product (GNP) per capita of a country (GNPSPLIT) is related to its level of political freedom (FREEDOMX). Countries (GNPSPLIT) are divided into two categories: rich countries or countries with high GNP per capita (coded as 1) and poor countries or countries with low GNP per capita (coded as 0), and political freedom (FREEDOMX) has three levels: free (coded as 1), partly free (coded as 2), not free (coded as 3) (see Table 4). In this analysis, the degrees of freedom are (2 - 1) × (3 - 1) = 2. In other words, by knowing the number of rich countries, we would automatically know the number of poor countries. But by knowing the number of countries that are free, we would not know the number of countries that are partly free and not free. Here, we need to know two of the three components (for instance, the number of countries that are free and partly free) so that we will know the number of countries
[Table 3: multiple regression output, recoverable values]

Model summary: R Square = .745; Adjusted R Square = .729; Std. Error of the Estimate = 16.1919

              Sum of Squares    df    Mean Square    F
Regression    70569.721         6     11761.620      44.861
Residual      24120.239         92    262.177
Total         94689.960         98

The coefficient estimates for the six predictors did not survive.
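[Table 4: GNPSPLIT * FREEDOMX crosstabulation and chi-square tests, recoverable values]

Case processing summary for GNPSPLIT * FREEDOMX: 64.6% of cases valid; total 192 (100.0%).

                              FREEDOMX
                              1        2        3        Total
GNPSPLIT  0    Count          10       31       35       76
               % of Total     8.1%     25.0%    28.2%    61.3%
          1    Count          25       10       13       48
               % of Total     20.2%    8.1%     10.5%    38.7%
Total          Count          35       41       48       124
               % of Total     28.2%    33.1%    38.7%    100.0%

The chi-square test values, their degrees of freedom, and the asymptotic p values (2-tailed) did not survive.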
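The regression and chi-square degrees of freedom discussed in this part of the note can be checked with a minimal Python sketch. The function names are illustrative and are not from the article; the inputs are the figures given in the text (N = 99, k = 6) and the counts in Table 4. The last function illustrates the "free to vary" idea: once the margins are fixed, the (r - 1) × (c - 1) free cells determine every other cell.

```python
def regression_df(n, k):
    """Return (regression, residual, total) df for k predictors plus an intercept."""
    return (k + 1) - 1, n - (k + 1), n - 1


def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for a chi-square test on an r x c contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def complete_table(row_totals, col_totals, free_cells):
    """Rebuild a full table from its margins and its (r - 1) x (c - 1) free cells."""
    # The last cell of each free row is fixed by that row's total
    rows = [list(r) + [rt - sum(r)] for r, rt in zip(free_cells, row_totals)]
    # The whole last row is fixed by the column totals
    last_row = [ct - sum(col) for ct, col in zip(col_totals, zip(*rows))]
    return rows + [last_row]


print(regression_df(99, 6))  # regression, residual, total
print(chi_square_df(2, 3))   # GNPSPLIT (2 levels) by FREEDOMX (3 levels)

# Table 4 margins plus two free cells (poor countries that are free or
# partly free) are enough to recover all six counts
print(complete_table([76, 48], [35, 41, 48], [[10, 31]]))
```

Only two counts had to be supplied to rebuild the table, which is the same number as the degrees of freedom of the test.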
that are not free. Therefore, in this analysis there are two independent components that are free to vary, and thus the degrees of freedom are 2.

Readers may note that there are three values under degrees of freedom in Table 4. The first two values are calculated the same way as discussed earlier and have the same values and are reported most widely. These are the values associated with the Pearson chi-square and likelihood ratio chi-square tests. The final test is rarely used. We explain this briefly. The Mantel-Haenszel chi-square statistic tests the hypothesis that the relationship between two variables (row and column variables) is linear; the statistic is calculated as (N - 1) × r², where r is the Pearson product-moment correlation between the row variable and the column variable (SAS Institute, 1990). Its degree of freedom is always 1, and the test is useful only when both row and column variables are ordinal.

CONCLUSION
Yu (1997) noted that "degree of freedom is an intimate stranger to statistics students" (p. 1). This research note has attempted to decrease the strangeness of this relationship with an introduction to the logic of the use of degrees of freedom to correctly interpret statistical results. More advanced researchers, however, will note that the information provided in this article is limited and fairly elementary. As degrees of freedom vary by statistical test (Salkind, 2004), space prohibits a more comprehensive demonstration. Anyone with a desire to learn more about degrees of freedom in statistical calculations is encouraged to consult more detailed resources, such as Good (1973), Walker (1940), and Yu (1997).

Finally, for illustrative purposes we used World Data that reports information at country level. In our analysis, we have treated each country as an independent unit of analysis. Also, in the analysis, each country is given the same weight irrespective of its population size or area. We have ignored limitations that are inherent in the use of such data. We warn readers to ignore the statistical findings of our analysis and take away only the discussion that pertains to degrees of freedom.

REFERENCES
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Dallal, G. E. (2003). Degrees of freedom. Retrieved May 30, 2006, from http://www.tufts.edu/~gdallal/dof.htm
Good, I. J. (1973). What are degrees of freedom? American Statistician, 27, 227-228.
Healey, J. F. (1990). Statistics: A tool for social research (2nd ed.). Belmont, CA: Wadsworth.
HyperStat Online. (n.d.). Degrees of freedom. Retrieved May 30, 2006, from http://davidmlane.com/hyperstat/A42408.html
Jaccard, J., & Becker, M. A. (1990). Statistics for the behavioral sciences (2nd ed.). Belmont, CA: Wadsworth.
Norman, G. R., & Streiner, D. L. (2003). PDQ statistics (3rd ed.). Hamilton, Ontario, Canada: BC Decker.
Pagano, R. R. (2004). Understanding statistics in the behavioral sciences (7th ed.). Belmont, CA: Wadsworth.
Rosenthal, J. A. (2001). Statistics and data interpretation for the helping professions. Belmont, CA: Wadsworth.
Runyon, R. P., & Haber, A. (1991). Fundamentals of behavioral statistics (7th ed.). New York: McGraw-Hill.
Salkind, N. J. (2004). Statistics for people who (think they) hate statistics (2nd ed.). Thousand Oaks, CA: Sage Publications.
SAS Institute. (1985). SAS user's guide: Statistics, version 5.1. Cary, NC: SAS Institute.
SAS Institute. (1990). SAS procedure guide, version 6 (3rd ed.). Cary, NC: SAS Institute.
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110-114.
Toothaker, L. E., & Miller, L. (1996). Introductory statistics for the behavioral sciences (2nd ed.). Pacific Grove, CA: Brooks/Cole.
Trochim, W. M. K. (2005). Research methods: The concise knowledge base. Cincinnati: Atomic Dog.
Walker, H. W. (1940). Degrees of freedom. Journal of Educational Psychology, 31, 253-269.
Warner, R. M. (2008). Applied statistics. Thousand Oaks, CA: Sage Publications.
Weinbach, R. W., & Grinnell, R. M., Jr. (2004). Statistics for social workers (6th ed.). Boston: Pearson.
Weiss, R. S. (1968). Statistics in social research: An introduction. New York: John Wiley & Sons.
Wonnacott, T. H., & Wonnacott, R. J. (1990). Introductory statistics (5th ed.). New York: John Wiley & Sons.
Yu, C. H. (1997). Illustrating degrees of freedom in terms of sample size and dimensionality. Retrieved November 1, 2007, from http://www.creative-wisdom.com/computer/sas/df.html

Shanta Pandey, PhD, is associate professor, and Charlotte Lyn Bright, MSW, is a doctoral student, George Warren Brown School of Social Work, Washington University, St. Louis. The authors are thankful to a student whose curious mind inspired them to work on this research note. Correspondence concerning this article should be sent to Shanta Pandey, George Warren Brown School of Social Work, Washington University, St. Louis, MO 63130; e-mail: pandeys@wustl.edu.

Original manuscript received August 29, 2006
Final revision received November 15, 2007
Accepted January 16, 2008