
ADVANCED STATISTICAL METHODS

CAPITOL UNIVERSITY (C.U.)


DR. WARREN I. LUZANO

Course Objective:

The objective of this course is to frame management problems in appropriate statistical terms in
order to use data to make better decisions. Students will learn to make sense of data along with
the basics of statistical inference and regression analysis and their hands-on implementation
using software. They will develop critical and integrative thinking in order to communicate the
results of the analysis clearly in the context of the problem. Finally, the students will learn to
unambiguously articulate the conclusions and limitations of the analysis with a clear separation
between data and judgment.

Course Outline:

Part I. Descriptive Statistics and Inferential Statistics


1. Descriptive Statistics
2. The Principles of Inferential Statistics
3. The Mechanics of Inferential Statistics
4. An Introduction to SPSS
Part II. Methods of Statistical Analysis
5. Research Designs or Questions
6. Statistical Methods for Examining Differences
a. Chi-Square
b. Independent t-Test
c. Paired t-Test
d. One-Way ANOVA
e. Independent Sample Factorial
7. Statistical Methods for Examining Associations
a. Pearson’s r Correlation
b. Linear Regression
c. Multiple Linear Regression (Standard, Hierarchical, Stepwise)

SUGGESTED READINGS:

1. Chris Dewberry: Statistical Methods for Organizational Research (2012)


2. Daniel Arkkelin: Using SPSS to Understand Research and Data Analysis (2014)
3. Andy Field: Discovering Statistics using SPSS (2009)
4. Paul Connolly: Quantitative Data Analysis in Education: Using SPSS (2007)
5. Alan Bryman: Quantitative Data Analysis with SPSS (2005)
SUGGESTED JOURNAL ARTICLES FROM GALE CENGAGE LEARNING DATABASES:
1. Correlations: how do we ever establish definite causation?, Morton E. Tavel, Skeptical Inquirer (September-October 2015): p54
2. Correlations and causation, William F. Vitulli, Skeptical Inquirer (January-February 2016): p63
3. Learn how to spatially evaluate the T-test, Joseph Berry, GEO World (Apr. 2013): p10
4. Calculating and Reporting Healthcare Statistics, 5th Edition (online access included), ProtoView (Oct. 2017)
5. Moving beyond correlation, Corn & Soybean Digest (Aug. 29, 2017)
6. Results Correlation with External Influences, LCGC Europe (Sept. 2017): p490
7. Relationship between economic freedom and stages of development, Sayed M. Elsayed Elkhouly and Mohamed Gamal Amer, Competition Forum 13.1 (Jan. 2015): p154
8. Impact of Employees' Voice on Employees' Effectiveness, Nawaz Ahmad, Adnan Rizvi and Syeda Nadia Bokhari, Journal of Business Strategies (Karachi)
9. A regression-based approach to library fund allocation, William H. Walters, Library Resources & Technical Services 51.4 (Oct. 2007)
10. Constructing analysis of variance (ANOVA), Mathew Mitchell, Journal of Computers in Mathematics and Science Teaching 21.4 (Winter 2002): p381
FULL TEXTS

1. Correlations: how do we ever establish definite causation?, Morton E. Tavel, Skeptical Inquirer (September-October 2015): p54

Q: I have read several times in previous issues and in the medical misinformation issue that
"correlation is not causation." As a student skeptic, I am perplexed about why so much health
research is done in the correlation realm, and it seems to me that much of the health advice given by
both alternative medicine and western medicine (science-based we would hope) derives from
correlation studies. If my sense of this is true, how do we as the public ever know when there is a
definite causation and that we should certainly follow the advice? Or are there actually times when
the correlation evidence is so compelling that it should be seriously considered? Headline-grabbing
correlation studies can be very seductive. Some guidance here would be helpful.

Thanks for all the excellent work.

Jim Jackson

W. Townshend, Vermont

A: The questions raised by "correlation is not causation" are of enormous importance. In the attempt
to determine underlying causes of any outcome, we often necessarily begin by studying an entire
population for disproportionately prevalent factors, such as an environmental exposure associated
with those individuals suffering from a given illness. Media reports constantly bombard us with the
purported dangers of exposure to noxious outside forces, from radio waves to cell phones to all sorts
of chemicals. Statisticians and epidemiologists can inform us there may be a significant correlation
between an environmental exposure and a given disease, but does that mean the environmental
event is the cause? Are there rules that can help us determine whether a given preceding event may
actually be the cause?

To demonstrate, let's explore the relationship between cigarette smoking and lung cancer: We begin
by observing that cigarette smokers in high numbers succumb to lung cancer. Applying simple
statistical methods, we note that this high incidence of cancer cannot be explained simply by chance
alone, thus establishing a "correlation." But what can we then conclude? Does the smoking itself
cause the cancer? Or are there other independent factors in smokers linked to cancer? Perhaps a
physical or genetic predilection to cancer also causes one to desire cigarettes. Or perhaps the actual
presence of cancer produces the desire to smoke. In these latter two instances, even though there is
correlation between smoking and cancer, it does not prove the smoking itself is the underlying cause
of the malignancy.

Ideally we could answer such questions of causation by designing a rigorous prospective scientific
study: For instance, we could take a few hundred non-smokers and divide them equally and
randomly into two groups. One group is instructed to begin smoking and the other to remain smoke
free. After perhaps a twenty-year period of observation, we would compare the rates of cancer in
each group to form definite conclusions. But this form of proof is, for obvious reasons, not only
impossible but ludicrous!

How Do We Proceed in the 'Real World'?


Environmental questions noted above are difficult because proof of cause is seldom available. In a
classic report, Hill (1965) presented guidelines for assessing likely causation. He pointed out the
following:

There are, of course, instances in which we can reasonably answer these questions [about cause and effect] from the general body of medical knowledge. A particular, and perhaps extreme, physical environment cannot fail to be harmful; a particular chemical is known to be toxic to man and therefore suspect on the factory floor. Sometimes, alternatively, we may be able to consider what might a particular environment do to man, and then see whether such consequences are indeed to be found. But more often than not we have no such guidance, no such means of proceeding; more often than not we are dependent upon our observation and enumeration of defined events for which we then seek antecedents. In other words we see that the event B is associated with the environmental feature A, that, to take a specific example, some form of respiratory illness is associated with a dust in the environment. In what circumstances can we pass from this observed association to a verdict of causation? Upon what basis should we proceed to do so?

Hill then presented a series of guidelines; the major ones are listed below in order of importance.
Again, we refer to the example of cigarettes and cancer:

1. Strength of association--the degree to which a certain disease is increased following a given exposure. Hill noted that prospective inquiries established the death rate from lung cancer in cigarette smokers was nine to ten times the rate in non-smokers, and the rate in heavy smokers was twenty to thirty times that of the base value. This would be considered a strong association as opposed to a twofold rise in incidence.

2. Consistency of association--whether the association has been repeatedly observed by different persons and in different places, circumstances, and times. The relationship between smoking and cancer is the same in hundreds of studies derived from a wide variety of situations and techniques. Repeated exposure to smoke in nonsmokers also increases their risk of cancer.

3. Specificity of association--whether a specific disease is related to a single type of exposure. Even though exposure to cigarette smoking is associated with other maladies, most notably cardiovascular disease, it is by far best correlated specifically with lung cancer. And although lung cancer occurs in individuals lacking such exposure, it is quite rare.

4. Temporal relationship of association--whether the unfavorable outcome follows the suspected noxious culprit. Which is the cart and which is the horse? A rather far-fetched example of this would be to ask whether smoking preceded the onset of cancer, or was smoking taken up as an inexplicable reaction to an already established cancerous condition.

5. Biologic gradient--whether there is a dose-response curve. As implied in example 1, the death rate from lung cancer rises linearly with the number of cigarettes smoked daily. This fact, in itself, is quite incriminating.

6. Biologic plausibility and coherence--the existence of other biologic evidence that supports a
causal explanation. We know that exposure to agents contained in cigarette smoke can cause
cancer in other organs, thus implicating the likelihood of the same effect on the lungs. Cigarette
smoke has been associated with cancers in skin, urinary tract, oral and nasal cavity, esophagus,
larynx, pancreas, stomach, cervix, and colon, and it is even related to certain types of leukemia.
Moreover, cancer can be induced in laboratory animals by exposure to cigarette smoke. However,
the argument for biologic plausibility is not always possible because it often depends on the
experimental information available in the same era. Thus we may seek and obtain biologic
confirmation in the laboratory after the fact.

7. Experimental confirmation--by intervening to eliminate the suspected offender, we can show that
the disease in question is prevented or eliminated. Repeated studies have shown that cessation of
smoking reduces the rate of cancer.

Hill goes on to recognize that statistical significance between an environmental factor and a given
disease does not provide proof of causation. He cites interference by confounding factors such as
selection bias or inadequate sample sizes. Finally, he considers what level of evidence might justify
preventive actions. He admits, as have many others, that this complex challenge requires the
consideration of such issues as the cost of interventions, the likely benefits if preventive measures
are successful, and the strength of evidence supporting a causal relationship. In other words,
evidence or belief that a causal relationship exists is not itself sufficient to suggest taking action.
Conversely, uncertainty about whether there is a causal relationship, or even an association, is not
sufficient to suggest action should not be taken. Each circumstance may dictate a different
response. Clearly, there are no easy answers.

We cannot escape the overwhelming evidence establishing cigarettes as a cause of lung cancer.
Similar evidence, not detailed here, links smoking almost as convincingly to cardiovascular disease.
Nevertheless, when bombarded by alleged relationships between other environmental exposures,
the reader is cautioned not to accept them without serious skepticism. One must usually insist at
least upon confirmation from multiple sources under differing circumstances.

These guidelines, as established by Hill, remain appropriate to this day and can provide us all with a
basis for understanding the allegedly "scientific" information that constantly surrounds us.

EDITOR'S Note: We thought the following question from a reader interesting and important enough
to provide a response. We asked Morton E. Tavel, MD, author of the lead article in our recent
special medical misinformation issue, to answer.

Morton E. Tavel, MD

Reference
Hill, Austin Bradford. 1965. The environment and disease: Association or causation? Proceedings of
the Royal Society of Medicine 58: 295-300.

Morton E. Tavel, MD, is Clinical Professor Emeritus, Indiana University School of Medicine, and
author of Snake Oil Is Alive and Well. He wrote "Bias in Reporting of Medical Research: How
Dangerous Is It?" in our medical misinformation issue (May/June 2015).


2. Correlations and causation, William F. Vitulli, Skeptical Inquirer (January-February 2016): p63

Kudos to Dr. Morton E. Tavel for his article "Correlations: How Do We Ever Establish Definite Causation?" in SI (September/October 2015, pp.
54-55). His reference to Hill's 1965 report by example reinforced grounds for crucial conditions in
cause-effect assumptions regarding the linkage between cigarette smoking and cancer. The query
from Jim Jackson regarding "correlation is not causation" is raised often in classrooms and scientific
critiques. Yet the quotation minimizes the probability that under bounded conditions, as explained in
Tavel's article (e.g., see Hill's seven guidelines, point 2 "Consistency of association," p. 55),
correlation may be one indicator of "causation." That is, if all known controls are implemented and if
a statistical level of significance (p < .001) is agreed upon, then independent and dependent
variables would be both correlated and assumed causally related.

In effect, then, the phrase, namely, "correlation is not causation" should be modified. A qualified
expression is as follows: correlation does not guarantee causation. At issue are necessary
(correlated) and sufficient (controlled) conditions. Essentially, "causation," even with appropriate
controls, is a tentative, statistically probable conclusion. Yet future research may reveal unexpected,
extraneous intervening variables that require additional controls and re-evaluation. As British-
Empiricist philosopher David Hume asserted, causality is couched in the probabilities of events, not
in certainty (paraphrased).

William F. Vitulli, PhD

3. Learn how to spatially evaluate the T-test, Joseph Berry, GEO World (Apr. 2013): p10

Last month's "Beyond Mapping" column provided everything you ever wanted (or maybe never
wanted) to know about the "map-ematical" framework for modern spatial statistics. Its historical roots
are in characterizing spatial patterns formed by the relative positioning of discrete spatial objects:
points, lines and polygons. However, spatial data mining has expanded the focus to the direct
application of advanced statistical techniques in the quantitative analysis of spatial relationships that
consider continuous geographic space.

[ILLUSTRATION OMITTED]

From this perspective, grid-based data are viewed as characterizing the spatial distribution of map
variables as well as the data's numerical distribution. For example, in precision agriculture, GPS and
yield monitors are used to record the position of a harvester and the current yield volume every
second as it moves through a field (Figure 1). These data are mapped into grid cells comprising the
analysis frame georegistered to the field to generate the 1997 Yield and 1998 Yield maps shown in
the figure (3,289 50-foot grid cells covering a central-pivot field in Colorado).

The deeper-green appearance of the 1998 map indicates greater crop yield than the 1997 harvest--
but how different is the yield between the two years? Where are the greatest differences? Are the
differences statistically significant?

Each grid-cell location identifies the paired yield volumes for the two years. The simplest comparison
would be to generate a difference map by simply subtracting them. But it doesn't go far enough to
determine if the differences are "significantly different" within a statistical context.

An often-used procedure for evaluating significant difference is the paired T-test that assesses
whether the means of two groups are statistically different. Traditionally, an agricultural scientist
would sample several locations in the field and apply the T-test to the sampled data. But the yield
maps, in essence, form continuous sets of georegistered sample plots covering the entire field. A T-
test could be evaluated for the entire set of 3,289 paired yield values (or a sampled subset).

However, the following discussion suggests a different strategy that enables the T-test concept to be
spatially evaluated to identify 1) a continuous map of localized T-statistic metrics and 2) a binary
map of the T-test results. Instead of a single scalar value determining whether to accept or reject the
null hypothesis for an entire field, the spatially extended statistical procedure identifies where it can
be accepted or rejected--valuable information for directing attention to specific areas.

The key to spatially evaluating the T-test involves an often-used procedure involving the statistical
summary of values within a specified distance of a focal location: a "roving window." The lower
portion of Figure 1 depicts a five-cell roving window (73 total cells) centered on column 33, row 53 in
the analysis frame. The paired yield values within the window are shown in the Excel spreadsheet
(columns A and B) on the right side of Figure 1.

Getting Mean

Figure 2 shows these same data and the procedures used to solve for the T-statistic within the
localized window. They involve the ratio of the "mean of the differences" to a normalized "standard
deviation of the differences." The following are the equation and solution steps:

[FIGURE 2 OMITTED]

T_statistic = d_avg / (d_stdev / sqrt(n))

Step 1. Calculate the difference (d_i = y_i - x_i) between the two values for each pair.

Step 2. Calculate the mean difference of the paired observations, d_avg.

Step 3. Calculate the standard deviation of the differences, d_stdev.

Step 4. Calculate the T-statistic by dividing the mean difference of the paired observations by the standard deviation of the differences divided by the square root of the number of paired values:

T_statistic = d_avg / (d_stdev / sqrt(n))
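As a minimal sketch, the four steps translate directly into Python with NumPy; the paired yield values below are hypothetical stand-ins, not the article's data:

import numpy as np

def paired_t_statistic(x, y):
    # Paired T-statistic: mean of the differences over its standard error.
    d = np.asarray(y, dtype=float) - np.asarray(x, dtype=float)  # Step 1
    d_avg = d.mean()                                             # Step 2
    d_stdev = d.std(ddof=1)                                      # Step 3 (sample SD)
    return d_avg / (d_stdev / np.sqrt(d.size))                   # Step 4

# Hypothetical paired yields (bu/acre) from one roving window
yield_1997 = [112.0, 120.5, 98.3, 130.1, 125.4]
yield_1998 = [118.2, 131.0, 101.7, 142.9, 128.8]
print(paired_t_statistic(yield_1997, yield_1998))

Applied once per window position, the same calculation produces the localized T-statistic map described next.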

However, Figure 3 shows what really happens in the grid-based map-analysis solution. Instead of a
roving Excel solution, steps 1-3 are derived as separate map layers using fundamental map-analysis
operations. The two yield maps are subtracted on a cell-by-cell basis, and the result is stored as a
new map of the difference (step 1).

[FIGURE 3 OMITTED]
Then a neighborhood-analysis operation is used to calculate and store a map of the "average of the
differences" within a roving five-cell window (step 2). The same operation is used to calculate and
store the map of localized "standard deviation of the differences" (step 3).

The bottom-left portion of Figure 3 puts it all together to derive the localized T-statistics (step 4). Map
variables of the mean and StDev of the differences (both comprised of 3,289 georegistered values)
are retrieved from storage, and the map-algebra equation in the lower left is solved 3,289 times--
once for each map location in the field. The resultant T-statistic map displayed in the bottom-right
portion shows the spatial distribution of the T-statistic, with darker tones indicating larger computed
values (see "Author's Note 1").
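A sketch of that map-wide evaluation, under the simplifying assumption of a square rather than circular roving window (the yield grids here are random stand-ins for the georegistered maps, and scipy.ndimage supplies the neighborhood operations):

import numpy as np
from scipy.ndimage import generic_filter, uniform_filter

rng = np.random.default_rng(0)
yield_1997 = rng.normal(110.0, 15.0, size=(60, 60))  # stand-in yield grids
yield_1998 = rng.normal(125.0, 15.0, size=(60, 60))

diff = yield_1998 - yield_1997                      # Step 1: difference map
window = 9                                          # square stand-in for the 73-cell circular window
d_mean = uniform_filter(diff, size=window)          # Step 2: local mean of differences
d_std = generic_filter(diff, np.std, size=window)   # Step 3: local SD of differences

n = window * window                                 # paired values per window
t_map = d_mean / (d_std / np.sqrt(n))               # Step 4: localized T-statistic map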

Broader Points

At first encounter, the idea of a T-test map may seem strange. It concurrently considers the spatial
distribution of data as well as their numerical distribution in generating a new perspective of
quantitative data analysis. Although the procedure itself has significant utility, it serves to illustrate a
much-broader conceptual point: the direct extension of the structure of traditional math/stat to map
analysis and modeling.
The use of fundamental map-analysis operations in a generalized map-ematical context
accommodates a variety of analyses in a common, flexible and intuitive manner. Also, it provides a
familiar mathematical context for conceptualizing, understanding and communicating the principles
of map analysis and modeling: the SpatialSTEM framework.

Joseph Berry is a principal in Berry & Associates, consultants in GIS technology. He can be reached
via e-mail at jkberry@du.edu.

Author's Note: 1) Darian Krieter with DTSgis developed an ArcGIS Python script calculating the
localized T-statistic (www.innovativegis.com/basis/MapAnalysis/Topic30/PythonT); 2) see
www.innovativegis.com/basis/Papers/Online_Papers.htm for a link to an early paper, "A
Mathematical Structure for Analyzing Maps"; for a more-detailed discussion, see the online book,
Beyond Mapping III, Topic 30, "A Math/Stat Framework for Map Analysis," at
www.innovativegis.com/basis/MapAnalysis.

4. Calculating and Reporting Healthcare Statistics, 5th Edition (online access included),
ProtoView (Oct. 2017)

Calculating and Reporting Healthcare Statistics, 5th Edition (online access included)

Loretta A. Horton

AHIMA

2017

422 pages

$99.95

RA407

Updating her textbook for two-year or four-year health information technician or health administration
programs, Horton describes how and why statistics are calculated and used. She provides exercises
in compiling information such as inpatient service days, length of stay, and occupancy and mortality
rates. Among her topics are mathematics review, percentage of occupancy, statistics computed within
the health information management department, descriptive statistics in healthcare, and inferential
statistics in healthcare. (© Ringgold, Inc., Portland, OR)

5. Moving beyond correlation, Corn & Soybean Digest (Aug. 29, 2017)

Byline: Dan Frieberg

Throughout Premier Crop's nearly 20-year history, we've perhaps been the most diligent at
communicating what we do - big data analysis. Within the scientific community, that would be
considered "observational data analysis."

Early on, my slide deck featured this chart - a photo on the left from my earlier years versus today -
to explain that observational data analysis can show us relationships and correlations. But, it stops
short of proving cause and effect.

Within crop production and agronomy scientific circles, making decisions using observational data
analysis has been viewed as inferior; some would argue it amounts to an informed guess. It has only been
within the last few years - with validation of big data analytics from the dramatic investments by
major ag companies - that the idea that millions of yield observations might be valuable in crop
production decision-making has gained acceptance.

For decades, the foundations of agronomic knowledge have been the results of small randomized
replicated plots. That experimental design and its statistical analysis date back to the 1930s and Sir
Ronald Fisher's analysis of variance (ANOVA); many consider Fisher the father of modern statistics.
All universities and industry companies have adopted and use replicated plots as the gold standard
for conducting trials and proving that a change in treatment actually causes a change in yield.

Premier Crop has now added randomization and replication to our Learning Block concept with an
offering we are calling Enhanced Learning Blocks. It allows us to move beyond correlation and
actually prove causation.

With over 500 successful trials in the 2016 crop year, we are now able to scientifically prove the
value of using variable rate technology in a grower's operation. And the results are dramatic. In
several examples, customers with 5 replications of 3 different planting rates were able to prove that
the ideal seeding rate varied by over 9,000 seeds per acre, with resulting yield differences greater than 40
bushels per acre. That's a swing of over $30/acre in seed cost and $140/acre in revenue.

The beauty of these experiments is that the technology you've already purchased in the cab does all
the work - there is no slowing down or adding any hassle to do a plot. You plant, apply and harvest
the same with the technology executing the experiment. We refer to Enhanced Learning Blocks as
"knowledge creation at the speed of farming"!

Highest rate: 36K = 260 bu/acre (Okoboji soil type)

Lowest rate: 27K = 216 bu/acre (Harps soil type)

6. Results Correlation with External Influences, LCGC Europe (Sept. 2017): p490

In the previous instalment, we presented a case of periodically fluctuating data that did not lend itself
well to trend analysis with conventional statistical methods. The data did appear to have a strong
regular fluctuation, but its relationship to other observations was not clear. This instalment addresses
methods for teasing out external influences on trending data.

Chromatographers, and many others, sometimes encounter questions of whether a series of data
over a period of time changes in ways that are significant for the observed process, or whether the
changes have been influenced by other factors external to the process. Some possible factors
include environmental temperature and pressure, power line voltage and frequency, gas purity, gas
pressure and flow control, and instrument health. The first defense against unwanted experimental
influences is proper setup, operation, and maintenance of, for our purposes, all of the equipment
associated with gas chromatography (GC) analysis. When something does go wrong--and it's a good
idea to assume problems will inevitably occur--it's time for logical troubleshooting. Sometimes close
examination of suspect data will reveal a lot about what's going on, and may help avoid unnecessary
diagnostics, parts replacement, and time lost.

A Time-Series Analysis

Time-series analysis refers to a number of mathematical and statistical methods of examining data
that have a regular, time-based aspect. A set of peak retention times, areas, and related variables
that were measured at regular intervals comprises a time series. The four-day data set that was
presented in the previous instalment of "GC Connections" (1) is also such a series. Figure 1
presents those data, along with additional measurements that extend the time span forwards and
backwards. The time span is extended from four days to a week, with measurements occurring
every 2 h.

A quick visual examination of the concentrations, following the blue dashed line, reveals daily
fluctuations that increase in magnitude through 1 April. The ambient temperatures in the upper plot
of Figure 1 clearly exhibit expected regular daily changes. The peaks and troughs in the
concentration levels also appear to correlate with the temperature data, but a closer look begins to
uncover some inconsistencies.

First, while the daily ambient temperature and concentration fluctuations seem well related, there is
a long-term upwards trend in the concentrations not reflected by the ambient temperature. And
second, the timing of the daily high and low fluctuations does not match well. For example, the
temperature lows early on 29, 30, 31 March, and 1 April lead the highs in concentration by 2-4 h.
The concentration lows lag behind the temperature maxima by a longer time period. Perhaps, then,
the relationships between the two variables are not so straightforward.

Some common statistical analysis techniques for time series can yield more-precise information
about the relationships or lack thereof between the temperature and measured concentrations. The
simplest one, a moving average, is represented in Figure 1 by the dashed blue line. The number of
points in the average span was chosen so that the line would follow the data points fairly closely
while removing much of the point-to-point noise. As already mentioned, this construction did not
provide a clear answer for the concentration-temperature relationship and instead we have to ask
additional questions about it.
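For reference, a centred moving average of this kind is a one-liner in, for example, pandas (the series and the five-point span below are arbitrary illustrations, not the values behind Figure 1):

import pandas as pd

conc = pd.Series([2.1, 2.3, 2.2, 2.6, 2.9, 2.7, 2.5, 2.8, 3.1, 3.0])  # hypothetical ppm readings
smoothed = conc.rolling(window=5, center=True).mean()  # centred 5-point moving average
print(smoothed)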

In the sections that follow, the math behind the graphs is not elucidated, for reasons of space as well
as for mercy on the typesetter. For the interested reader, please see the large volume of information
available on these topics in general in any number of statistical works both in print and online. The
graphs and some additional information shown here were generated using The Comprehensive R
Archive Network (CRAN R) (2).

Correlation

Correlation refers to a number of mathematical techniques that can provide insight about
relationships between two or more variables. Let's take a look at some mathematical and graphic
methods for this discussion.

One of the simplest ways to spot a strong relationship between two variables is to plot one as a
function of the other, by placing one variable on the x-axis and another on the y-axis. Figure 2 is
such a plot for the ambient temperature and the measured concentration. I see this technique
applied often: it is very easy to make such a graph using common spreadsheet programs. Here,
however, we have a bit of a problem. The data points are well scattered across the total range of
values. There is the appearance of an inverse relationship, as might be expected from the high-low
nature of the data as seen in Figure 1.

There are fewer low concentrations at the lower temperatures, but most of the values cluster around
the centre of the region. The standard correlation coefficient for two data sets, known as the Pearson
correlation coefficient, calculates to -0.24, which reflects the situation seen in Figure 2. The negative
correlation coefficient means that there is some inverse correlation, but its magnitude is small. The
correlation is not particularly strong.
An assumption behind these graphical and mathematical correlations is that the data are related in a
linear and directed manner. The problem here is that while the data tend to go consistently in
opposite directions, the values are not strongly related linearly. We cannot choose a particular value
for the temperature and predict the concentration with much success.
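As a sketch, the Pearson coefficient for two such series can be computed directly with SciPy; the eight paired readings below are hypothetical, not the 83 points of Figure 2:

import numpy as np
from scipy.stats import pearsonr

temperature = np.array([18.2, 21.5, 25.1, 23.4, 19.8, 17.6, 22.3, 24.9])  # degrees C
concentration = np.array([3.1, 2.8, 2.5, 2.6, 3.0, 3.2, 2.7, 2.4])        # ppm

r, p_value = pearsonr(temperature, concentration)
print(f"r = {r:.2f}, p = {p_value:.3f}")  # a negative r indicates an inverse relationship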

Autocorrelation

Autocorrelation relates the values that a variable takes at any location over a time period or other
regular interval to previous and future values of the same variable. It is a useful technique for
characterizing periodicity in data and so could be well suited to a better understanding of the current
test data.

Figure 3 illustrates two types of autocorrelation for the temperature data. The first, Figure 3(a), is the
standard autocorrelation, in which the magnitude of each bar represents how well data values are
predicted or are related to others with the indicated time lag or spacing away. For a lag of 0, the data
are perfectly represented by themselves and so the autocorrelation is 1.0. As the lag increases, the
autocorrelation decreases to near-zero at 6 h and then down to about -0.5 at 12 h. It continues this
way as the lag increases and ultimately presents a damped sine wave with a period of 24 h. This
then is the main periodicity of the ambient temperature cycle, which is not at all surprising.

Figure 3(b) shows a so-called partial autocorrelation of the same data. This plot represents the
residual autocorrelation coefficients after removal of the prime correlation at a time lag of zero. The
dashed lines in Figures 3(a) and 3(b) represent the 95% confidence limits for significance of the
correlations. Values that lie between the confidence limits are close to the estimated noise level of
the data and can be treated as insignificant. Figure 3(b) conveys additional periodicities in the data
of 2, 4, 6, and 26 h.

The concentration data, despite its appearance of periodicity, presents a different picture in its
autocorrelations. Figure 4 gives the autocorrelation and partial autocorrelation of the concentration
data. In this case, autocorrelation shows no real sine-wave format and instead decreases steadily
from zero to 12 h of time lag, remains constant out to 24 h, and then drops off at larger lags. This
appearance is characteristic of a longer-term upwards trend in the data, which is supported by its
appearance in Figure 1(b). The partial autocorrelation of the concentration data in Figure 4(b) does
show a strong periodicity at 2 h, and some additional activity at 6, 12, and 26 h. Interestingly, these
values correspond with the partial autocorrelation of the temperature data in Figure 3(b). Overall,
however, these correlations are small and cannot be given much significance.
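The plots discussed here were generated in R; an equivalent sketch in Python uses statsmodels (the synthetic temperature series below merely mimics a daily cycle sampled every 2 h, not the measured data):

import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(1)
hours = np.arange(0, 24 * 7, 2)  # one week of samples, every 2 h
temp = 20 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, hours.size)

lags = 19  # 19 lags x 2 h spacing = 38 h, as in Figures 3 and 4
print(acf(temp, nlags=lags))   # damped sine wave with a 24 h period
print(pacf(temp, nlags=lags))  # residual correlations after removing shorter lags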

Cross-Correlation

Finally, let's take a look at how well the temperature and concentration data correlate with each other
when considered as a paired time series. Each temperature measurement has a corresponding
concentration measurement at the same time. Figure 5 shows the result. Here the lag spans both
positive and negative (leading) intervals. The temperature is the first variable and concentration is
the second. Significant correlations are present where the temperature leads the concentration by
about 4 h--that is, where the lag = -4. There is also a small correlation significance at a lag of about 8
h. Or, if we imagine sliding the lag forward by +4 h, then the strongest correlation between the
temperature and concentration would sit at zero lag. The sinusoidal nature of the temperature makes
a clear pattern as well.
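A cross-correlation of the paired series can be sketched in plain NumPy (again with synthetic series; conc is constructed to lag temp by two samples, i.e., 4 h):

import numpy as np

def cross_correlation(x, y):
    # Normalized cross-correlation of two equal-length series at all lags.
    x = (x - x.mean()) / (x.std() * len(x))
    y = (y - y.mean()) / y.std()
    return np.correlate(x, y, mode="full")  # index len(x) - 1 is zero lag

rng = np.random.default_rng(2)
t = np.arange(84)                                     # 84 samples at 2 h spacing
temp = np.sin(2 * np.pi * t * 2 / 24)                 # daily temperature cycle
conc = np.roll(temp, 2) + rng.normal(0, 0.3, t.size)  # concentration lags by ~4 h

cc = cross_correlation(temp, conc)
lag = np.argmax(np.abs(cc)) - (len(t) - 1)            # negative lag: temperature leads
print(lag * 2, "h")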

If this meta-analysis found a direct correlation between external temperatures, or other factors, and
the instrument's performance, there would be some good indications of where to look first in a
troubleshooting exercise. In this case, it makes sense to look instead at causes of variability in the
process under measurement.

Conclusion

In this second look at process GC data where instrument results might depend on external
influences, it appears that this is not the case, or at least there is not a strong correlation between
the two either as linear variables or as time series. Some minor periodic relationships were
uncovered, but these are not necessarily directly causal via thermal influences on the
chromatographic equipment. A strong possibility is the influence of ambient conditions on the
monitored process itself. This supposition is supported by the time lag that was found between the
temperature and the measured concentrations.

References

(1) J. Hinshaw, LCGC Europe 30(7), 358-361 (2017).

(2) The Comprehensive R Archive Network (CRAN R Project), https://cran.r-project.org/, v. R-3.4.1, July 2017.

John V. Hinshaw, GC Connections Editor

"GC Connections" editor John V. Hinshaw is a Senior Scientist at Serveron Corporation in


Beaverton, Oregon, USA, and a member of LCGC Europe's editorial advisory board. Direct
correspondence about this column to the author via e-mail: LCGCedit@ubm.com

Caption: Figure 1: Gas concentrations and ambient temperatures measured every 2 h over a one
week interval. (a) Ambient temperatures in [degrees]C; (b) Ethane concentrations in parts-per-million
(ppm). The dashed blue line shows how the concentration data appears to trend. The red squares
and blue circles are the first and second sets of data that were discussed in the previous instalment.
The triangles are additional data points added to extend the time series and increase its statistical
significance.

Caption: Figure 2: Plot of the data in Figure 1 with the ambient temperature on the x-axis and the
measured concentration on the y-axis. Number of points = 83.

Caption: Figure 3: Autocorrelation of the ambient temperature time series. (a) the autocorrelation
factor for each time lag from 0 to 38 h; (b) the partial autocorrelation of the same data.

Caption: Figure 4: Autocorrelation of the concentration time series. (a) the autocorrelation factor for
each time lag from 0 to 38 h; (b) the partial autocorrelation of the same data.

Caption: Figure 5: Cross-correlation of the temperature and concentration data. The x-axis shows
the time that the temperature cycles lag behind the concentration cycles.

Please Note: Illustration(s) are not available due to copyright restrictions.
7. Relationship between economic freedom and stages of development, Sayed M. Elsayed
Elkhouly and Mohamed Gamal Amer, Competition Forum 13.1 (Jan. 2015): p154

This paper examined the relationship between the Heritage Index of Economic Freedom and stages
of development as sub-indexes of the Global Competitiveness Index (GCI). The Pearson coefficient of
correlation was calculated for a sample of 150 countries using the average of the five-year indexes
(2010-2014). The Economic Freedom Index ranked countries into five categories. The results revealed
that there was a correlation between the Economic Freedom Index and the GCI. For countries classified
by their Economic Freedom category, the results indicated that there was a correlation between
Economic Freedom and stages of development only in moderately free economies, while for the other
categories there was no correlation.

Keywords: Global competitiveness, Stage of development, Economic freedom

Full Text:

LITERATURE REVIEW

Competitiveness

Competitiveness has always been an interesting topic. The appearance of the competitiveness reports
of major international organizations, such as the World Economic Forum (WEF), has laid solid
ground for the measurement of competitiveness (Chikan, 2008). A slight adjustment was attempted
by Porter to incorporate the business competitiveness index into the national index developed by the
World Economic Forum (WEF, 2014). "Unless there is appropriate improvement at the microeconomic
level, macroeconomic, political, legal and social reforms will not bear full fruit" (Porter, 2004).

There have been studies that have tried to capture the phenomenon as widely as possible, but every
time there remains room for new influence factors. This just shows the dimensions of
competitiveness, its interconnectivity, extended and deep roots and its continuous dynamics. People
change, people evolve, markets are interconnected and so companies and states face new
challenges and have to find new sources of competitiveness (Alin & Mihaiu, 2011).

The dimensions of competitiveness, its pillars, and its continuous dynamics all create sources of
competitiveness. Michael Porter explains national competitiveness as the result of microeconomic
competitiveness: "competitiveness is rooted in a nation's microeconomic fundamentals, manifested
in the sophistication of its companies and the quality of its microeconomic business environment"
(Ogrean, 2010). One definition of national competitiveness is "a field of economic theory which
analyses the facts and policies that shape the ability of a nation to create and maintain an
environment that sustains more value creation for its enterprises and more prosperity for its people"
(Garelli, 2011). Garelli has elaborated with The Competitiveness Cube: "The Cube theory defines
four competitiveness forces: aggressiveness vs. attractiveness, assets vs. processes, globality vs.
proximity, and social responsibility vs. risk taking. The frontal face of the cube describes how
competitiveness is generated within one given year. The depth of the cube introduces the time
dimension and illustrates competitiveness accumulated over time, and thus the wealth of a nation"
(Garelli, 2011). The US Competitiveness Policy Council (1998) defines competition as the capability
of producing goods/services at an international quality that can compete at international markets,
resulting in a continuous increase in the welfare of a nation. Porter (1990) further emphasizes the
productive use of resources in a nation as a good measure for competitiveness. However,
measuring competitiveness at the national level has not been easy and straightforward. National
competitiveness is defined by the World Economic Forum as "the set of institutions, policies, and factors
that determine the level of productivity of a country" (WEF, 2014).

THE 12 PILLARS OF COMPETITIVENESS: The World Economic Forum (WEF) considers 12 major
pillars to quantify competitiveness. Many determinants drive productivity and competitiveness. While
all of these factors are likely to be important for competitiveness and growth, they are not mutually
exclusive. Two or more of them can be significant at the same time. This open-endedness is
captured within the GCI by including a weighted average of many different components, each
measuring a different aspect of competitiveness. These components are grouped into 12 pillars of
competitiveness (WEF, 2014).

First pillar: Institutions: The institutional environment is determined by the legal and administrative
framework within which individuals, firms and governments interact to generate wealth. The
importance of a sound and fair institutional environment has become all the more apparent during
the recent economic and financial crisis and is especially crucial for further solidifying the fragile
recovery, given the increasing role played by the state at the international level and for the
economies of many countries (WEF, 2014).

Second pillar: Infrastructure: Extensive and efficient infrastructure is critical for ensuring the effective
functioning of the economy, as it is an important factor in determining the location of economic
activity and the kinds of activities or sectors that can develop within a country. A well-developed
infrastructure reduces the effect of distance between regions, integrating the national market and
connecting it at low cost to markets in other countries and regions. In addition, the quality and
extensiveness of infrastructure networks significantly impact economic growth and reduce income
inequalities and poverty in a variety of ways (WEF, 2014).

Third pillar: Macroeconomic environment: The stability of the macroeconomic environment is
important for business and, therefore, is significant for the overall competitiveness of a country.
Although it is certainly true that macroeconomic stability alone cannot increase the productivity of a
recent years, notably in the European context. The government cannot provide services efficiently if
it has to make high-interest payments on its past debts. Running fiscal deficits limits the
government's future ability to react to business cycles (WEF, 2014).

Fourth pillar: Health and primary education: A healthy workforce is vital to a country's
competitiveness and productivity. Workers who are ill cannot function to their potential and will be
less productive. Poor health leads to significant costs to business, as sick workers are often absent
or operate at lower levels of efficiency. Investment in the provision of health services is thus critical
for clear economic, as well as moral considerations. In addition to health, this pillar takes into
account the quantity and quality of the basic education received by the population, which is
increasingly important in today's economy (WEF, 2014).

Fifth pillar: Higher education and training: Quality higher education and training is crucial for
economies that want to move up the value chain beyond simple production processes and products
(WEF, 2014).

Sixth pillar: Goods market efficiency: Countries with efficient goods markets are well positioned to
produce the right mix of products and services given their particular supply-and-demand conditions,
as well as to ensure that these goods can be most effectively traded in the economy (WEF, 2014).

Seventh pillar: Labor market efficiency: Efficiency and flexibility of the labor market are critical for
ensuring that workers are allocated to their most effective use in the economy and provided with
incentives to give their best effort in their jobs. Labor markets must therefore have the flexibility to
shift workers from one economic activity to another rapidly and at low cost and to allow for wage
fluctuations without much social disruption (WEF, 2014).

Eighth pillar: Financial market development: The financial and economic crisis has highlighted the
central role of a sound and well-functioning financial sector for economic activities. An efficient
financial sector allocates the resources saved by a nation's citizens, as well as those entering the
economy from abroad, to their most productive uses. It channels resources to those entrepreneurial
or investment projects with the highest expected rates of return rather than to the politically
connected. A thorough and proper assessment of risk is therefore a key ingredient of a sound
financial market (WEF, 2014).

Ninth pillar: Technological readiness: Technology is increasingly essential for firms to compete and
prosper. The technological readiness pillar measures the agility with which an economy adopts
existing technologies to enhance the productivity of its industries, with specific emphasis on its
capacity to fully leverage information and communication technologies (ICTs) in daily activities and
production processes for increased efficiency and enabling innovation for competitiveness. ICT
access and usage are key enablers of countries' overall technological readiness (WEF, 2014).

Tenth pillar: Market size: This affects productivity since large markets allow firms to exploit
economies of scale. Traditionally, the markets available to firms have been constrained by national
borders. In the era of globalization, international markets have become a substitute for domestic
markets, especially for small countries. Vast empirical evidence shows that trade openness is
positively associated with growth. Even if some recent research casts doubts on the robustness of
this relationship, there is a general sense that trade has a positive effect on growth, especially for
countries with small domestic markets (WEF, 2014).

Eleventh pillar: Business sophistication: There is no doubt that sophisticated business practices are
conducive to higher efficiency in the production of goods and services. Business sophistication
concerns two elements that are intricately linked: the quality of a country's overall business networks
and the quality of individual firms' operations and strategies. These factors are particularly important
for countries at an advanced stage of development when, to a large extent, the more basic sources
of productivity improvements have been exhausted. The quality of a country's business networks
and supporting industries, as measured by the quantity and quality of local suppliers and the extent
of their interaction, is important for a variety of reasons (WEF, 2014).

Twelfth pillar: Innovation: This can emerge from both new technological and non-technological
knowledge. Non-technological innovations are closely related to the know-how, skills, and working
conditions that are embedded in organizations and are therefore largely covered by the eleventh
pillar of the GCI. The final pillar of competitiveness focuses on technological innovation. Although
substantial gains can be obtained by improving institutions, building infrastructure, reducing
macroeconomic instability or improving human capital, all these factors eventually run into
diminishing returns. The same is true for the efficiency of the labour, financial, and goods markets
(WEF, 2014).

Stages of Development and The Weighted Index: While all of the pillars described above will matter
to a certain extent for all economies, it is clear that they will affect them in different ways. The best
way for Cambodia to improve its competitiveness is not the same as the best way for France to do
so. This is because Cambodia and France are in different stages of development. As countries move
along the development path, wages tend to increase and, in order to sustain this higher income,
labor productivity must improve. In line with well-known economic theory of stages of development,
the GCI assumes that, in the first stage, the economy is factor-driven and countries compete based
on their factor endowments--primarily unskilled labor and natural resources. Nineteen companies
compete on the basis of price and sell basic products or commodities, with their low productivity
reflected in low wages. Maintaining competitiveness at this stage of development hinges primarily on
well-functioning public and private institutions (pillar 1), a well-developed infrastructure (pillar 2), a
stable macroeconomic environment (pillar 3) and a healthy workforce that has received at least a
basic education (pillar 4). As a country becomes more competitive, productivity will increase and
wages will rise with advancing development. Countries will then move into the efficiency-driven
stage of development when they must begin to develop more efficient production processes and
increase product quality because wages have risen and they cannot increase prices. At this point,
competitiveness is increasingly driven by higher education and training (pillar 5), efficient goods
markets (pillar 6), well-functioning labor markets (pillar 7), developed financial markets (pillar 8),
the ability to harness the benefits of existing technologies (pillar 9) and a large domestic or foreign
market (pillar 10).

Finally, as countries move into the innovation-driven stage, wages will have risen by so much that
they are able to sustain those higher wages and the associated standard of living only if their
businesses are able to compete with new and unique products. At this stage, companies must
compete by producing new and different goods using the most sophisticated production processes
(pillar 11) and by innovating new ones (pillar 12) (WEF, 2014).

Implementation of Stages of Development: Two criteria are used to allocate countries into stages of
development. The first is the level of GDP per capita at market exchange rates. This widely available
measure is used as a proxy for wages because internationally comparable data on wages are not
available for all countries covered. A second criterion is used to adjust for countries that, based on
income, would have moved beyond stage 1 but where prosperity is based on the extraction of
resources (WEF, 2014).

Economic Freedom

Economic freedom is the condition in which individuals can act with maximum autonomy and
minimum obstruction in the pursuit of their economic livelihood and greater prosperity. Any
discussion of economic freedom has at its heart reflection on the critical relationship between
individuals and the government. As Friedrich Hayek once observed, "To be controlled in our
economic pursuits means to be controlled in everything." Hayek's keen insights on economic
freedom are based on the moral truth that each person is, as a matter of natural right, a free and
responsible being with inalienable dignity and fundamental liberties that righteous and effective
political systems should regard as unassailable.

Guiding Principles of Economic Freedom: In an economically free society, each person controls the
fruits of his or her own labor and initiative. Individuals are empowered, indeed, entitled to pursue
their dreams by means of their own free choice. In an economically free society, individuals succeed
or fail based on their individual effort and ability. The institutions of a free and open market society
do not discriminate either against or in favor of individuals based on their race, ethnic background,
gender, class, family connections or any other factor unrelated to individual merit. Government
decision-making is characterized by openness and transparency, which illuminates the shadows
where discrimination might flourish and promotes equal opportunity for all. In an economically free
society, the power of economic decision-making is widely dispersed, and the allocation of resources
for production and consumption is based on open competition so that every individual or firm gets a
fair chance to succeed. These three fundamental principles of economic freedom--empowerment of
the individual, non-discrimination, and open competition--underpin every measurement and policy
idea presented in the Index of Economic Freedom.

Economic Freedom: Autonomy, Not Anarchy: In general, state action or government control that
interferes with individual autonomy limits economic freedom. The Index of Economic Freedom is not,
however, a call for anarchy. The goal of economic freedom is not simply an absence of government
coercion or constraint, but the creation and maintenance of a mutual sense of liberty for all. As
individuals enjoy the blessings of economic freedom, they in turn have a responsibility to respect the
economic rights and freedoms of others within the rule of law. Governments are instituted to ensure
basic protections against the ravages of nature or the predations of one citizen over another.
Positive economic rights such as property and contracts are given societal as well as individual
defence against the destructive tendencies of others.

A comprehensive view of economic freedom encompasses all liberties and rights of production,
distribution or consumption of goods and services. The highest forms of economic freedom should
provide an absolute right of property ownership; full freedom of movement for labor, capital, and
goods; and an absolute absence of coercion or constraint of economic activity beyond that which is
necessary for the protection and maintenance of liberty itself. An economically free society
encourages handling of economic decisions in a decentralized fashion. Individuals are free to work,
produce, consume and invest in any way they choose under the even-handed application of laws,
with their economic freedoms at once both protected and respected by the state. However, some
government action is necessary for the citizens of a nation to defend themselves, promote the
peaceful evolution of civil society and enjoy the fruits of their labor. For example, citizens are taxed
to provide revenue for public safety, the protection of property and the common defence. There can
also be other goods, what economists call "public goods," that may be supplied more efficiently by
governments than through private means. Some public goods, such as the maintenance of a police
force to protect property rights, a monetary authority to maintain a sound currency and an impartial
judiciary to enforce contracts among parties are themselves vital ingredients of an economically free
society. When government action rises beyond the minimal necessary level, however, it leads
inevitably and quickly to the loss of freedom and the first freedom affected is often economic
freedom.

Throughout history, governments have imposed a wide array of constraints on economic activity.
Such constraints, though sometimes imposed in the name of equality or some other noble societal
purpose, are in reality imposed most often for the benefit of societal elites or special interests and
they come with a high cost to society as a whole. By substituting political judgments for those of the
marketplace, government diverts entrepreneurial resources and energy from productive activities to
rent-seeking, the quest for economically unearned benefits.

The result is lower productivity, economic stagnation and declining prosperity. Government provision
of goods and services beyond those that are clearly considered public goods imposes a separate
constraint on economic activity, as well, crowding out private-sector activity and usurping resources
that might otherwise have been available for private investment or consumption. Constraining
economic choice distorts and diminishes the production, distribution, and consumption of goods and
services (including, of course, labor services). The wealth of a nation inevitably declines as a result.

RESEARCH METHODOLOGY

Research Hypotheses

Research hypotheses were as follows:


[H.sub.1]: There is a relationship between Economic Freedom and the Global Competitiveness
Index (GCI).

[H.sub.2]: There is a relationship between Economic Freedom and Stages of Development.

Research Method

The indexes and indicators used in this paper were the Heritage Economic Freedom Index and the
Global Competitiveness Index (GCI). Data were collected from 2010-2014. The Index of Economic
Freedom takes a broad and comprehensive view of country performance, measuring 10 separate
areas of economic freedom. Each economic freedom is individually scored on a scale of 0 to 100. A
country's overall economic freedom score is a simple average of its scores on the 10 individual
freedoms (Heritage Index of Economic Freedom, 2010, 2011, 2012, 2013, 2014). Competitiveness
of countries on a macroeconomic level is measured by The World Economic Forum's Global
Competitiveness Index (WEF, 2010, WEF, 2011, WEF, 2012, WEF, 2013, WEF, 2014).
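As a toy sketch of that scoring rule in Python (the ten component scores are invented):

# Hypothetical component scores (0-100) for one country across the
# 10 economic freedoms; the overall score is their simple average.
scores = [81, 77, 90, 65, 72, 68, 74, 70, 85, 62]
overall = sum(scores) / len(scores)
print(overall)  # 74.4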

Research Sample

The relationship between the Heritage Economic Freedom Index and the Global Competitiveness
Index (GCI) was examined using data on 150 selected states. From among all 184 states covered by
the Index of Economic Freedom, it was necessary to reject those for which no GCI data were
available, as well as those countries for which fewer than 50% of the cells had data available for
both indexes. Five-year averages were used to minimize short-term business cycle fluctuations and
measurement error.
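
As a rough illustration of this sample-construction step (a sketch, not the authors' own code; the
file and column names are hypothetical), the filtering and five-year averaging could be done in
Python with pandas:

    import pandas as pd

    # One row per country-year, with columns: country, year, ef (Economic
    # Freedom score), and gci (GCI score); missing values stored as NaN.
    df = pd.read_csv("ef_gci_2010_2014.csv")  # hypothetical file

    # Reject countries with fewer than 50% of the five years available
    # for either index (i.e., fewer than 3 of 5 observations).
    counts = df.groupby("country")[["ef", "gci"]].count()
    keep = counts[(counts["ef"] >= 3) & (counts["gci"] >= 3)].index
    df = df[df["country"].isin(keep)]

    # Five-year averages smooth out short-term business-cycle fluctuations.
    avg = df.groupby("country")[["ef", "gci"]].mean()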

Data Analysis

A correlational research design was chosen. The purpose of correlational research designs is to
discover and express relationships among variables (Tabachnick & Fidell, 2007), which is the focus
of this research. Pearson correlation coefficients (r) were used to measure the correlations
between Economic Freedom and GCI. Correlation coefficients were interpreted using Davis's (1971)
descriptors: 0.70 or higher, very strong relationship; 0.50 to 0.69, substantial relationship; 0.30
to 0.49, moderate relationship; 0.10 to 0.29, low relationship; and 0.09 or lower, negligible
relationship.
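
A minimal sketch of this analysis in Python (illustrative rather than the authors' code), computing
Pearson's r with SciPy and attaching Davis's (1971) descriptors:

    from scipy.stats import pearsonr

    def davis_label(r):
        """Map |r| to Davis's (1971) descriptive labels."""
        r = abs(r)
        if r >= 0.70: return "very strong"
        if r >= 0.50: return "substantial"
        if r >= 0.30: return "moderate"
        if r >= 0.10: return "low"
        return "negligible"

    # 'avg' is the country-level table of 5-year averages from the
    # previous sketch.
    r, p = pearsonr(avg["ef"], avg["gci"])
    print(f"r = {r:.3f} ({davis_label(r)}), p (2-tailed) = {p:.4f}")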

FINDINGS

Descriptive of Economic Freedom Ranking

Looking at the five-year averages of the Heritage Economic Freedom Index data, as presented in
Table 1, the sample comprised 150 economies, with a mean of 61.604, a standard deviation of 10.196,
a minimum average of 26.80, and a maximum average of 89.74.

The Heritage Economic Freedom Index ranks countries into five categories: Free, Mostly Free,
Moderately Free, Mostly Un-free, and Repressed economies. The Free economies were six countries
with a mean of 83.88, standard deviation of 3.87, minimum average of 80.14, and maximum average of
89.74. The Mostly Free economies were 25 countries with a mean of 73.25, standard deviation of
2.63, minimum average of 69.97, and maximum average of 78.13. The Moderately Free economies were
52 countries with a mean of 64.94, standard deviation of 3.01, minimum average of 60.25, and
maximum average of 69.74. The Mostly Un-free economies were 51 countries with a mean of 55.32,
standard deviation of 2.93, minimum average of 50.14, and maximum average of 59.87. The Repressed
economies were sixteen countries with a mean of 44.23, standard deviation of 6.15, minimum average
of 26.80, and maximum average of 49.16. The Moderately Free and Mostly Un-free categories together
accounted for the majority of the sample, 103 of the 150 economies.
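
The category-level descriptives reported here (and in Table 1 below) have the shape of a standard
grouped summary; a hypothetical pandas equivalent, assuming 'avg' has been given a 'category'
column holding the five Heritage labels, would be:

    # Descriptive statistics of the 5-year average Economic Freedom score
    # within each Heritage category, mirroring Table 1's layout.
    summary = (avg.groupby("category")["ef"]
                  .agg(N="count", Mean="mean", SD="std",
                       Min="min", Max="max"))
    print(summary.round(2))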

Correlation Test for 5-Year Average Scores (2010-2014)

A Pearson correlation test was performed for each category of Economic Freedom. Davis's set of
descriptors was used: (0.70 or higher) very strong relationship; (0.50 to 0.69) substantial
relationship; (0.30 to 0.49) moderate relationship; (0.10 to 0.29) low relationship; and (0.09 or
lower) negligible relationship (Davis, 1971). The results of the correlations between Economic
Freedom and both GCI and its stages of development are presented in Tables 2 through 7. The GCI
assumes that, in the first stage of development, an economy is factor-driven; countries then move
into the efficiency-driven stage and, finally, into the innovation-driven stage (WEF, 2014).

The results presented in Table 2 show that, for the Free Economies category, there was no
significant correlation between the Heritage Economic Freedom Index and the Global Competitiveness
Index (GCI) or its three stages of development (factor driven, efficiency driven, and innovation
driven), as all Sig. (2-tailed) values were > 0.05.

The correlation results for the Mostly Free Economies in Table 3 reveal that there was likewise no
significant correlation between the Heritage Economic Freedom Index and the Global Competitiveness
Index (GCI) or its three stages of development, as all Sig. (2-tailed) values were > 0.05.

For the 52 Moderately Free Economies, as Table 4 shows, there was a significant, substantial
positive correlation between Economic Freedom and the factor-driven stage of development (r =
0.510, n = 52, p < 0.01), and significant, moderate positive correlations between Economic Freedom
and both the efficiency-driven and the innovation-driven stages (r = 0.445, n = 52, p < 0.01 and
r = 0.377, n = 52, p < 0.01, respectively). The correlation between Economic Freedom and GCI was
also significant, positive, and moderate (r = 0.481, n = 52, p < 0.01).

The correlation results for the 51 Mostly Un-free Economies in Table 5 reveal no significant
correlation between the Heritage Economic Freedom Index and the Global Competitiveness Index (GCI)
or its three stages of development, as all Sig. (2-tailed) values were > 0.05.

The correlation results for the sixteen Repressed Economies in Table 6 likewise reveal no
significant correlation between the Heritage Economic Freedom Index and the Global Competitiveness
Index (GCI) or its three stages of development, as all Sig. (2-tailed) values were > 0.05.
Further testing was performed using the average 5-year indexes from 2010 to 2014 for the full
sample of 150 countries, correlating the Heritage Economic Freedom Index with the Global
Competitiveness Index (GCI) and its sub-indexes. The results presented in Table 7 indicate
significant, very strong positive correlations between the Economic Freedom Index and GCI (r =
0.759, n = 150, p < 0.01), between the Economic Freedom Index and the factor-driven stage (r =
0.751, n = 150, p < 0.01), and between the Economic Freedom Index and the efficiency-driven stage
(r = 0.781, n = 150, p < 0.01). The correlation between Economic Freedom and the innovation-driven
stage was positive and substantial (r = 0.698, n = 150, p < 0.01).

CONCLUSIONS

This paper tested two hypotheses of relationships between Economic Freedom and stages of
development. The hypotheses were tested by conducting correlation tests on average 5-year
indexes of the Heritage Economic Freedom Index and Global Competitiveness Index (GCI).

The Heritage Economic Freedom Index ranked countries into five categories: Free, Mostly Free,
Moderately Free, Mostly Un-free, and Repressed economies. The World Economic Forum's Global
Competitiveness Report classified countries based on three stages of development (factor driven,
efficiency driven, and innovation driven) and two transitions between the three stages.

The results revealed no significant correlations between Economic Freedom and GCI or its stages of
development in four of the five categories of Economic Freedom. For the Moderately Free Economies
category, however, there were significant correlations between Economic Freedom and the
factor-driven stage (r = 0.510, n = 52, p < 0.01), between Economic Freedom and the
efficiency-driven stage (r = 0.445, n = 52, p < 0.01), and between Economic Freedom and the
innovation-driven stage (r = 0.377, n = 52, p < 0.01). There was also a significant correlation
between Economic Freedom and GCI (r = 0.481, n = 52, p < 0.01).

For the 5-year averages of the entire sample (150 countries), there was a significant, very strong
positive correlation between Economic Freedom and GCI (r = 0.759, n = 150, p < 0.01).

Sayed M. Elsayed Elkhouly, Ain Shams University, Egypt

Mohamed Gamal Amer, Asec Automation, Egypt

REFERENCES

Alin O., & Mihaiu, D. M. (2011). Analysis of European Union competitiveness. Romanian Economic
and Business Review, 5(4), 69.

Becker, L. B., Vlad, T., & Nusser, N. (2004). Media freedom: conceptualizing and operationalizing
the outcome of media democratization. Paper presented to the International Association for Media
and Communication Research, Porto Alegre, Brazil.

Becker, L. B., Vlad, T., & Nusser, N. (2007). An evaluation of press freedom indicators. International
Communication Gazette, 59(1), 5-28.
Chikan, A. (2008). National and firm competitiveness: a general research model. Competitiveness
Review, 18(1/2), 20-8.

Davis, J. A. (1971). Elementary survey analysis. Englewood Cliffs, NJ: Prentice Hall.

Index of Economic Freedom. The Heritage Foundation. Retrieved from
http://www.heritage.org/index/download

Index of Economic Freedom, 2010. The Heritage Foundation. Retrieved from
http://www.heritage.org/Index/pdf/2010/Index2010_Full.pdf

Index of Economic Freedom, 2011. The Heritage Foundation. Retrieved from
http://www.heritage.org/Index/pdf/2011/Index2011_Full.pdf

Index of Economic Freedom, 2012. The Heritage Foundation. Retrieved from
http://www.heritage.org/index/pdf/2012/book/index_2012.pdf

Index of Economic Freedom, 2013. The Heritage Foundation. Retrieved from
http://www.heritage.org/index/pdf/2013/book/index_2013.pdf

Index of Economic Freedom, 2014. The Heritage Foundation. Retrieved from
http://www.heritage.org/index/pdf/2014/book/index_2014.pdf

Aiginger, K., Barenthaler-Sieber, S., & Vogel, J. (2013). Research paper competitiveness under new
perspectives. Working Paper no 44.

Ogrean, C. (2010). National competitiveness between concept and reality. Some Insights for
Romania. Revista Economica, 1-2(49), 60.
Porter, M.E. (2004). The microeconomic foundations of prosperity: findings from the microeconomic
competitions index. World Economic Forum, Geneva.

Porter, M.E. (1990). Competitive advantage of nations. New York, NY: The Free Press.

Seeman, N. (2003). Measuring a free press. Fraser Forum. Retrieved December 7, 2006, from
http://www.fraserinstitute.ca/shared/readmore.asp?sNav=pb&id=583

Sussman, L., & Karlekar, K. (Eds.). (2002). The annual survey of press freedom 2002. New York:
Freedom House.

Stephane, G. (2011). IMD world competitiveness yearbook 2011, 495.

Tabachnick, B. G., & Fidell, L. S. (2007). Using multivariate statistics (5th ed.). Boston, MA: Allyn
and Bacon.

The US Competitiveness Policy Council. (1998). Building a competitive America. First Report to the
President and Congress.

WEF. (2014). The global competitiveness report 2013-2014. World Economic Forum, Geneva.

WEF. (2013). The global competitiveness report 2012-2013. World Economic Forum, Geneva.

WEF. (2012). The global competitiveness report 2011-2012, World Economic Forum, Geneva.

WEF. (2011). The global competitiveness report 2010-2011. World Economic Forum, Geneva.

WEF. (2010). The global competitiveness report 2009-2010. World Economic Forum, Geneva.

TABLE 1
Descriptive of Economic Freedom Ranking

Category                    N     Mean     Std. Dev.   Std. Error   95% CI Lower   95% CI Upper   Min     Max
Free Economies              6     83.8761  3.86974     1.57981      79.8150        87.9371        80.14   89.74
Mostly Free Economies       25    73.2457  2.63190     0.52638      72.1593        74.3321        69.97   78.13
Moderately Free Economies   52    64.9439  3.00575     0.41682      64.1071        65.7807        60.25   69.74
Mostly Un-free Economies    51    55.3222  2.93448     0.41091      54.4969        56.1475        50.14   59.87
Repressed Economies         16    44.2266  6.15473     1.53868      40.9470        47.5062        26.80   49.16
Total Sample                150   61.6036  10.19585    0.83249      59.9586        63.2486        26.80   89.74

TABLE 2
Free Economies: Economic Freedom Index vs. Stages of Development and GCI

                       Factor Driven   Efficiency Driven   Innovation Driven   GCI
Pearson Correlation    0.667           0.590               -0.121              0.269
Sig. (2-tailed)        0.148           0.218               0.820               0.606
N                      6               6                   6                   6

TABLE 3
Mostly Free Economies: Economic Freedom Index vs. Stages of Development and GCI

                       Factor Driven   Efficiency Driven   Innovation Driven   GCI
Pearson Correlation    0.077           0.190               0.058               0.126
Sig. (2-tailed)        0.714           0.363               0.783               0.549
N                      25              25                  25                  25

TABLE 4
Moderately Free Economies: Economic Freedom Index vs. Stages of Development and GCI

                       Factor Driven   Efficiency Driven   Innovation Driven   GCI
Pearson Correlation    .510 **         .445 **             .377 **             .481 **
Sig. (2-tailed)        0.000           0.001               0.006               0.000
N                      52              52                  52                  52

**. Correlation is significant at the 0.01 level (2-tailed).


TABLE 5
Mostly Un-free Economies: Economic Freedom Index vs. Stages of Development and GCI

                       Factor Driven   Efficiency Driven   Innovation Driven   GCI
Pearson Correlation    0.057           0.214               0.253               0.137
Sig. (2-tailed)        0.689           0.132               0.073               0.339
N                      51              51                  51                  51

TABLE 6
Repressed Economies: Economic Freedom Index vs. Stages of Development and GCI

                       Factor Driven   Efficiency Driven   Innovation Driven   GCI
Pearson Correlation    0.019           0.071               0.180               0.048
Sig. (2-tailed)        0.945           0.794               0.506               0.859
N                      16              16                  16                  16

TABLE 7
Total Sample: Economic Freedom Index vs. Stages of Development and GCI

                       Factor Driven   Efficiency Driven   Innovation Driven   GCI
Pearson Correlation    .751 **         .781 **             .698 **             .759 **
Sig. (2-tailed)        0.000           0.000               0.000               0.000
N                      150             150                 150                 150

**. Correlation is significant at the 0.01 level (2-tailed).


8. IMPACT OF EMPLOYEES' VOICE ON EMPLOYEES' EFFECTIVENESS, Nawaz Ahmad, Adnan Rizvi
and Syeda Nadia Bokhari, Journal of Business Strategies (Karachi). 11.1 (June 30, 2017): p79.

Abstract

The focus of this research study is to identify the impact of the employee voice mechanism on
employee effectiveness in the corporate sector of Karachi. The three effectiveness factors under
study are employee commitment, employee engagement, and employee motivation. Although the impact
of employee voice has been researched by numerous scholars, it has not been explored in the
context of the Pakistani market. A sample of 172 employees belonging to a diverse set of corporate
sectors was studied for this research, with primary data gathered using a questionnaire.
Regression analysis revealed that employee voice had a positive impact on all three effectiveness
factors, indicating that it plays an important role in influencing employee commitment,
engagement, and motivation.

Keywords: Employee voice, commitment, engagement, motivation

Introduction

The concept of employee voice was coined by Albert Hirschman in the 1970s, when he defined it as
"taking actions for altering the dissatisfying work situations/status quo" (Hirschman, 1970). He
extended the concept by introducing the term 'exit', which refers to employees leaving the
organization under dissatisfying conditions. The concept contends that employees in dissatisfying
situations either raise their voice and concerns to management or, if unheard, simply disconnect
themselves from the organization.

Traditionally, employee voice was channeled through collective bargaining agents and union
representation. With the progression of time and developments in the field of human resource
management, alternative forms of representation, such as non-union and direct forms of voice,
gained momentum (Bryson, 2004). While trade unions are formal systems of indirect representation
of union members, direct voice has been considered a superior form of employees' expressing their
ideas, opinions, and concerns (Rees, Alfes, and Gatenby, 2013). It helps remove barriers between
employers and employees and allows supervisors to better cater to the diverse needs of individual
employees.

The term 'employee voice' has often been associated with grievances on an individual or collective
basis, which restricts the conceptualization of the term (Gollan, 2001) and limits employees from
exercising their rights and participating in organizational decision making. According to Pyman,
Cooper, Teicher, and Holland (2006), employee voice can include diverse avenues such as
"vocalization of individual dissatisfaction, collective representation, and organization, upward
problem solving, engagement and contribution to management decision making and demonstration of
mutuality and cooperative relations". Morrison et al. (2011) referred to employee voice as the
articulation of ideas, opinions, and suggestions with the intention of improving departmental or
organizational functioning.

Folger (1977) adds that employee voice is an organization-wide mechanism whereby a set of
procedures or rules permits employees who feel affected by a management decision to communicate
information concerning that decision.

In any given organization, the effort of its human resources plays a vital role in the success of
the organization. Organizations need a sound and credible internal communication system in place
that addresses employees' concerns, recommendations, opinions, and advice. Two-way direct
communication is essential for employers as well as employees (Freeman and Medoff, 1984). The
absence of a voice system in an organization may lead to low motivation and lax attitudes among
employees, thus affecting individual and overall organizational efficiency and effectiveness. The
efficacy of employee voice hinges upon the sharing of information within the organization: a
well-informed employee with an opportunity to raise his or her voice is in a better position to
enhance his or her engagement with the organization. Only by giving employees opportunities to
raise their voices can they become more engaged.

If employees recognize that organizational policy is friendly to employee voice and builds trust,
they can form an effective partnership. The effects of adopting an employee voice approach and
introducing modes of communication for employees are significant, and they serve as evidence to
employees that their voice is valued and understood across the organizational hierarchy.

As voice gives employees an opportunity to express their opinions and cements the belief that
their efforts and contributions are valued, it creates a certain degree of regard for the
leadership of an organization, thus creating a direct link between voice and employee trust in
management. Organizations that honor their obligations towards their workforce enhance their
employees' sense of loyalty and commitment. With higher trust levels, employees are more motivated
and more inclined to fulfill their commitments and obligations, thus enhancing the level of job
engagement (Rousseau, Sitkin, Burt, and Camerer, 1998).

The concept of employee voice was traditionally linked with unions and collective bargaining. More
recently, it broadly refers to how employees participate by raising their concerns about how things
run within their organization. This may be achieved through indirect means such as trade unions and
collective bargaining or through a more direct channel.

Employee voice has generally been associated with 'grievances'. Hirschman (1970) referred to it as
the ability to express dissent or discontent, dissent being the intrinsic human capacity to
complain or protest. This approach limits the conceptualization of the term: voice can also be
used to make constructive suggestions for change and to recommend reforms in the standard
procedures adopted by the organization. Organizations that value employee voice have witnessed
increased levels of organizational commitment, employee engagement, and employee motivation,
because employees believe that the organization values their input and that they are able to
influence and contribute to organizational decision making.

This research study aims at understanding the impact of the employee voice mechanism on employee
commitment, engagement, and motivation in a diverse set of corporate sectors in Karachi, where
employees were able to communicate their concerns regarding management decisions such as
organizational restructuring, privatization, and employee layoffs.

Research Hypotheses

H1: Employee voice does not affect employee commitment towards the organization.

H2: Employee voice does not affect employee engagement.

H3: Employee voice does not affect work motivation.

Literature Review

The literature review is organized under the following subheadings.

Employees' Voice

Employee voice has been an infrequently researched yet important issue in human resource
management in general and in the employee-employer relationship in particular. Generally, employee
voice refers to the opportunity for employees to voice their concerns or opinions, to discuss
issues, or to give suggestions about changes to their employers (including salary, environment,
and other organizational factors). In the past, employees raised their voices through conventional
means such as collective bargaining agents or employee unions.

Work on employee voice was initiated by Albert Hirschman in his 1970 book "Exit, Voice, and
Loyalty." In it, he introduced the term "voice of employees" and, in the subchapter on responses
to decline in firms, organizations, and states, argued that a concern exists when there are
"repairable lapses" in an organization, issues that could be solved with an appropriate level of
information and response. Hirschman (1970) stated that, over time, organizations tend to decline
gradually and lose their efficiency and effectiveness. He proposed two countering forces that can
reverse such situations: "desertion or articulation," that is, "exit or voice."

Previous research on employee voice has shed significant light on the factors of employee voice,
its mechanisms, and how it can prove effective for an organization. These studies propose a range
of contextual and personal factors that explain employees' participation in voice.

"Voice of employee" is directly concerned with the concept of 'participative management'. Stueart
and Moran (2007) in their study concluded that participative management empowers employees
through team building and hence directly participating in such activities gives them the opportunity to
directly be part of the decision-making process. They further established a positive link between
employee empowerment and better innovation, flexibility, customer service and creativity.

Detert and Burris (2007) investigated the relationship between employee voice and leadership
styles in restaurant chains. Their research indicated that managerial openness is positively
associated with employee voice, employee satisfaction, and job demography. However, the
relationship was mediated by subordinates' perceptions of psychological safety, underscoring the
significance of supervisors in subordinates' evaluation of the risks of raising their voice.
Moreover, supervisor and leadership behavior had the greatest impact on the voice behavior of
star-performing employees.

Morrison and Milliken (2000) stated that managers and supervisors high in the hierarchy have a
critically significant role in encouraging employees to raise their voice. Whenever supervisors
hold negative implicit attitudes about employee voice (for example, that subordinates raise their
voice out of self-interest, that the manager knows best, or that employee voice is dissent), these
attitudes lead to the creation of systems and mechanisms, such as centralized decision making,
that promote an organizational culture of silence. They also concluded that managers fear negative
responses from their subordinates in the form of employee voice; feeling vulnerable, they try to
avoid such embarrassment and hence suppress their subordinates' voice.

When managers cultivate high-quality relationships with their subordinates (Botero and Van Dyne,
2009; Gao, Janssen, and Shi, 2011; Van Dyne, Kamdar, and Joireman, 2008), and when they extend
positive leadership conduct toward employee voice, such as guidance and openness, transformational
leadership, and ethical leadership (Detert and Burris, 2007), subordinate employees are likely to
enhance their voice behaviors. This is because managers tend to make their subordinates feel
obligated to reciprocate the positive treatment they receive (Podsakoff et al., 2000; Van Dyne et
al., 2008). Managers also allow their juniors to develop perceptions of psychological well-being
(Walumbwa and Schaubroeck, 2009), a sense of influence (Tangirala and Ramanujam, 2012), and a
relationship with the manager and the organization as a whole (Liu et al., 2010), all of which add
to positive employee voice behavior.

Organizational Commitment

The existence and implementation of voice mechanisms in organizations lead to lower employee
turnover (Spencer, 1986). Spencer (1986) conducted a number of studies on this topic. In his
initial study, he concluded that the more opportunities employees have to voice dissatisfaction,
the higher the probability that they will be retained by the organization. In a second study, he
postulated that a high level of voice mechanisms in the organization leads to higher expectations
among employees that their issues and concerns will be addressed.

Zehir and Erdogan (2011) found that a relationship exists between employee voice and leadership
style. Their findings were that employee performance and organizational commitment are related to
leadership behavior; they concluded that employee voice, or silence, directly impacts employees'
performance and their commitment to the organization.

Batt, Colvin, and Keefe (2002) observe that employee voice mechanisms and organizational practices
predict firm-level quit rates. They tested the predictive power of these factors using data from a
1998 survey of the telecommunications industry. Regarding conventional voice mechanisms, they
found that the presence of an employee union predicts lower quit rates after controlling for
factors such as compensation and other human resource practices that may be influenced by
collective bargaining.

Freeman (1980) found that employees who are committed to unions within the organization are less
likely to leave, because the unions give them a formal mechanism for raising their voice. Thus, in
organizations with formal employee voice mechanisms and grievance procedures, employees remain
engaged, which increases their level of organizational commitment. Similar studies by Boroff and
Lewin (1997) contend that an employee's loyalty level determines the choice between voice and
exit. Freeman and Medoff (1984) took a pluralistic approach to evaluating employee voice in
organizations and concluded that collective bargaining agents (CBAs) provide more economically
influential solutions for employees considering exiting or raising their voice, because of
combined decision making.

Employee Engagement

A wide literature treats employee engagement as a core factor that impacts work-related behavior
and attitudes (Christian, Garza, and Slaughter, 2011). The definition of engagement noted by Macey
and Schneider (2008) is based on the assumption that 'pro-social' employee activity can lead to
favorable effects, with benefits for both employees and organizations.

Rees, Alfes, and Gatenby (2013) studied the relationship between employee voice and engagement.
They found that perceptions of employee voice behavior aimed at organizational improvement have
both direct and indirect effects on employee engagement levels. Their analysis revealed that the
relationship between perceptions of employee voice and engagement is mediated by employee trust in
senior management and by the employee-line manager relationship.

Extending opportunities for employees to raise their voice can lead to more affirmative attitudes
towards management (Dietz, Wilkinson, and Redman, 2009). Research on employee voice suggests that
employees who believe they have the opportunity to raise their voice and effectively convey their
suggestions, concerns, and opinions to management are more likely to display an optimistic
attitude and high performance levels (Purcell, Kinnie, Hutchinson, Rayton, and Swart, 2003;
Robinson, Perryman, and Hayday, 2004).

Studies also show that when employees believe they can influence decisions by raising their voice,
this leads to increased levels of organizational engagement (Farndale, Van Ruiten, Kelliher, and
Hope-Hailey, 2011).

Truss et al. (2006) contend that the main driver of employee engagement is having the opportunity
to feed opinions upwards.

Employees' Motivation

According to Maslow's hierarchy of needs, employees have five levels of needs: physiological,
safety, social, ego, and self-actualization (Maslow, 1943). Maslow stated that lower-level needs
must be fulfilled before the next level can motivate employees. Herzberg divided motivation into
two sets of factors: intrinsic and extrinsic (Herzberg et al., 1959). Intrinsic factors, such as
recognition, drive job satisfaction, whereas extrinsic factors, such as salary structure and job
security, drive job dissatisfaction.

Vroom's theory assumes that employees strive for better performance. Performance, in turn, leads
to rewards (Vroom, 1964), which can be positive or negative. Positive rewards lead to motivated
employees, whereas negative rewards make employee motivation less likely.

Adams' theory postulates that employees seek equity between themselves and other workers in the
group. Equity is attained when an employee's ratio of outcomes to inputs equals the
outcomes-to-inputs ratios of other employees (Adams, 1965).
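
In the notation often used to state equity theory (a restatement of the idea above, not drawn from
the article), equity holds when

    O_self / I_self = O_other / I_other

where O denotes perceived outcomes (pay, recognition) and I denotes perceived inputs (effort,
skill, time); an imbalance in either direction is experienced as inequity and motivates adjustment.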

Skinner's theory contends that employee behaviors that lead to affirmative results will be
repeated, while behaviors that lead to undesirable outcomes will not (Skinner, 1953). Thus,
supervisors should positively reinforce employee behavior that results in positive outcomes and
discourage employee behavior that leads to negative outcomes.

Conceptual Framework

There is one predictor, employee voice, and three criterion variables, organizational commitment,
employee engagement, and work motivation, which together are the components of employee
effectiveness.

Research Methodology

Research Instrument

A questionnaire was adapted from different studies based on the variables under consideration.
This study focuses on four variables: employee voice is the independent variable, and
organizational engagement, organizational commitment, and employee motivation are the dependent
variables. The constructs and their related measures are discussed as follows:

Employee Voice. The scale adopted for employee voice consists of 5 items. It is the pioneering
work of Van Dyne and LePine (1998), who themselves modified it from Van Dyne, Graham, and
Dienesch's (1994) scale of advocacy participation. A 5-point Likert rating is associated with the
items so that responses can be gauged with ease.

Employee Engagement. The scale for employee engagement is adopted from Rees et al.'s (2013)
research; its five items address three types of employee engagement. Intellectual engagement
gauges the extent to which individuals are cognitively involved in their work activities; the
emotional items relate to affective engagement; and communication with colleagues about work
improvements is part of social engagement. The items are averaged to generate an overall employee
engagement score. As with employee voice, responses are taken on a 5-point Likert scale.

Organizational Commitment. The 5 items of the organizational commitment scale are adopted from the
organizational commitment questionnaire proposed by Mowday et al. (1979). The questions cover two
types of organizational commitment, continuance commitment and affective commitment. Responses are
gauged on a 5-point Likert agreement scale.

Work Motivation. The 5-item scale for work motivation is taken from Akinboye's (2001) work
motivation questionnaire inventory. The homogeneity of the questionnaire is maintained by again
gauging responses on a 5-point Likert scale.
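
As an illustration of the scoring step (a sketch under stated assumptions, not the authors'
procedure; the file and column names are hypothetical), the five Likert items per construct can be
averaged into construct scores in Python:

    import pandas as pd

    responses = pd.read_csv("survey_responses.csv")  # hypothetical file

    # Columns are assumed to be named voice_1 ... voice_5, commitment_1
    # ... commitment_5, and so on, each holding a 1-5 Likert rating.
    for construct in ["voice", "commitment", "engagement", "motivation"]:
        items = [f"{construct}_{i}" for i in range(1, 6)]
        # Average the five items into an overall construct score, as
        # described above for employee engagement.
        responses[construct] = responses[items].mean(axis=1)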

Sample, Participants and Data Collection

The study was based on an initial sample of 200 employees from different corporate sectors of
Karachi, selected using a convenience sampling technique; 172 responses were obtained, for a
response rate of 86%.

Data Analysis

Descriptive statistics were used to analyze the respondents' demographics, whereas inferential
statistics were used to test the hypotheses. The impact of the independent variable on the three
dependent variables was gauged using regression analysis.

Results

Table 1. Construct Reliability

Construct                     Cronbach alpha    No. of items
Employees' Voice              0.933             5
Organizational Commitment     0.942             5
Employee Engagement           0.940             5
Work Motivation               0.968             5

The reliability statistic, Cronbach's alpha, for each of the constructs was above the commonly
used threshold of 0.6-0.7 and hence within the acceptable range, indicating instrument consistency
and reliability. The overall reliability statistic for the instrument is 0.947.
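
For reference, a minimal Cronbach's alpha computation (the standard formula, not the authors'
SPSS output), applied to one construct's item matrix from the hypothetical scored data above:

    import numpy as np

    def cronbach_alpha(items):
        """items: n_respondents x k_items array of Likert scores."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()  # sum of item variances
        total_var = items.sum(axis=1).var(ddof=1)    # variance of total score
        return (k / (k - 1)) * (1 - item_vars / total_var)

    # e.g., cronbach_alpha(responses[[f"voice_{i}" for i in range(1, 6)]])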

Table 2. Participants' Profile

Characteristic            Number    Percentage
Male                      138       80
Female                    34        20
19 to 34 years            154       90
35 years and beyond       18        10
Bachelor's or below       121       70
Master's or above         51        30

To assess the normality of the data, descriptive statistics were generated, as presented in the
table below.

Table 3. Descriptive Statistics

Construct                     Mean     Std. Dev.   Skewness   Kurtosis
Employee Voice                3.538    0.8301      -1.027     1.050
Organizational Commitment     3.533    0.8334      -0.813     0.679
Employee Engagement           3.584    0.8164      -0.893     0.625
Work Motivation               3.581    0.8419      -0.8419    0.685

In Table 3, Employee Voice (Mean = 3.538, SD = 0.8301) shows the largest absolute skewness
(-1.027), while Organizational Commitment (Mean = 3.533, SD = 0.8334) shows the smallest (-0.813).
Kurtosis is positive for all constructs, greatest for Employee Voice (1.050) and lowest for
Employee Engagement (0.625). All values fall within the range of +/- 3.5, so the approximate
normality of the data is established.
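
The same screening can be sketched in Python with SciPy (illustrative only; the construct columns
follow the earlier hypothetical scoring step):

    from scipy.stats import skew, kurtosis

    for construct in ["voice", "commitment", "engagement", "motivation"]:
        s = skew(responses[construct])
        k = kurtosis(responses[construct])  # Fisher (excess) kurtosis
        print(f"{construct}: skewness = {s:.3f}, kurtosis = {k:.3f}, "
              f"within +/-3.5: {abs(s) <= 3.5 and abs(k) <= 3.5}")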

Table 4. Correlation Statistics

                              Employee   Organizational   Employee     Work
                              Voice      Commitment       Engagement   Motivation
Employee Voice                1
Organizational Commitment     .749       1
Employee Engagement           .731       .897             1
Work Motivation               .715       .873             .935         1

The table shows that all of these correlations were significant at the 99% confidence level
(2-tailed). The correlation between employee voice and organizational commitment was the
strongest, r(172) = 0.749, p < 0.01, and no weak correlations were found. Moreover, the
correlations indicate that the factors are unique and distinct (Hair Jr. et al., 2010).

Regression Model

This section summarizes the regression results for all the factors in the research study. The
hypothesis that employee voice has an impact on employee effectiveness was tested via regression
analysis.

Employee Voice and Organizational Commitment

Model Summary

Model    R       R square    Adjusted R square    Std. error of the estimate
1        .731    .535        .532                 .57008

The basic correlation between employee voice and organizational commitment is .731, indicating a
high correlation. Employee voice accounts for 53.5% of the variance in organizational commitment.

ANOVA

Model         Sum of squares    df     Mean square    F          Sig.
Regression    63.156            1      63.156         194.331    .000
Residual      54.924            169    .325
Total         118.080           170

The model fit is adequate, since the significance value in the ANOVA table is below 0.05 and the F
statistic is greater than 4; hence the variables and data are fit for analysis.

Coefficients

                    Unstandardized coefficients    Standardized
Model               B         Std. error           Beta            t         Sig.
(Constant)          .936      .191                                 4.888     .000
employee_voice      .734      .053                 .731            13.940    .000

The unstandardized coefficient shows that for a one-unit change in employee voice, organizational
commitment increases by .734 units, which is significant (sig. < 0.05, t > 2).

In sum, the regression analysis shows that the predictor, employee voice, explains 53% of the
variance in organizational commitment (R2 = 0.535, F = 194.33, p < .01), and that employee voice
(B = 0.734, p < .01) significantly predicts organizational commitment.
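
As a rough Python counterpart to the SPSS output above (a sketch using statsmodels on the
hypothetical scored data, not the authors' code):

    import statsmodels.api as sm

    X = sm.add_constant(responses["voice"])           # predictor plus intercept
    model = sm.OLS(responses["commitment"], X).fit()
    # summary() reports R-squared, the ANOVA F test, and the coefficient
    # t-tests corresponding to the three tables above.
    print(model.summary())

The same call with responses["engagement"] or responses["motivation"] as the dependent variable
would reproduce the two regressions reported below.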

Employee Voice and Employee Engagement

Model Summary

Model    R       R square    Adjusted R square    Std. error of the estimate
1        .749    .561        .559                 .54244

The basic correlation between employee voice and employee engagement is .749, indicating a high
correlation. Employee voice accounts for 56.1% of the variance in employee engagement.

ANOVA

Model         Sum of squares    df     Mean square    F          Sig.
Regression    63.587            1      63.587         216.106    .000
Residual      49.727            169    .294
Total         113.314           170

The model fit is adequate, since the significance value in the ANOVA table is below 0.05 and the F
statistic is greater than 4; hence the variables and data are fit for analysis.

Coefficients

                    Unstandardized coefficients    Standardized
Model               B         Std. error           Beta            t         Sig.
(Constant)          .977      .182                                 5.365     .000
employee_voice      .737      .050                 .749            14.701    .000

The unstandardized coefficient shows that for a one-unit change in employee voice, employee
engagement increases by .737 units, which is significant (sig. < 0.05, t > 2).

In sum, the regression analysis shows that the predictor, employee voice, explains 56% of the
variance in employee engagement (R2 = 0.561, F = 216.11, p < .01), and that employee voice (B =
0.737, p < .01) significantly predicts employee engagement.

Employee Voice and Work Motivation

Model Summary

Model    R       R square    Adjusted R square    Std. error of the estimate
1        .714    .510        .507                 .59159

The basic correlation between employee voice and work motivation is .714, indicating a high
correlation. Employee voice accounts for 51.0% of the variance in work motivation.

ANOVA

Model         Sum of squares    df     Mean square    F          Sig.
Regression    61.545            1      61.545         175.851    .000
Residual      59.147            169    .350
Total         120.692           170

The model fit is adequate, since the significance value in the ANOVA table is below 0.05 and the F
statistic is greater than 4; hence the variables and data are fit for analysis.

Coefficients

                    Unstandardized coefficients    Standardized
Model               B         Std. error           Beta            t         Sig.
(Constant)          1.016     .199                                 5.114     .000
employee_voice      .725      .055                 .714            13.261    .000

The unstandardized coefficient shows that for a one-unit change in employee voice, work motivation
increases by .725 units, which is significant (sig. < 0.05, t > 2).

In sum, the regression analysis shows that the predictor, employee voice, explains 51% of the
variance in work motivation (R2 = 0.510, F = 175.85, p < .01), and that employee voice (B = 0.725,
p < .01) significantly predicts work motivation among employees.

Discussion

Our research strongly supports the role of employee voice in increasing employee effectiveness.
The research of Robinson et al. (2004), based on a survey of 10,000 employees of Great Britain's
NHS, confirms our findings. According to that research, when employees feel that they are valued,
are involved in critical organizational decisions, are able to voice their ideas freely, find
opportunities for career development, and perceive that the organization is concerned with their
wellbeing and health, all of these components increase their engagement with the organization.

The research of Rees et al. (2013) showed that there is not only a direct but also an indirect
linkage between employee voice and engagement with work when voice is used to bring improvements
to work group performance. Two organizations were selected for the analysis of this relationship,
and an intervening role was established for the trust employees develop in higher-level management
and for the supervisor-subordinate relationship. These findings also support our results on the
tie between employee voice and engagement with the organization.

The results of our research also indicate that the more an organization provides its employees
with a platform to highlight matters of discomfort, the greater the probability that employees
will remain committed to the organization. Similar findings are reported by Spencer (1986), whose
research found lower turnover among registered general hospital nurses who were given the chance
to voice their dissatisfaction. He controlled for many of the regressors associated with employee
turnover in order to maximize the utility of his research. He also suggests that organizations
with numerous voice-raising mechanisms may be associated with higher expected and perceived
problem-solving capacity within the organization.

If an organization wants to increase employee commitment and reduce turnover, it needs to install
sound procedures through which employees can spotlight grievances and get adequate solutions.
Those who leave their employment without bringing matters to the table, or without attempting to
alter the dissatisfactory conditions, give the organization no signal that something is seriously
wrong. Organizations that attempt to listen, motivate, and provide an appropriate platform for
employees to come up with solutions can be effective in retaining their most precious asset, their
human resources. In fact, they can strengthen retention policies by gaining information that would
never be available if employees left silently (Freeman, 1976; Hirschman, 1976). Employee voice is
also positively associated with work motivation, like the other effectiveness variables; one such
conclusion is drawn by Alfayad and Mohd Arif (2017).

They conducted research on 300 employees belonging to a non-managerial group. The findings are
again consistent with ours: when an organization gives recognition to the voice of its employees,
a culture of motivation develops, which in turn raises employee satisfaction. Given such findings,
an organization should encourage its employees to raise their voice on matters of concern so that
efficiency and satisfaction are encouraged.
The findings of our research are consistent with many experts' opinions, and a large number of
researchers and management thinkers advocate employee voice and its effectiveness in both
individual and organizational contexts. Although fostering a climate that encourages openness in
communication is crucial in an organization, the defense mechanisms present in every individual
make people careful when dealing with those high up in positions of authority. Hence, the
information that reaches higher authorities has passed through multiple levels of filtering to
bring it to a level of appropriateness (Knight, 2014).

Conclusion

All the variables highlighted in the hypotheses were found to be positively related to employee
voice. Hence, organizations in which employees are given the opportunity to raise their voice and
convey their concerns, suggestions, or opinions will have increased levels of employee
effectiveness and efficacy. As the results show, employee voice directly impacts organizational
commitment, employee engagement, and work motivation.

Study Limitation and Area of Future Research

This research was conducted in the corporate sector of Karachi only, and not all variables that
impact employee effectiveness were covered. Further dimensions of employee voice need to be
explored in the context of Pakistan, and research should be conducted in other geographical areas,
including semi-urban and rural areas, for a better understanding of this topic.

References

Adams, J. S. (1965). Inequity in social exchange. In L. Berkowitz (Ed.), Advances in experimental
social psychology (pp. 267-299). New York, NY: Academic Press.

Akinboye, J. O. (2001). Executive behavior battery. Ibadan: Stirling-Horden Publishers.

Alfayad, Z., and Mohd Arif, L. (2017). Employee voice and job satisfaction: An application of
Herzberg's two-factor theory. International Review of Management and Marketing, 7(1), 150-156.

Batt, R., Colvin, A., and Keefe, J. (2002). Employee voice, human resource practices and quit
rates: Evidence from the telecommunications industry. Industrial and Labor Relations Review,
55(4), 573-594.

Boroff, K. E., and Lewin, D. (1997). Loyalty, voice, and intent to exit a union firm: A conceptual
and empirical analysis. Industrial and Labor Relations Review.

Botero, I. C., and Van Dyne, L. (2009). Predicting voice: Interactive effects of LMX and power
distance in the U.S. and Colombia. Management Communication Quarterly, 23, 84-104.

Bryson, A. (2004). Managerial responsiveness to union and nonunion worker voice in Britain.
Industrial Relations, 43(1), 213-241.

Christian, M. S., Garza, A. S., and Slaughter, J. E. (2011). Work engagement: A quantitative
review and test of its relation with task and contextual performance. Personnel Psychology, 64(1),
89-136.

Detert, J., and Burris, E. (2007). Leadership behavior and employee voice: Is the door really
open? Academy of Management Journal, 50(4), 869-884.

Dietz, G., Wilkinson, A., and Redman, T. (2009). Involvement and participation. In The Sage
handbook of human resource management. London: Sage.

Farndale, E., Van Ruiten, J., Kelliher, C., and Hope-Hailey, V. (2011). The influence of perceived
employee voice on organizational commitment: An exchange perspective. Human Resource Management,
50(1), 113-129.

Folger, R. (1977). Combined impact of "voice" and improvement on experienced inequity. Journal of
Personality and Social Psychology, 108-119.

Freeman, R. (1980). The exit-voice tradeoff in the labor market: Unionism, job tenure, quits, and
separations. Quarterly Journal of Economics, 94(4), 643-673.

Freeman, R. B. (1976). Individual mobility and union voice in the labor market. American Economic
Review, 66, 361-368.

Freeman, R., and Medoff, J. (1984). What do unions do? New York: Basic Books.

Gollan, P. (2001). Be aware of the voices. People Management, 7, 52-54.

Gao, L., Janssen, O., and Shi, K. (2011). Leader trust and employee voice: The moderating role of
empowering leader behaviors. The Leadership Quarterly, 22(4), 787-798.

Herzberg, F., Mausner, B., and Snyderman, B. B. (1959). The motivation to work (2nd ed.). New
York: John Wiley and Sons.

Hirschman, A. (1970). Exit, voice, and loyalty. Cambridge: Harvard University Press.

Hirschman, A. (1976). Some uses of the exit-voice approach--discussion. American Economic Review,
66, 386-391.

Knight, R. (2014). How to get your employees to speak up. Harvard Business Review.

Liu, W., Renhong, Z., and Yongkang, Y. (2010). I warn you because I like you: Voice behavior,
employee identifications, and transformational leadership. The Leadership Quarterly, 21(1),
189-202.

Macey, W., and Schneider, B. (2008). The meaning of employee engagement. Industrial and
Organizational Psychology, 1(1), 3-30.

Maslow, A. H. (1943). A theory of human motivation. Psychological Review, 50(4), 370.

Mobley, W. H. (1977). Intermediate linkages in the relationship between job satisfaction and
employee turnover. Journal of Applied Psychology, 62, 237-240.

Mobley, W. H., Griffeth, R. W., Hand, H. H., and Meglino, B. M. (1979). Review and conceptual
analysis of the employee turnover process. Psychological Bulletin, 86, 493-522.

Morrison, E. W., and Milliken, F. J. (2000). Organizational silence: A barrier to change and
development in a pluralistic world. Academy of Management Review, 25, 706-731.

Morrison, E. W., Wheeler-Smith, S. L., and Kamdar, D. (2011). Speaking up in groups: A cross-level
study of group voice climate and voice. Journal of Applied Psychology, 96(1), 183-191.

Mowday, R. T., Steers, R. M., and Porter, L. W. (1979). The measurement of organizational
commitment. Journal of Vocational Behavior, 14, 224-247.

Podsakoff, P. M., MacKenzie, S. B., Paine, J. B., and Bachrach, D. G. (2000). Organizational
citizenship behaviors: A critical review of the theoretical and empirical literature and
suggestions for future research. Journal of Management, 26(3), 513-563.

Purcell, J., Kinnie, N., Hutchinson, S., Rayton, B., and Swart, J. (2003). Understanding the
people and performance link: Unlocking the black box. London: CIPD.

Pyman, A., Cooper, B., Teicher, J., and Holland, P. (2006). A comparison of the effectiveness of
employee voice arrangements in Australia. Industrial Relations Journal, 37(5), 543-559.

Rees, C., Alfes, K., and Gatenby, M. (2013). Employee voice and engagement: Connections and
consequences. The International Journal of Human Resource Management, 24(14), 2780-2798.

Robinson, D., Perryman, S., and Hayday, S. (2004). The drivers of employee engagement. Brighton:
IES.

Rousseau, D., Sitkin, S. B., Burt, R. S., and Camerer, C. (1998). Not so different after all: A
cross-discipline view of trust. Academy of Management Review, 23(3), 393-404.

Spencer, D. (1986). Employee voice and employee retention. Academy of Management Journal, 29(3),
488-502.

Skinner, B. F. (1953). Science and human behavior. New York: Macmillan.

Stueart, R. D., and Moran, B. B. (2007). Library and information center management (7th ed.).
Westport, CT: Libraries Unlimited.

Steers, R. M., and Mowday, R. T. (1981). Employee turnover and post-decision accommodation
processes. In L. L. Cummings and B. M. Staw (Eds.), Research in Organizational Behavior: 235-281.
Greenwich, Conn.: JAI Press.

Tangirala, S., and Ramanujam, R. (2012). Ask and you shall hear (but not always): Examining the
relationship between manager consultation and employee voice. Personnel Psychology, 65(2),
251-282.

Truss, C., Soane, E., Edwards, C., Wisdom, K., Croll, A., and Burnett, J. (2006). Working life:
Employee attitudes and engagement. London: CIPD.

Van Dyne, L., Graham, J., and Dienesch, R. M. (1994). Organizational citizenship behavior:
Construct redefinition, operationalization, and validation. Academy of Management Journal, 37,
765-802.

Van Dyne, L., and LePine, J. A. (1998). Helping and voice extra-role behavior: Evidence of
construct and predictive validity. Academy of Management Journal, 41, 108-119.

Van Dyne, L., Kamdar, D. A., and Joireman, J. (2008). In-role perceptions buffer the negative
impact of low LMX on helping and enhance the positive impact of high LMX on voice. Journal of
Applied Psychology, 93, 1195-1207.

Vroom, V. H. (1964). Work and motivation. New York: Wiley.

Walumbwa, F. O., and Schaubroeck, J. (2009). Leader personality and employee voice behavior:
Mediating roles of ethical leadership and work group psychological safety. Journal of Applied
Psychology, 94, 1275-1286.

Zehir, C., and Erdogan, E. (2011). The association between organizational silence and ethical
leadership through employee performance. Procedia Social and Behavioral Sciences, 24, 1389-1404.

9. A regression-based approach to library fund allocation, William H. Walters, Library Resources &
Technical Services. 51.4 (Oct. 2007)

Full Text:

While nearly half of all academic libraries use formulas to allocate firm order funds on behalf of
particular departments or subject areas, few have adopted systematic methods of selecting or
weighting the variables. This paper reviews the literature on library fund allocation, then presents a
statistically informed method of weighting and combining the variables in a fund allocation formula.
The regression-based method of fund allocation uses current, historical, or hypothetical allocations
to generate a formula that excludes the influence of non-relevant variables as well as the influence
of arbitrary or non-systematic variations in funding. The resulting fund allocations are based on the
principle of equity--the idea that departments with the same characteristics should receive the same
allocations.

**********

Methods of allocating book funds among academic programs have been discussed in the library
literature since 1931, when Randall proposed that each department's allocation should account for
the number of titles published in the discipline as well as the cost per title. (1) Subsequent studies
have presented a wide range of fund allocation methods, including some of great sophistication. This
paper reviews the literature on library fund allocation, then presents a systematic, statistically
informed method of weighting and combining the variables in a fund allocation formula.

The approach described here is most useful for identifying the relationships that underlie a set of
previously established allocations--for revealing the formula that best matches the allocation levels
set in previous years. It is therefore especially appropriate for institutions that already allocate funds
based on historical precedent but without an explicit formula. Other libraries may find the method
helpful as a means of evaluating and refining the formulas already in place. Specifically,
the regression-based approach to library fund allocation can be used in at least three ways: to
generate an allocation formula based on previous years' allocations (in those cases where funds
have been allocated based on historical precedent without the use of a formula); to generate an
allocation formula based on subjectively established allocations (in those cases where funds have
not been allocated among departments); and to evaluate and refine the formulas already in use (in
those cases where the current formulas are unsatisfactory or otherwise in need of modification).
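
A minimal numerical sketch of the first of these uses, under stated assumptions (hypothetical
departments, two illustrative determinants, and numpy's least-squares solver; the article does not
prescribe a particular implementation):

    import numpy as np

    # One row per department: [enrollment, titles published in the discipline].
    X = np.array([[320,  900],
                  [150,  400],
                  [510, 1500],
                  [280,  850],
                  [ 90,  300]], dtype=float)
    # Last year's allocations, set by historical precedent rather than formula.
    y = np.array([21000, 12500, 30500, 19000, 9000], dtype=float)

    # Regress past allocations on the candidate determinants to recover the
    # implicit weights (with an intercept term).
    X1 = np.column_stack([np.ones(len(X)), X])
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)

    # The fitted values are formula-based allocations: departments with the
    # same characteristics now receive the same amount, and arbitrary
    # year-to-year variation is smoothed away.
    print(np.round(X1 @ weights))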

Rational, well-documented methods of fund allocation have several advantages over informal or ad
hoc approaches. According to the Association of Research Libraries, allocation formulas and similar
techniques promote transparency and the explicit recognition of underlying assumptions, encourage
funding practices that are consistent with the library's goals and priorities, ensure that adequate fund
monitoring mechanisms are in place, and help the library demonstrate to the university community
how its funds are being spent. (2) Fund allocation formulas also are likely to promote budgetary
stability over time (i.e., to reduce the likelihood that funding levels will vary unexpectedly from year to
year) and to allow departments to predict how changes in curriculum, enrollment, and staffing will
influence their library allocations.

Allocation formulas can even be used to influence the behavior of faculty and students. For instance,
the formula developed at Washburn University includes a variable representing the use of the library
for course-related instruction--a variable that results in higher allocations for those departments that
make greater use of the library. (3) Perhaps most important, however, is the principle of equity--the
idea that departments or programs with the same characteristics should get the same amount of
money. (4) The equitable distribution of funds requires the careful selection of funding determinants
(variables) that correspond to the goals of the library, an understanding that some determinants
should be weighted more heavily than others, and an acknowledgement that changing conditions
may require the revision or refinement of the initial allocation formula. (5)

The Fund Allocation Literature

Past and Current Practices

Although simple funding formulas have been in use since the 1930s, most libraries have relied on
subjective allocation methods until recently. (6) In the 1920s, departmental library allocations often
were set by the college president, sometimes in consultation with the faculty and occasionally with
the assistance of the library director. Many institutions simply allocated the same amount of money
to each department. (7) From the 1930s through the 1970s, approximately 70 percent of academic
libraries reserved at least some of their book funds for the use of particular departments or
programs. (8)

Not all departmental allocations were based on systematic procedures or criteria, however. Even
today, many libraries simply set each year's departmental allocation equal to the previous year's
allocation, perhaps with across-the-board adjustments for inflation or for changes in the overall
library budget. Another common practice is to reduce the allocations of those departments that did
not spend all their funds in the previous year. This can lead to rush buying and a possible decline in
selection standards at the end of the year. Still other libraries set departmental allocations based on
the total funding received by each department from the university. (9)

A review of relevant studies suggests that formula-based allocations first came into prominence in
the 1970s. Only eleven significant scholarly papers dealing with fund allocation formulas were
published from 1930 to 1969, but thirteen appeared in the 1970s, ten in the 1980s, and sixteen in the
1990s. In recent decades, roughly 40 percent of academic libraries have used formulas to allocate
funds. This proportion is likely to be somewhat higher among undergraduate colleges and somewhat
lower among research universities. (10) Of the ten university libraries described in a 1977
Association of Research Libraries report, only one used a formula with a weight for each variable.
(11)

While allocation formulas can be applied to all kinds of library materials, relatively few institutions
use formulas when allocating budgets for subscriptions or continuing resources. (12) Formulas are
used more often in the allocation of book budgets or other firm order funds. Moreover, many libraries
reserve part of the firm order budget for interdisciplinary or nondepartmental acquisitions. The
portion of the firm order budget allocated for the use of particular departments is typically around 65
percent, with values ranging from 26 to 89 percent among a set of approximately 200 institutions.
(13)

For the most part, librarians who have used fund allocation formulas are satisfied with them. Of the
college librarians who reported using allocation formulas in a 1995 survey, 77 percent felt that the
formulas they used were equitable. (14) Another large-scale survey, in which librarians expressed
both positive and negative views of their libraries' fund allocation formulas, nonetheless revealed
widespread satisfaction. (15) Unfortunately, no published evidence shows librarians' satisfaction with other methods
of fund allocation or with book budgets that are not allocated along departmental lines.

Critiques of Formula-Based Fund Allocation

Several authors have argued that allocation formulas leave no room for the kinds of scholarly
judgments that have traditionally been made by subject librarians. For example, Brownson asserts
that when an allocation formula is adopted, "the role of expert judgment has thus been withdrawn to
the administrator, whose judgment is managing and political rather than scholarly." (16) In reality,
nothing about the formula-building process privileges administrative authority or excludes scholarly
expertise. If anything, fund allocation formulas reduce the likelihood that allocations will be assigned
on arbitrary or purely political grounds. While Brownson carefully avoids expressing his own
allocation method in algebraic terms, it is nonetheless a formula.

Likewise, Freeman argues that "College librarians should replace formulas with good judgment
achieved through (1) continuous discussions with every faculty member; (2) thorough analysis of
course syllabi; (3) feedback from librarians conducting bibliographic instruction and from reference
librarians handling reference questions; (4) systematic evaluation of faculty publications and
research in progress; and (5) information about new courses, majors, and programs." (17)
Interestingly, every one of these assessment activities can be used as a means of gathering precise,
quantitative information for use in an allocation formula. Moreover, the development of an allocation
formula often provides both the opportunity and the incentives needed for exactly the kinds of
evaluative tasks that Freeman mentions. The process of developing an allocation formula lends itself
to a project-based framework in which goals, objectives, and expectations are made explicit--a
framework that may be especially useful at those institutions where collection assessment activities
have not been conducted systematically or rigorously over the years.

A second criticism is that fund allocation formulas lack objectivity--that they give the appearance of
scientific rigor without removing the need for subjective decision making. (18) Strictly speaking, this
assertion is correct with regard to regression-based formulas. Although regression analysis is used
widely in the sciences, the technique is not inherently scientific. In the context of fund
allocation, regression is used primarily to specify the relationships among the variables and only
secondarily to discover the true determinants of past or current funding levels. Like conventional
fund allocation formulas, regression-based formulas rely on subjective judgment at several stages of
the process: in the selection of variables, the compilation or construction of those variables, and the
specification of the regression model. The merit of funding formulas is not that they are objective
(they are not), but that they are systematic and unbiased--that departments with the same relevant
characteristics will receive the same allocations, and that non-relevant characteristics will have no
bearing on the results. The ultimate goal, an equitable distribution of funds, requires both the careful
selection of funding determinants and the use of a formula-building procedure that results in a
systematic and unbiased outcome.

Variables Used in Fund Allocation Formulas

The development of a fund allocation formula can be viewed as a two-stage process that involves
(1) selecting the determinants of funding (the variables to be used in the equation), and (2) deciding
how to combine and weight the variables so that the most important factors have a greater role in
determining the outcome of the formula.

At least nine papers have described the variables that are potentially useful as determinants of
departmental funding levels. (19) Together, the nine papers present more than sixty distinct
variables representing a wide range of departmental, subject-based, and library-specific attributes.
Fortunately, several methods can be used to arrive at a more manageable list of potentially relevant
variables. One method is to solicit librarians' rankings of the various indicators. For example,
Greaves asked librarians at fifty-four colleges and universities to rate the importance of twenty-four
variables that might be included in a hypothetical allocation formula. (20) The variables considered
most important, in order, were the adequacy of the library collection within the subject area, the
number of new courses offered by the department, the number of students associated with the
department as majors or graduate students, the number of recognized disciplines included within the
department, the number of undergraduate majors, and total enrollment (credit hours).

A second method of identifying potentially relevant variables is to determine which ones have been
used most often in actual fund allocation formulas. Table 1 presents the results of three major
surveys of academic libraries along with a content analysis of the variables that appear in
fifty-five published allocation formulas. (See appendix A for details.) Together, the values shown in table
1 represent the allocation formulas used at several hundred colleges and universities. The three
surveys and the content analysis yield similar results, revealing that certain variables have been
used far more often than others. The eight most frequently used variables measure two external
factors--the number and cost of the titles published within each discipline--along with various aspects
of the departments' courses (course offerings, course enrollment), personnel (number of faculty,
number of students), and library use (circulation, course-related use).

Several authors have presented classifications of the variables that are typically used as
determinants of departmental funding levels. (21) These classifications can be used to help ensure
that all relevant factors are included within a fund allocation formula. The most conceptually useful
classification groups the variables into three categories: supply (number of new titles published),
demand (departmental enrollment, faculty, course offerings, and so on), and cost (average price per
title). (22)

Supply, often represented by the number of titles published or reviewed in the previous year, is
important because it accounts for the fact that far more books appear in certain subject areas than in
others--far more in history than in chemistry, for example. An equitable fund allocation formula might
be defined as one that allows each department to acquire a roughly equal percentage of the relevant
titles published each year.

Demand variables, such as the number of students, faculty, or courses, are significant because they
represent the relative importance of each department or program within the university--not in an
educational sense, but in the competition for students and institutional resources. A department
offering more courses, serving more students, or supporting more faculty research will presumably
require a greater share of the book budget. Most allocation formulas include several demand
variables, partly to represent the various dimensions of demand (courses, personnel, library use),
and partly because demand-related data are often readily available.

Some authors feel that demand is of primary importance--that library use or circulation should be the
sole or chief determinant of departmental library allocations. (23) For instance, Carrigan argues that
"only through use are benefits from investment in library collections realized." (24) This assertion is
contrary to economic evidence, which demonstrates that several components of value are
independent of use or only indirectly related to it. (25) User value (the value derived from actual use)
can be contrasted with option value (the value associated with potential future use), existence value
(the value assigned to the existence of a resource even by those who never intend to use it), and
bequest value (the value associated with the maintenance or preservation of a resource for use by
others). Moreover, low circulation can represent several factors other than low demand: the unmet
need associated with weak or outdated subject collections, the presence of specialized research
programs in certain fields, inexpert book selection by library patrons, or subject-specific publishing
practices that encourage photocopying rather than borrowing--the publication of edited collections
rather than single-authored monographs, for example.

Finally, the inclusion of a cost variable acknowledges the fact that library materials in some
disciplines (art and chemistry, for instance) are more expensive than those in others. The cost
variable can therefore help ensure equity in the number of titles purchased.

The most effective fund allocation formula will include not just the best variables, but the best set of
variables. The goal is to represent all the appropriate determinants of funding while avoiding the use
of multiple variables to represent a single concept. For example, enrollment might be expressed in
terms of either students or credit hours, but normally not both. At least one study has shown how
factor analysis can be used to select those variables that best represent the underlying
characteristics found within a set of many potentially relevant variables. Using data for the South
Dakota School of Mines and Technology, McGrath and associates constructed three factors that
accounted for 85 percent of the variation within a set of twenty-two variables. (26) The three
factors--course-related demand, research-related demand, and size of the user population--were closely
associated with three of the original variables: the total number of credit hours taught within the
department, the number of works cited in the graduate theses accepted over a two-year period, and
the total number of undergraduate majors and graduate students registered with the department.
While not all institutions will benefit from the use of such a sophisticated procedure, the technique
developed by McGrath and associates is the best way to identify the most representative variables
for use in an allocation formula. Four authors provide especially good introductions to factor
analysis. (27)
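
As a rough illustration of this variable-reduction step, the following Python sketch applies principal
component analysis (a close relative of the factor analysis McGrath used) to a hypothetical table of
candidate variables. The file name and column labels are invented for the example; this is a sketch of
the general technique, not a reconstruction of the McGrath analysis.

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical file: one row per department, one column per candidate variable.
candidates = pd.read_csv("candidate_variables.csv", index_col="department")
scaled = StandardScaler().fit_transform(candidates)

# Extract three components and report how much variation they capture.
pca = PCA(n_components=3).fit(scaled)
print("variation explained:", round(pca.explained_variance_ratio_.sum(), 2))

# The loadings show which original variables track each component most
# closely; one representative variable per component enters the formula.
loadings = pd.DataFrame(pca.components_.T, index=candidates.columns,
                        columns=["factor1", "factor2", "factor3"])
print(loadings.round(2))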

Regardless of the method used to arrive at a set of variables, the choice of variables is ultimately
subjective and prescriptive rather than descriptive--not "Which variables are most closely related to
current funding levels?" but "Which variables ought to determine how much money is allocated to
each department?"

Methods of Weighting and Combining the Variables


Every fund allocation formula must weight the variables and combine them. Even the simplest
approach--listing the variables and adding up their values--implicitly incorporates a system of
weights (each variable weighted equally), and a combination method (additive).

Approximately two-thirds of the institutions that use allocation formulas specify unequal weights for
the variables. (28) The weights do matter. Applying the formulas in use at seven different colleges to
a single data set representing one particular library, Young found substantial variation in the
resulting allocations. (29) For example, the proportion of the total book budget allocated to biology
varied from 27 to 47 percent when different weights were used. The proportion allocated to geology
varied from 4 to 26 percent. Unfortunately, few colleges and universities have systematic procedures
for weighting the variables in their allocation formulas. As noted in the guidelines published by the
Association of Research Libraries:

There is no generally recognized standard for weighting the
[variables]. The weight given to a particular factor in a library
will be determined by the goals and resources of the library, and
will be tailored to the individual library. Many institutions
determine their own weightings; e.g., enrollment in upper division
units is worth two of lower division units. Others simply weight
all factors in a formula equally. (30)

Many librarians realize the importance of devising a formula consistent with the institution's collection
development policy as well as the need to solicit input from stakeholders both inside and outside the
library. (31) Beyond that, however, most appear to use subjective or even arbitrary weights. Of the
fifty-four institutions that have published their allocation formulas (appendix A), none provide an
explicit rationale for the assignment of weights.

Most fund allocation formulas combine just a few variables. For example, a typical formula might
express each department's share (percentage) of the total allocated budget as

(0.4 x e/E) + (0.3 x m/M) + (0.2 x p/P) + (0.1 x g/G)

where

e is the total enrollment in courses offered by the department or program

E is the sum of the e values, all departments combined

m is the number of undergraduate majors in the department

M is the sum of the m values, all departments combined

p is the estimated price per title in the relevant subject area

P is the sum of the p values, all departments combined

g is a variable coded 1 if the department offers graduate courses and 0 otherwise

G is the sum of the g values, all departments combined.


In this example, undergraduate enrollment, number of majors, average book price, and the presence
or absence of graduate programs are weighted 40 percent, 30 percent, 20 percent, and 10 percent,
respectively. This formula also illustrates the most common method of combining the variables. Each
department's share (of students, courses, library funds, and so on) is expressed as a percentage of
the total for the university as a whole. (32) The equation, a simple additive model, incorporates the
assumption that no special relationships exist among the variables--that each contributes directly
and proportionally (although not necessarily equally) to the outcome.
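
To make the mechanics concrete, the following Python sketch evaluates this basic additive formula for
three departments. The e, m, and p values are taken from appendix B; the g values are assumed for the
example, and with only three departments the shares are relative to this subset rather than to a full
university.

depts = {
    "Biology": {"e": 603, "m": 91, "p": 50.43, "g": 1},
    "History": {"e": 794, "m": 75, "p": 45.63, "g": 1},
    "Italian": {"e": 52,  "m": 0,  "p": 42.98, "g": 0},  # g values assumed
}
weights = {"e": 0.4, "m": 0.3, "p": 0.2, "g": 0.1}

# E, M, P, and G: the sums of each variable across all departments.
totals = {v: sum(d[v] for d in depts.values()) for v in weights}

for name, d in depts.items():
    share = sum(w * d[v] / totals[v] for v, w in weights.items())
    print(f"{name}: {share:.1%} of the allocated budget")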

While most of the fifty-four libraries listed in appendix A have adopted very simple allocation
formulas, at least three modifications to the basic formula can be found in the literature. The first
modification is to include one or more variables as negative (undesirable) factors when determining
the level of funding each department ought to receive. For instance, the formula

(0.4 x e/E) + (0.4 x m/M) + (0.2 x p/P) + (0.1 x g/G) - (0.1 x x/X)

specifies that departments with higher levels of x should receive less money than the others. (The x
variable can be anything: new books that never circulate, unspent library funds, faculty who receive
poor evaluations, and so on.)

A second possible modification is to express one or more of the variables in square root or
logarithmic form. For instance, modifying the basic formula so that e (course enrollment) equals the
square root of undergraduate enrollment produces a formula in which differences in enrollment at
the lower end of the scale count much more than differences in enrollment at the higher end of the
scale. This approach is especially useful when one or two departments are far larger than the
others--when the largest departments ought to get more money, but not in direct proportion to their
size. As
Lowry states:
A strong case can be made that as the number of credit hours
generated increases, particularly in large classes of service
courses with many sections, there is a diminishing need to provide
book funds to support credit-hour production. Put another way, in
the allocation of acquisitions funds, the credit hours produced by
the first student . . . should count far more than [those produced
by] the 251st student. (33)

A third possible modification is to treat the cost variable separately, as in

[(0.5 x e/E) + (0.4 x m/M) + (0.1 x g/G)] x p/P

where P is the average price of a book, all disciplines combined. By introducing the price multiplier
outside the rest of the equation, this formula ensures that departments with the same characteristics
will be able to purchase the same number of books.
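
All three modifications can be written as small variations on the same computation. The Python sketch
below is illustrative only: it reuses the toy records and weights from the earlier example, and the
penalty variable x is hypothetical.

import math

def share_with_penalty(d, totals, w):
    """Modification 1: subtract a weighted negative factor x."""
    base = sum(w[v] * d[v] / totals[v] for v in ("e", "m", "p", "g"))
    return base - w["x"] * d["x"] / totals["x"]

def transformed_enrollment(e):
    """Modification 2: square-root transform, so enrollment differences
    count more at the low end of the scale than at the high end."""
    return math.sqrt(e)

def share_with_price_multiplier(d, totals, w, p, P):
    """Modification 3: price applied as a multiplier outside the additive
    part, so similar departments can buy similar numbers of books."""
    base = sum(w[v] * d[v] / totals[v] for v in ("e", "m", "g"))
    return base * p / P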

Although the library literature reveals no cases in which institutions have adopted methods of fund
allocation based on the principle of resource optimization, several such techniques have been
proposed. For example, Goyal describes a fund allocation method based on linear programming, a
mathematical technique used to determine the optimal allocation of resources under specified
conditions. (34) Unfortunately, Goyal's method does not provide clear guidance for the construction
or weighting of the variables. It requires the subjective determination of "the importance which
society attaches to the work of the department" and "the importance which the university gives to the
work of the department"--judgments that must be made outside the linear programming framework.
(35) Likewise, the economic model developed by Gold relies on the subjective assessment of the
extent to which students' library use benefits the university. (36) Gold's method also has been
criticized for its emphasis on economic efficiency rather than equity among departments. (37) More
recent applications of linear goal programming avoid the use of weights but do require the
specification and ranking of the library's goals and priorities before the analysis is performed. (38)

A Regression-based Approach

Multiple regression, a standard statistical technique, can be used to assign weights to a set of
variables so that the resulting formula produces results consistent with a set of predetermined
values. For instance, regression can be used to assign weights to a set of supply, demand, and cost
variables so that the resulting allocations are consistent with previous years' fund allocations. Within
this context, regression analysis can be used in at least three ways:

* to construct an allocation formula based on previous years' allocations; this is appropriate for
libraries that already assign funds to departments, but without the use of an explicit formula,

* to construct an allocation formula based on subjectively established allocations; this is appropriate
for libraries that have never allocated funds to particular departments but that have nonetheless
determined (at least in general terms) the amount that each department should receive, and

* to evaluate or refine existing allocation formulas or procedures.

The refinement of existing formulas or procedures can take several forms. For
instance, regression can be used to modify a set of allocations so that they reflect the influence
only those variables that have been explicitly selected. This procedure removes the unique influence
of any other variables--those excluded from the equation--as well as any random or non-systematic
variations in funding. Likewise, regression can be used to create a new formula that maintains
allocations similar to those used in previous years, but based on a new set of variables--variables
that are more appropriate or more readily operationalized than those that were used in the past.

Regression analysis has been used in at least three previous fund allocation studies. More than
thirty years ago, Pierce used stepwise log-linear regression to create an innovative book use
variable incorporating only those components of circulation that could not be attributed to collection
size or previous years' acquisitions. (39) Pierce did not use regression to weight or combine the
variables in his formula, however. Similarly, Brownson constructed a conventional allocation formula,
then used regression to examine the relationships among the variables. (40) He found, among other
things, that past years' expenditures are closely related to current research activity but not closely
related to perceived collection strength.

Finally, at least one university has used regression in the construction of a fund allocation formula.
(41) The formula, incorporating seven variables (undergraduate majors, graduate students, faculty,
courses taught, library circulation, current collection size, and average book price), is presented only
briefly, however. The university is not identified, and just one sentence of commentary is provided:
"The formula was derived from a regression analysis using over ten years of historical data." (42)

The process used to arrive at a regression-based allocation formula is the same regardless of the
reasons for undertaking the analysis. The regression-based approach to fund allocation has five
essential steps:

1. Select a dependent variable that represents current, past, or hypothetical funding levels.

2. Identify a set of potential explanatory variables--factors that ought to influence the amount of
money spent on behalf of each department.

3. Select the final set of variables for use in the regression analysis. (Compile and prepare the data,
then examine the correlations among the explanatory variables. Decide which ones to include.)

4. Perform the regression analysis using a statistical package, such as SPSS, MINITAB, or SAS.

5. Interpret the results.

This approach will result in a fund allocation formula that is a weighted combination of the variables
included in the analysis.

Step 1: Select a Dependent Variable

The regression-based approach requires not only a set of variables for inclusion in the allocation
formula, but a separate variable (called a dependent variable) that represents current, past, or
hypothetical funding levels. For libraries that have previously allocated funds for the use of particular
departments, this variable can be the most recent set of allocations or the average of several years'
allocations. For libraries with no history of departmental fund allocation, this variable must be
developed outside the regression framework based on the professional expertise of the librarians,
faculty, and staff.

Although the initial assignment of subjectively determined allocations is no more systematic than the
ad hoc development of a fund allocation formula, the regression-based approach is appropriate
whenever the individuals who allocate funds have more confidence in their ability to assign
allocations to departments than in their ability to develop a new formula from scratch. Even though
the initial allocations are assigned on subjective grounds before the analysis is undertaken,
the regression procedure results in a new set of allocations that incorporate only the influence of
those variables included in the equation. Any arbitrary or non-systematic variation in the original
allocations will be excluded from the final allocations that emerge from the regression-based
procedure.

Step 2: Identify a Set of Potential Explanatory Variables

For the explanatory variables--those that will be included in the fund allocation formula--several
methods of selection can be used. As discussed earlier, most libraries' allocation formulas include
variables representing external factors (the number and cost of the titles published within each
discipline) as well as internal, institutional characteristics, such as courses (course enrollment,
course offerings), personnel (number of faculty, number of students), and library use (circulation,
course-related use). While previous research and practice provide guidance in the selection of
variables, the decision to include or exclude a particular variable is subjective and likely to depend
on local conditions.

Ideally, the explanatory variables will reflect the situation that ought to prevail rather than the
historical conditions that are most likely to have resulted in the current fund allocations. Because
practical considerations cannot be ignored, most allocation formulas include variables that are
"resistant to deliberate local manipulation" and that can be represented adequately by data compiled
within the library or elsewhere on campus. (43) Data for at least ten of the fourteen variables shown
in table 1 are available at most universities from the registrar's office, the office of institutional
research, or the library's own acquisitions, circulation, and interlibrary loan systems. The number of
titles published, cataloged, or reviewed in each subject area can be estimated from data presented
in the American Book Publishing Record, Books in Print, the Bowker Annual Library and Book Trade
Almanac, Choice, Publishers Weekly, or WorldCat, or from vendors' approval plan records. One
advantage of using Choice or approval plan records is that they cover only those titles that are
appropriate for academic libraries. The cost of library materials can be estimated from many of the
same sources, or calculated from internal library records. Shreeves lists ten sources of price data.
(44)

Not every potentially relevant variable should be included in the regression analysis, however. There
are at least three reasons to limit the number of variables: to avoid specification error (the exclusion
of important variables or the inclusion of irrelevant variables), to avoid unnecessary work in
compiling the data, and to achieve more robust results--results that are less likely to vary as a result
of minor changes in the model specification or the data. With regard to robustness, many
statisticians recommend using no more than one-tenth as many variables as cases. In practice,
however, a less stringent standard is often applied. For a set of thirty or thirty-five departments, the
use of five or six variables is a reasonable compromise between the need to include all the important
determinants of funding and the need to limit the number of variables relative to the number of
cases.

Figure 1 describes the explanatory variables that might be considered for inclusion in the fund
allocation formula of a typical college library. They include six of the seven variables most often used
in actual fund allocation formulas (table 1) as well as an indicator of student research activity--the
number of senior projects and master's theses completed. Each variable represents a particular
aspect of the external or internal environment. Specifically, the seven variables correspond to the
three categories identified by Sweetman and Wiedemann:

* supply: t (number of titles published);


* demand: c (number of courses offered), e (course enrollment), h (number of projects and theses
completed), f (number of faculty), and m (number of majors and graduate students); and

* cost: p (price per title). (45)

Step 3: Select the Final Set of Variables

The example analysis presented in this paper is based on data for St. Lawrence University, a small
liberal arts college in Canton, New York. Data for the seven variables shown in figure 1 are
presented in appendix B. If the procedures described in this paper are carried out properly using
those data, the results should be identical to those reported here.

While a regression analysis might be undertaken with all seven variables, a more reliable technique
is to first identify and exclude those explanatory variables that are closely related to the other
variables in the set. The use of closely related variables can result in two related problems:
specification error and multicollinearity. (46) Broadly speaking, the unique impact of a particular
variable (and the importance of that variable as a determinant of fund allocation levels) is more
difficult to ascertain when the variable is closely related to the others in the equation. No absolute
standard exists for identifying closely related variables, although any explanatory variable correlated
with two or more others at the 0.80 level or higher is likely to warrant further examination. The
correlations among the explanatory variables can be assessed using Excel (the CORREL function),
SPSS (Analyze--Correlate--Bivariate), MINITAB (Stat--Basic Statistics--Correlation), or another
statistical package.
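
The same check can be run in a few lines of Python (pandas). The sketch below assumes the appendix B
data have been saved to a CSV file; the file name is invented for the example.

import numpy as np
import pandas as pd

# Hypothetical file containing the appendix B data, one row per department.
data = pd.read_csv("appendix_b.csv", index_col="department")
expl = data[["t", "c", "e", "h", "f", "m", "p"]]

corr = expl.corr()
print(corr.round(2))

# List the variable pairs correlated at |r| >= 0.80 (upper triangle only,
# so that each pair is reported once).
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1)).stack()
print(upper[upper.abs() >= 0.80])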

In the example analysis, variables m (majors), e (course enrollment), f (faculty), and h (projects and
theses) are all interrelated (see table 2). Closer examination reveals that all the correlations with
absolute values greater than 0.80 can be eliminated through the exclusion of two variables: m and
either e or f.

The m (majors) variable should be excluded for two reasons. First, it is closely related to at least
three other explanatory variables: e (course enrollment), h (projects and theses), and f (faculty).
Second, data on the number of majors are especially likely to be adversely affected
by measurement error. Many students change their majors, others have no declared major despite
their strong interest in a particular field, and at least some intend to graduate with a major different
than the one for which they are officially enrolled.

Although neither e (course enrollment) nor f (faculty) must be excluded due to the correlations
shown in table 2, variable e should probably be excluded due to the characteristics of the particular
institution represented by these data. Specifically, St. Lawrence University is a small college where
many of the stronger or more distinctive departments do not have high course enrollments. On the
other hand, variable f (faculty) ought to be included in the equation, as the number of faculty tends to
correspond to the number of distinct teaching or research areas represented within each
department. At St. Lawrence, faculty often are hired to cover specific disciplinary areas that are likely
to require unique library resources.

Variables t (number of titles published), c (number of courses offered), and p (price per title) are only
weakly related to the other explanatory variables (see table 2). This is not unexpected, as variables t
and p represent supply and cost rather than demand. The absence of strong relationships between
variable c and the other demand variables indicates that the number of courses represents a
component of demand that is essentially unrelated to the number of students, faculty, or research
projects.

Step 4: Perform the Regression Analysis

Regression analysis reveals the relationships between a single dependent variable and a set of
explanatory variables (also called independent or predictor variables). In this example, the
dependent variable is the amount allocated for monographic firm orders in a recent year (variable a)
and the explanatory variables are those that will be included in the fund allocation formula: t (number
of titles published), c (number of courses offered), h (number of projects and theses completed), f
(number of faculty), and p (price per title) (see figure 1 for details). The regression equation can be
expressed in the form

a = ([w.sub.t] x t) + ([w.sub.c] x c) + ([w.sub.h] x h) + ([w.sub.f] x f) + ([w.sub.p] x p) + b

where the w values are the weights associated with each variable. The b term at the end of the
equation is a constant--a specific, fixed value to be added to each department's allocation. The
constant, also called a y-intercept, emerges from the regression analysis; it is not specified in
advance. While most statistical software packages will let the user specify a y-intercept of zero, the
inclusion of a non-zero constant will produce a regression equation that better fits the data.

The regression procedure can be understood best through an example involving a single
explanatory variable. Figure 2 shows the regression line corresponding to the equation

a = (0.00166 x t) + 1.84347

where a is the allocation for each department and t is the number of titles published in each
corresponding subject area. If the number of titles published were the only factor influencing the
departmental allocations, then one would expect each dot (each department) to fall somewhere
along the regression line. The allocation for any particular department could then be determined by
finding the number of titles on the horizontal axis, finding the same place on the regression line, and
reading off the allocation on the vertical axis. In fact, however, the allocation for each department is
influenced by several factors other than the number of titles published. Consequently, most of the
dots are above or below the regression line rather than right on it.

Nonetheless, the regression line and the corresponding equation have been calculated to most
effectively represent the linear relationship between the two variables. The regression line is the line
that most closely conforms to the pattern of dots. Specifically, it is the line that minimizes the sum of
the squared vertical distances between the dots (which represent the actual situation) and the line
itself (which represents the situation that would exist if the number of titles published were the only
determinant of fund allocation levels). Figure 2 is a relatively simple example showing just two
variables--one dependent variable and one explanatory variable. With three variables, the graph
would need to be represented in three dimensions, and the line would become a plane. With six
variables, the graph would need six dimensions. While the six-variable regression cannot be shown
geometrically, the corresponding equation can be solved algebraically.

The data for the example regression appear in appendix B. When regression is used to construct an
allocation formula, one does not need to express each variable as a percentage of the total for the
university as a whole. The inclusion of one or more variables in square root or logarithmic form may
sometimes be appropriate, however. As noted earlier, such transformations can be used to specify a
non-linear relationship between a particular variable (enrollment, for example) and fund allocation
levels. In most applications of regression, the goal is to identify the model that best fits the observed
data; the best-fitting model (linear or otherwise) is selected. When regression is used to construct a
fund allocation formula, however, the goal is to produce an acceptable model that is consistent with
the library's objectives. If there is good reason to believe that the largest departments should not
receive allocations in proportion to their size, then it is appropriate to incorporate that stipulation into
the regression equation through a transformation of the relevant variable--even if the resulting model
does not provide the best possible fit. The assumptions underlying regression, and the effects of
intentionally or unintentionally violating those assumptions, are described most clearly by Achen,
Berry, Kahane, and Lewis-Beck. (47)

To conduct the regression analysis, first enter data for all the relevant variables into SPSS,
MINITAB, or another statistical package. (Appendix B shows how the data should be arranged.)
Next, choose the type of analysis and specify the variables. In SPSS, select Analyze--Regression--
Linear; in MINITAB, select Stat--Regression--Regression. The fund allocation variable is the
dependent or response variable; the other variables are independent or predictor variables. The
default analysis options will not need to be altered.
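
For readers working outside SPSS or MINITAB, a minimal Python equivalent (statsmodels) is sketched
below, again assuming the hypothetical appendix_b.csv file used earlier.

import pandas as pd
import statsmodels.api as sm

data = pd.read_csv("appendix_b.csv", index_col="department")
X = sm.add_constant(data[["t", "c", "h", "f", "p"]])  # non-zero y-intercept
y = data["a"]  # dependent variable: last year's allocation (percent)

model = sm.OLS(y, X).fit()
print(model.params)        # unstandardized coefficients: the formula weights
print(model.rsquared_adj)  # adjusted R-squared (see step 5)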

[FIGURE 2 OMITTED]

Step 5: Interpret the Results

In the fund allocation context, the most important statistics to emerge from the analysis are the
unstandardized regression coefficients. These can be found near the end of the SPSS output (in the
Coefficients table--the column labeled B) or near the top of the MINITAB output (the column labeled
Coef). In SPSS, click on each value in the Coefficients table to see additional decimal places.
Because each coefficient is a weight in the fund allocation formula, the coefficients can simply be
inserted into the standard regression equation. For the data shown in appendix B,
the regression equation is

a = (0.00154 x t) + (0.00602 x c) + (0.01561 x h) + (0.00518 x f) + (0.06251 x p) - 1.56631.

This formula can be used to calculate an allocation for each department. It also can be used to show
how a change in the number of course offerings, senior projects, or faculty would affect each
department's library allocation. For example, each senior project or master's thesis (h) brings in an
additional 0.01561 percent of the allocated firm order budget. With a total allocated budget of
$200,000, each senior project or thesis represents an additional $31.22 in departmental library
funding. Under the same assumptions:

* Each new title published in the relevant subject area (t) brings an additional $3.08.

* Each new departmental course (c) brings an additional $12.04.

* Each new faculty position (f) brings an additional $10.36.

* An increase of $1 in average cost per title (p) brings an additional $125.02.
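
These dollar figures follow directly from the coefficients: each coefficient is a share of the
allocated budget expressed in percent, so it is divided by 100 and multiplied by the total budget. A
minimal sketch of the arithmetic:

BUDGET = 200_000  # total allocated firm order budget, as assumed above
coefficients = {"t": 0.00154, "c": 0.00602, "h": 0.01561,
                "f": 0.00518, "p": 0.06251}

for var, coef in coefficients.items():
    # e.g., h: 0.01561 / 100 * 200000 = $31.22 per project or thesis
    print(f"one-unit increase in {var}: ${coef / 100 * BUDGET:,.2f}")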

In comparison with most fund allocation formulas, this particular formula emphasizes external factors
(the number and cost of the books available for purchase) rather than internal factors, such as the
number of courses and faculty associated with each department. The formula is "correct" because it
accurately represents the implicit and previously unspecified variables and weights on which earlier
allocations were based. Many librarians would argue that the number of courses and faculty should
have a greater impact on departmental allocations, and an adjustment to reflect that view would be
entirely legitimate. However, any such adjustment would represent an intentional modification of the
procedure (the implicit formula) on which earlier allocations were based--a fact that may be
important when justifying the change to constituents outside the library.

A second important statistic to emerge from the regression analysis is the adjusted [R.sup.2] value.
This value ranges from 0 to 1, but is typically within the 0.3 to 0.7 range. A high [R.sup.2] value
indicates that the regression equation fits the data well--that the allocation levels that emerge from
the regression analysis are similar to the previous or hypothetical allocations (those represented by
the dependent variable). In contrast, a low [R.sup.2] value indicates a relatively poor fit--that the new
allocations are substantially different from the previous or hypothetical allocations.

A low [R.sup.2] value does not mean that the regression-based formula is deficient or unreliable,
however. In fact, it indicates only that the original fund allocations were established through a non-
systematic process, or a process based on factors that are no longer relevant and therefore not
found in the regression equation. That is, a low [R.sup.2] value shows that the original fund
allocations were inequitable (in the sense that departments with similar characteristics could receive
different allocations) and that a more systematic method of fund allocation--the regression-based
method, for instance--is likely to be an improvement over past practices. For St. Lawrence
University, the adjusted [R.sup.2] value is 0.44. This indicates that 44 percent of the
interdepartmental variation in the original fund allocation levels can be attributed to the explanatory
variables included in the regression equation.

Guides to regression analysis often emphasize significance tests, which are used to make
generalizations about a population based on data for a sample. Significance tests are not especially
meaningful in the fund allocation context, as the entire population of interest (the set of all the
departments at a particular institution) is included within the data used in the analysis.

Commentary

Regression removes the influence of non-systematic variations in funding as well as the influence of
variables not included in the equation. Consequently, the allocations that result from a regression-
based formula may be appreciably different from those used in the past. This is most likely to occur
among the smaller academic departments, especially when the [R.sup.2] value is low. In an
analytical sense, this is not a problem; it represents the natural result of an approach that treats
similar departments similarly. Realistically, however, reductions in funding are not likely to be
greeted enthusiastically by the departments affected.

One method of dealing with the problem is to set aside a portion of the allocated budget for
distribution in accordance with the earlier allocation procedure, gradually increasing the proportion of
the budget that is allocated in accordance with the new formula. Another approach is to ask for
special short-term funding to ensure that no department experiences a sudden reduction in its library
allocation. These approaches to implementation are not specific to regression-based fund allocation
methods. For example, the conventional formula adopted in the mid-1980s by Ohio University was
put into place gradually so that no department's allocation was reduced. (48)

Two strategies might be used to encourage the acceptance of a regression-based approach to fund
allocation. The first is to introduce regression strictly as an analytical technique--as a means of
evaluating the extent to which each variable influences current funding levels and as a means of
identifying those departments with actual allocations substantially higher or lower than the calculated
values. This strategy, which can be adopted without external support or collaboration, is especially
appropriate when stakeholders outside the library are unlikely to accept the procedure on its own
merits--when they are likely to evaluate its acceptability primarily in terms of its impact on their own
departments. This strategy also is appropriate when the regression analysis is based not on actual
allocations, but on a set of hypothetical allocations established subjectively by the librarians.
When regression is used primarily for analytical purposes, the resulting formula (with or without
subsequent adjustments) can be presented to faculty and administrators without reference to the
means by which it was developed.

A second strategy is to introduce regression right from the start as a method of allocating funds--to
gain support for the procedure before the analysis is conducted so that the results will be less
subject to criticism afterward. This approach is especially useful when a high proportion of faculty
and administrators are familiar with regression analysis and willing to participate in the most
important part of the process--the selection of relevant variables. Because the regression-based
formula still may require modification, the interested parties may want to agree beforehand about the
appropriate procedure for adjusting the formula. In particular, the legitimate reasons for adjustment
(disciplinary accreditation requirements, for example) should be specified in advance.

If the regression-based approach to fund allocation proves acceptable, subsequent years' allocations
can be set by using the same proportional allocation of funds each year, or by using the same
formula with new (current) data. If the second approach is adopted, the new departmental
allocations will not necessarily total 100 percent, so they may need to be increased or decreased
proportionally (multiplied or divided by a constant). With either approach, the formula itself should be
re-evaluated after several years.
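
The proportional adjustment mentioned above is a one-line rescaling. The sketch below uses three
invented departments for brevity; in practice every department would be included, so the raw total
would already be close to 100.

# Raw shares produced by re-running the formula with current data;
# the values here are invented for illustration.
raw = {"Biology": 5.1, "History": 4.6, "Italian": 0.7}

scale = 100 / sum(raw.values())
rescaled = {dept: share * scale for dept, share in raw.items()}
print(rescaled)  # rescaled shares sum to exactly 100 percent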

The regression-based method of fund allocation relies on a statistical technique that has been in use
for several decades. At the same time, the particular application of regression presented in this
paper has not been tested through implementation and practice. Further investigation is needed to
assess the effectiveness of the method at various types of institutions and to determine the usual
range of variation in the formulas developed at particular colleges and universities.

The same regression-based approach can be used for strictly analytical purposes--to examine the
broader relationships underlying various fund allocation strategies. Several questions might be
considered. For example, are the fund allocation strategies adopted by major research universities
more strongly affected by external factors than those adopted by liberal arts colleges? Does
the regression-based method of fund allocation produce systematically high or low allocations for
certain fields of study or certain kinds of academic departments? One advantage of the regression-
based approach is that it can be used to evaluate the determinants of funding even at those
institutions that do not use explicit fund allocation formulas.

Appendix A. Data Sources for Table 1

Table 1 presents the results of three major surveys of academic
libraries (Budd and Adams 1989; Greaves 1974; Tuten and Jones 1995)
along with a content analysis of the variables that appear in
fifty-five published allocation formulas. The fifty-five published
formulas represent fifty-four colleges and universities.

Institution Source

Arizona State University Brownson (1991)


Arkansas Technical University Tuten and Jones (1995)
Augusta College Tuten and Jones (1995)
Aurora University Tuten and Jones (1995)
Baker University Tuten and Jones (1995)
Berry College Tuten and Jones (1995)
California University of Tuten and Jones (1995)
Pennsylvania
Carleton College Richards (1953)
Catawba College Tuten and Jones (1995)
Central Missouri State University Brookshier and Littlejohn (1990);
Niemeyer et al. (1993)
Colorado State University Association of Research
Libraries (1977)
Columbia Union College Tuten and Jones (1995)
Curtin Institute of Technology Allen and Tat (1987)
Davidson College Tuten and Jones (1995)
Elon College Jones and Keller (1993)
Florida Gulf Coast University Donlan (2006)
Fort Valley State College Tuten and Jones (1995)
George Mason University Rein et al. (1993)
Georgia College Tuten and Jones (1995)
Goucher College Falley (1939)
Illinois Wesleyan University Tuten and Jones (1995)
Keuka College Tuten and Jones (1995)
Lander College Tuten and Jones (1995)
Lynchburg College Scudder (1987)
Lyndon State College Tuten and Jones (1995)
Manchester College Willmert (1984)
Methodist College Tuten and Jones (1995)
Mount St. Mary's College and Tuten and Jones (1995)
Seminary
Notre Dame University of Nelson Welwood (1977)
Ohio University Mulliner (1986)
Olivet Nazarene University Tuten and Jones (1995)
Shepherd College Tuten and Jones (1995)
Simon Fraser University Copeland and Mundle (2002)
South Dakota School of Mines and McGrath (1967)
Technology
Southern Arkansas University Tuten and Jones (1995)
Southwest Texas State University Bourgeois et al. (1998)
Southwestern Oklahoma State Tuten and Jones (1995)
University
St. John Fisher College Tuten and Jones (1995)
St. Mary's University Tuten and Jones (1995)
St. Norbert College Tuten and Jones (1995)
Stetson University Tuten and Jones (1995)
SUNY College at Potsdam Tuten and Jones (1995)
Transylvania University Tuten and Jones (1995)
Union University Tuten and Jones (1995)
University of Colorado Ellsworth (1942)
University of Constance Schmitz-Veltin (1984)
University of South Carolina at Tuten and Jones (1995)
Aiken
University of Southwestern McGrath (1975)
Louisiana
University of Stellenbosch Graf Eckbrecht von Durckheim-
Montmartin et al. (1995)
University of Texas Coney (1942)
University of Wichita Hekhuis (1936)
Washburn University Tuten and Jones (1995)
Western Washington University Packer (1988)
Youngstown State University Genaway (1986)

Data Sources

Allen, G. G., and Lee Ching Tat. "The Development of an Objective Budget Allocation Procedure for
Academic Library Acquisitions." Libri 37, no. 3 (Sept. 1987): 211-21.

Association of Research Libraries. The Allocation of Materials Funds in Academic Libraries. SPEC
Kit 36 (Washington, D.C.: Association of Research Libraries, 1977).

Bourgeois, Eugene, et al. "Faculty-Determined Allocation Formula at Southwest Texas State
University." Collection Management 23, no. 1/2 (1998): 113-23.

Brookshier, Doris, and Nancy E. Littlejohn. "Resource Allocation within an Automated Fund
Accounting System." In Proceedings, Acquisitions "90 Conference on Acquisitions, Budgets, and
Collections, ed. David C. Genaway, 45-55 (Canfield, Ohio: Genaway and Assoc., 1990).

Brownson, Charles W. "Modeling Library Materials Expenditure: Initial Experiments at Arizona State
University." Library Resources & Technical Services 35, no. 1 (Jan. 1991): 87-103.

Budd, John M., and Kay Adams. "Allocation Formulas in Practice." Library Acquisitions: Practice &
Theory 13, no. 4 (Winter 1989): 381-90.

Coney, Donald. "An Experimental Index for Apportioning Departmental Book Funds for a University
Library." Library Quarterly 12, no. 3 (July 1942): 422-28.
Copeland, Lynn, and Todd M. Mundle. "Library Allocations: Faculty and Librarians Assess
'Fairness.'" portal: Libraries and the Academy 2, no. 2 (Apr. 2002): 267-76.

Donlan, Rebecca. "How Much Does Biology Really Need, Anyway? Determining Library Budget
Allocations." Library Issues 26, no. 6 (July 2006): 1-4.

Ellsworth, Ralph E. "Some Aspects of the Problem of Allocating Book Funds among Departments in
Universities." Library Quarterly 12, no. 3 (July 1942): 486-94.

Falley, Eleanor, W. "An Impersonal Division of the College Book Fund." Library Journal 64, no. 21
(Dec. 1, 1939): 933-35.

Genaway, David C. "The Q Formula: The Flexible Formula for Library Acquisitions in Relation to the
FTE Driven Formula." Library Acquisitions: Practice & Theory 10, no. 4 (Winter 1986): 293-306.

Graf Eckbrecht von Durckheim-Montmartin, Max E., et al. "Library Materials Fund Allocation: A Case
Study." Journal of Academic Librarianship 21, no. 1 (Jan. 1995): 39-42.

Greaves, F. Landon, Jr. The Allocation Formula As a Form of Book Fund Management in Selected
State-Supported Academic Libraries. Ph.D. dissertation, Florida State Univ., 1974.

Hekhuis, L. "A Formula for Distribution of Library Funds among Departments." Library Journal 61,
no. 14 (Aug. 1936): 574-75.

Jones, Plummer Alston, Jr., and Connie L. Keller. "From Budget Allocation to Collection
Development: A System for the Small College Library." Library Acquisitions: Practice & Theory 17,
no. 2 (Summer 1993): 183-89.

McGrath, William E. "Determining and Allocating Book Funds for Current Domestic Buying." College
& Research Libraries 28, no. 3 (July 1967): 269-72.

--. "A Pragmatic Book Allocation Formula for Academic and Public Libraries with a Test for Its
Effectiveness." Library Resources & Technical Services 19, no. 4 (Fall 1975): 356-69.

Mulliner, Kent. "The Acquisition Allocation Formula at Ohio University." Library Acquisitions: Practice
& Theory 10, no. 4 (Winter 1986): 315-27.

Niemeyer, Mollie, et al. "Balancing Act for Library Materials Budgets: Use of a Formula Allocation."
Technical Services Quarterly 11, no. 1 (1993): 43-60.

Packer, Donna. "Acquisitions Allocations: Equity, Politics, and Formulas." Journal of Academic
Librarianship 14, no. 5 (Nov. 1988): 276-86.

Rein, Laura O., et al. "Formula-Based Subject Allocation: A Practical Approach." Collection
Management 17, no. 4 (1993): 25-48.

Richards, James H., Jr. "Allocation of Book Funds in College Libraries." College & Research
Libraries 14, no. 4 (Oct. 1953): 379-80.

Schmitz-Veltin, Gerhard. "Literature Use As a Measure for Funds Allocation." Trans. John J. Boll.
Library Acquisitions: Practice & Theory 8, no. 4 (Winter 1984): 267-74.
Scudder, Mary C. "Using Choice in an Allocation Formula in a Small Academic Library." Choice 24,
no. 10 (June 1987): 1506-11.

Tuten, Jane H., and Beverly Jones, eds. Allocation Formulas in Academic Libraries, CLIP Note 22
(Chicago: Association of College & Research Libraries, 1995).

Welwood, Ronald J. "Book Budget Allocations: An Objective Formula for the Small Academic
Library." Canadian Library Journal 34, no. 3 (June 1977): 213-19.

Willmert, John Allen. "College Librarians and Professors: Partners in Collection Building and Fund
Allocation." In Academic Libraries: Myths and Realities, ed. Suzanne C. Dodson and Gary L.
Menges, 293-97 (Chicago: Association of College & Research Libraries, 1984).

Appendix B. Data Used in the Example Analyses

Variable

Department a t c e

African Studies 3.05 232 9 85


Anthropology 2.50 678 23 415
Asian Studies 2.33 1004 24 0
Biology 4.73 1122 34 603
Canadian Studies 3.17 244 10 77
Chemistry 5.28 248 19 412
Economics 4.33 1082 29 1045
Education 2.27 946 88 1169
English 6.47 2432 62 1270
Environmental Studies 2.82 886 36 430
Fine Arts 7.04 1028 41 653
French 2.50 130 10 213
Gender Studies 2.96 662 27 164
Geology 3.96 160 22 241
German 2.00 164 8 76
Global Studies 5.00 694 56 344
Government 6.33 1528 33 1035
History of Science 1.00 280 0 0
History 4.75 2388 54 794
Italian 0.58 50 2 52
Japanese 1.96 54 2 35
Latin American Studies 0.90 326 25 63
Mathematics 2.04 1082 43 1321
Music 3.30 458 30 294
Philosophy 1.67 688 24 377
Physics 1.29 514 17 231
Psychology 2.92 546 27 1302
Religious Studies 4.17 1076 17 403
Russian 0.34 132 0 0
Sociology 3.54 1926 32 686
Spanish 2.50 124 11 267
Speech and Theatre 1.46 576 37 452
Sports and Athletics 0.83 162 21 442

Variable
Department h f m p

African Studies 0 0 0 54.02


Anthropology 3 4 11 50.18
Asian Studies 1 0 0 49.72
Biology 29 11 91 50.43
Canadian Studies 0 3 1 39.80
Chemistry 3 6 15 68.64
Economics 10 9 95 56.91
Education 1 13 48 34.86
English 15 21 118 50.52
Environmental Studies 3 5 40 47.35
Fine Arts 1 7 43 72.02
French 1 4 1 43.82
Gender Studies 0 1 0 39.60
Geology 6 5 24 76.43
German 0 2 2 38.02
Global Studies 3 5 16 62.69
Government 26 10 99 52.90
History of Science 0 0 0 50.86
History 3 10 75 45.63
Italian 0 1 0 42.98
Japanese 0 1 0 47.64
Latin American Studies 0 0 0 58.61
Mathematics 20 12 71 55.93
Music 1 3 8 48.10
Philosophy 3 4 5 49.32
Physics 1 5 11 50.60
Psychology 26 12 146 57.83
Religious Studies 1 4 12 34.36
Russian 0 0 0 49.13
Sociology 11 9 48 49.15
Spanish 3 4 15 75.97
Speech and Theatre 1 7 23 50.19
Sports and Athletics 0 8 0 48.53

Notes: Each row represents a particular academic department. Variable
a is the previous year's fund allocation--the percentage of the firm
order budget allocated for materials acquired in support of each
department or program during the 2004-05 academic year. See table 1
for descriptions of the other variables.

I am grateful for the advice and assistance of Bart Harloe, Esther Isabelle Wilder, and the
anonymous referees.

Submitted September 16, 2006; tentatively accepted pending revision November 11, 2006; revised
and resubmitted December 7, 2006, and accepted for publication.

References

(1.) William M. Randall, "The College-Library Book Budget," Library Quarterly 1, no. 4 (Oct. 1931):
421-35.

(2.) Association of Research Libraries, The Allocation of Materials Funds in Academic Libraries,
SPEC Kit 36 (Washington, D.C.: Association of Research Libraries, 1977).
(3.) Jane H. Tuten and Beverly Jones, eds., Allocation Formulas in Academic Libraries, CLIP Note
22 (Chicago: Association of College & Research Libraries, 1995).

(4.) Edward Shreeves, ed., Guide to Budget Allocation for Information Resources, Collection
Management and Development Guides, no. 4 (Chicago: ALA, 1991); Tuten and Jones, Allocation
Formulas in Academic Libraries.

(5.) Mollie Niemeyer et al., "Balancing Act for Library Materials Budgets: Use of a Formula
Allocation," Technical Services Quarterly 11, no. 1 (1993): 43-60.

(6.) Charles M. Baker, "Apportioning of College and University Library Book Funds," Library Journal
57, no. 4 (Feb. 15, 1932): 166-67; Randall, "The College-Library Book Budget."

(7.) Floyd W. Reeves and John Dale Russell, "The Administration of the Library Budget," Library
Quarterly 2, no. 3 (July 1932): 268-78.

(8.) Baker, "Apportioning of College and University Library Book Funds"; John M. Budd, "Allocation
Formulas in the Literature: A Review," Library Acquisitions: Practice & Theory 15, no. 1 (Spring
1991): 95-107; E. Landon Greaves Jr., The Allocation Formula As a Form of Book Fund
Management in Selected State-Supported Academic Libraries (Ph.D. dissertation, Florida State
Univ., 1974).

(9.) Bette Dillehay, "Book Budget Allocation: Subjective or Objective Approach," Special Libraries 62,
no. 12 (Dec. 1971): 509-14; David C. Genaway, "PBA: Percentage Based Allocation for Acquisitions:
A Simplified Method for the Allocation of the Library Materials Budget," Library Acquisitions: Practice
& Theory 10, no. 4 (Winter 1986): 287-92.

(10.) John M. Budd and Kay Adams, "Allocation Formulas in Practice," Library Acquisitions: Practice
& Theory 13, no. 4 (Winter 1989): 381-90; Geoffrey Ford, "Finance and Budgeting," in Collection
Management in Academic Libraries, ed. Clare Jenkins and Mary Morley, 21-56 (Brookfield, Vt.:
Gower, 1991); Tuten and Jones, Allocation Formulas in Academic Libraries.

(11.) Association of Research Libraries, The Allocation of Materials Funds.

(12.) Donna Packer, "Acquisitions Allocations: Fairness, Equity, and Bundled Pricing," portal:
Libraries and the Academy 1, no. 3 (July 2001): 209-24.

(13.) David C. Genaway, "The Q Formula: The Flexible Formula for Library Acquisitions in Relation
to the FTE Driven Formula," Library Acquisitions: Practice & Theory 10, no. 4 (Winter 1986): 293-
306; James H. Richards Jr., "Allocation of Book Funds in College Libraries," College & Research
Libraries 14, no. 4 (Oct. 1953): 379-80; Mary C. Scudder, "Using Choice in an Allocation Formula in
a Small Academic Library," Choice 24, no. 10 (June 1987): 1506-11; Tuten and Jones, Allocation
Formulas in Academic Libraries.

(14.) Tuten and Jones, Allocation Formulas in Academic Libraries.

(15.) Budd and Adams, "Allocation Formulas in Practice."

(16.) Charles W. Brownson, "Modeling Library Materials Expenditure: Initial Experiments at Arizona
State University," Library Resources & Technical Services 35, no. 1 (Jan. 1991): 88.
(17.) Michael S. Freeman, "Allocation Formulas As Management Tools in College Libraries: Useful
or Misapplied?" in Collection Development in College Libraries, ed. Joanne Schneider Hill, William E.
Hannaford Jr., and Ronald H. Epp, 71-77 (Chicago: ALA, 1991), 75.

(18.) Budd and Adams, "Allocation Formulas in Practice."

(19.) Baker, "Apportioning of College and University Library Book Funds"; Donald Coney, "An
Experimental Index for Apportioning Departmental Book Funds for a University Library," Library
Quarterly 12, no. 3 (July 1942): 422-28; Ralph E. Ellsworth, "Some Aspects of the Problem of
Allocating Book Funds among Departments in Universities," Library Quarterly 12, no. 3 (July 1942):
486-94; Genaway, "The Q Formula"; William E. McGrath, Ralph C. Huntsinger, and Gary R. Barber,
"An Allocation Formula Derived From a Factor Analysis of Academic Departments," College &
Research Libraries 30, no. 1 (Jan. 1969): 51-62; Jasper G. Schad, "'Allocating Materials Budgets in
Institutions of Higher Education," Journal of Academic Librarianship 3, no. 6 (Jan. 1978): 328-32;
Reeves and Russell, "The Administration of the Library Budget"; Shreeves, Guide to Budget
Allocation; Norman E. Tanis, "The Departmental Allocation of Library Book Funds in the Junior
College: Developing Criteria," Library Resources & Technical Services 5, no. 4 (Fall 1961): 321-27.

(20.) Greaves, The Allocation Formula.

(21.) Coney, "An Experimental Index"; McGrath et al., "An Allocation Formula"; Schad, "Allocating
Materials Budgets"; Peter Sweetman and Paul Wiedemann, "Developing a Library Book-Fund
Allocation Formula," Journal of Academic Librarianship 6, no. 5 (Nov. 1980): 268-76; Tanis, "The
Departmental Allocation."

(22.) Sweetman and Wiedemann, "Developing a Library Book-Fund Allocation Formula."

(23.) Anish Arora and Diego Klabjan, "A Model for Budget Allocation in Multi-Unit Libraries," Library
Collections, Acquisitions & Technical Services 26, no. 4 (Winter 2002): 423-38; Dennis P. Carrigan,
"Improving Return on Investment: A Proposal for Allocating the Book Budget," Journal of Academic
Librarianship 18, no. 5 (Nov. 1992): 292-97; Neale S. Grunstra, The Development of Library
Resource Allocation Procedures in Higher Education Based on the Analysis of Materials Utilization
(Ph.D. dissertation, Univ. of Pittsburgh, 1976); Andrea C. Hoffman, An Investigation of a Predicted
Use Approach to Resource Allocation, D.A. field study, Simmons College, 1978; William E. McGrath,
"A Pragmatic Book Allocation Formula for Academic and Public Libraries With a Test for Its
Effectiveness," Library Resources & Technical Services 19, no. 4 (Fall 1975): 356-69.

(24.) Carrigan, "Improving Return on Investment," 293.

(25.) Hans-Jorg Blochliger, "Exploratory Framework," in The Contribution of Amenities to Rural
Development, ed. Hans-Jorg Blochliger, 7-21 (Paris: Organisation for Economic Co-operation and
Development, 1994); Douglas B. Diamond Jr. and George S. Tolley, "The Economic Roles of Urban
Amenities," in The Economics of Urban Amenities, ed. Douglas B. Diamond Jr. and George S.
Tolley, 3-54 (New York: Academic Pr., 1982).

(26.) McGrath et al., "An Allocation Formula."

(27.) Dennis Child, The Essentials of Factor Analysis (New York: Holt, Rinehart, and Winston, 1970);
Jae-On Kim and Charles W. Mueller, Factor Analysis: Statistical Methods and Practical Issues,
Quantitative Applications in the Social Sciences 14 (Beverly Hills, Calif.: Sage Publ., 1978); Jae-On
Kim and Charles W. Mueller, Introduction to Factor Analysis: What It Is and How to Do It,
Quantitative Applications in the Social Sciences 13 (Beverly Hills, Calif.: Sage Publ., 1978); Paul
Kline, An Easy Guide to Factor Analysis (New York: Routledge, 1994).

(28.) Tuten and Jones, Allocation Formulas in Academic Libraries.

(29.) Ian R. Young, "A Quantitative Comparison of Acquisitions Budget Allocation Formulas Using a
Single Institutional Setting," Library Acquisitions: Practice & Theory 16, no. 3 (Autumn 1992): 229-
42.

(30.) Association of Research Libraries, The Allocation of Materials Funds, 9.

(31.) Eugene Bourgeois et al., "Faculty-Determined Allocation Formula at Southwest Texas State
University," Collection Management 23, no. 1/2 (1998): 113-23; Brownson, "Modeling Library
Materials Expenditure"; Lisa B. German and Karen A. Schmidt, "Finding the Right Balance: Campus
Involvement in the Collections Allocation Process," Library Collections, Acquisitions & Technical
Services 25, no. 4 (Winter 2001): 421-33.

(32.) Genaway, "The Q Formula"; Charles B. Lowry, "Reconciling Pragmatism, Equity, and Need in
the Formula Allocation of Book and Serial Funds," College & Research Libraries 53, no. 2 (March
1992): 121-38.

(33.) Lowry, "Reconciling Pragmatism," 126.

(34.) S. K. Goyal, "Allocation of Library Funds to Different Departments of a University: An
Operational Research Approach," College & Research Libraries 34, no. 3 (May 1973): 219-22.

(35.) Ibid., 220.

(36.) Steven D. Gold, "Allocating the Book Budget: An Economic Model," College & Research
Libraries 36, no. 5 (Sept. 1975): 397-402.

(37.) Joseph J. Kohut, "Allocating the Book Budget: A Model," College & Research Libraries 35, no.
3 (May 1974): 192-99; Joseph J. Kohut and John E. Walker, "Allocating the Book Budget: Equity and
Economic Efficiency," College & Research Libraries 36, no. 5 (Sept. 1975): 403-10.

(38.) Kenneth Wise and D. E. Perushek, "Goal Programming As a Solution Technique for the
Acquisitions Allocation Problem," Library & Information Science Research 22, no. 2 (June 2000):
165-83; Kenneth Wise and D. E. Perushek, "Linear Goal Programming for Academic Library
Acquisitions Allocations," Library Acquisitions: Practice & Theory 20, no. 3 (Autumn 1996): 311-27.

(39.) Thomas John Pierce, The Economics of Library Acquisitions: A Book Budget Allocation Model
for University Libraries (Ph.D. dissertation, Univ. of Notre Dame, 1976).

(40.) Brownson, "Modeling Library Materials Expenditure."

(41.) Budd and Adams, "Allocation Formulas in Practice."

(42.) Ibid., 388.

(43.) McGrath et al., "An Allocation Formula," 60.

(44.) Shreeves, Guide to Budget Allocation.

(45.) Sweetman and Wiedemann, "Developing a Library Book-Fund Allocation Formula."

(46.) Christopher H. Achen, Interpreting and Using Regression, Quantitative Applications in the
Social Sciences 29 (Beverly Hills, Calif.: Sage Publ., 1982); William D. Berry,
Understanding Regression Assumptions, Quantitative Applications in the Social Sciences 92
(Newbury Park, Calif.: Sage Publ., 1993); Leo H. Kahane, Regression Basics (Thousand Oaks,
Calif.: Sage Publ., 2001); Michael S. Lewis-Beck, Applied Regression: An Introduction, Quantitative
Applications in the Social Sciences 22 (Beverly Hills, Calif.: Sage Publ., 1980).

(47.) Ibid.

(48.) Kent Mulliner, "The Acquisition Allocation Formula at Ohio University," Library Acquisitions:
Practice & Theory 10, no. 4 (Winter 1986): 315-27.

William H. Walters (william.walters@millersville.edu) is a Librarian at Millersville (Pa.) University. As
of November 2007, he will be Dean of Library Services at Menlo College in Atherton, Calif.

Table 1. Variables most often used in fund allocation formulas

                                                        Data source
                                               Greaves   Budd and    Tuten and   Content
Variable                                          (%)    Adams (%)   Jones (%)   analysis (%)

Course enrollment (students or credit hours)      56        84          53          87
Cost of library materials in subject area         33        61          62          76
Number of faculty                                 31        50          54          55
Number of majors, minors, graduate students       --        24          41          36
Circulation of materials within subject area      19        40          51          33
Number of courses offered                         31        32          33          29
Number of titles published in subject area        --        13          26          20
Extent to which courses require library use       --        --          --          18
Type or level of programs offered                 --        <5          43          13
Number and level of degrees awarded               --        <5          --          11
Scholarly activity of faculty                      6        --          --           9
Previous years' allocations or expenditures       20         8          --           9
Interlibrary loan activity                        --        <5          --           5
Adequacy of library collection                    20        <5          --           4

Note: Numbers indicate the percentage of formulas that incorporate each variable. See appendix A
for further information.

Table 2. Correlations among the variables considered for inclusion in the fund allocation formula

        t      c      e      h      f      m      p

t      --   0.67   0.66   0.46   0.70   0.66  -0.12
c    0.67     --   0.71   0.29   0.73   0.53  -0.02
e    0.66   0.71     --   0.70   0.91   0.89   0.08
h    0.46   0.29   0.70     --   0.66   0.85   0.18
f    0.70   0.73   0.91   0.66     --   0.84   0.06
m    0.66   0.53   0.89   0.85   0.84     --   0.15
p   -0.12  -0.02   0.08   0.18   0.06   0.15     --

Note: Correlations with absolute values greater than 0.80 are shown
in bold.
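
Any single entry in this matrix can be checked directly in a spreadsheet. A minimal sketch, assuming
the appendix B columns for enrollment (e) and faculty (f) have been entered as ranges named e_col
and f_col (hypothetical names):

    =CORREL(e_col, f_col)

This should return approximately 0.91, matching the e-f entry shown above.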

Figure 1. Variables considered for inclusion in a regression-based fund allocation formula

Variable  Definition

t    Estimated number of relevant titles published in the subject area each year. Equal to twice
     the number of approval plan books and slips received over a 26-week period (May 14 to
     November 5, 2003). Includes items not kept or purchased. Excludes interdisciplinary and
     multidisciplinary titles.

c    Number of distinct courses offered in the Fall 2004 semester plus the number of distinct
     courses offered in the Spring 2005 semester. Excludes non-credit courses and courses
     without scheduled meeting times.

e    Total enrollment in courses offered by the department or program, 2004-05 academic year;
     the sum of individual course enrollments. Courses formally sponsored by more than one
     department are attributed partly (equally) to each sponsoring department.

h    Number of senior projects and master's theses submitted in the 2002-03, 2003-04, and
     2004-05 academic years.

f    Number of regular faculty positions plus one-fourth the number of adjunct instructors and
     other part-time academic staff not on the faculty list, 2004-05 academic year.

m    Number of undergraduate majors and graduate students in the department, Fall 2004.
     Students with more than one major are counted more than once. Students registered for
     joint majors (Economics & Mathematics, Environmental Studies & Biology, etc.) are counted
     partly (equally) for each department.

p    Estimated price per title in the relevant subject area, 2004-05 academic year. Based on
     approval plan data for May 2003 to November 2003, inflated by 3 percent.
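
As an illustration of how these variables feed a regression-based formula, the regression itself can
be reproduced in a spreadsheet. This is only a sketch under an assumed layout (allocation a in
A2:A34 and the predictors t, c, e, h, f, m, and p in B2:H34, following the appendix B order); it is not
necessarily the final model adopted in the article:

    =LINEST(A2:A34, B2:H34, TRUE, TRUE)

Entered as an array formula, LINEST returns the regression coefficients in its first row and fit
statistics such as R-squared below; applying the coefficients to each department's values yields
predicted allocation percentages.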

Walters, William H.

10. Constructing analysis of variance (ANOVA), Mathew Mitchell, Journal of Computers in
Mathematics and Science Teaching

Spreadsheets can be a valuable tool for helping students construct a deeper understanding of
statistical concepts. This investigation incorporated active-learning into the structure of an
intermediate-level statistics course for doctoral students in education. Active-learning was
implemented primarily by having students create learning playgrounds, which would instruct a novice
about specific statistical concepts. This format led these doctoral-level education participants to
become more fully involved in the process of mathematical storytelling with the beneficial
consequence of a richer understanding of key analysis of variance concepts and techniques.

**********

One of the biggest problems in statistics education is that students tend to suffer from inert
knowledge. Whitehead (1929) first used this term to describe knowledge that students can usually
recall when explicitly asked to do so but do not use spontaneously in problem solving, even when it
is relevant. A new graduate student in the social sciences may take only two classes in statistical
analysis and research design. A year or so later the student will be expected to make use of that
knowledge, either as a research assistant on a project or in his or her own research study. All too
often students respond to these applied situations as if they had never taken a statistics or research
design course. Although they likely remember key statistical concepts, the process of taking data
and transforming them into meaningful information is frustrating and frightening to many. Of course,
this description is a general "best case" scenario. Using statistics goes beyond simply remembering
knowledge. Some of the ideas in statistics, and especially their connection to research design, are
often never fully understood by students.

In addressing the general problem of inert knowledge, Dewey made the case for an active-learning
approach in which students are learning new material for some practical project or need in the
present. As Dewey put it:

When preparation is made the controlling end, then the potentialities of the present are sacrificed to
a suppositious future. When this happens, the actual preparation for the future is missed or distorted.
The ideal of using the present simply to get ready for the future contradicts itself. It omits, and even
shuts out, the very conditions by which a person can be prepared for his future. We always live at
the time we live and not at some other time, and only by extracting at each present time the full
meaning of each present experience are we prepared for doing the same thing in the future. This is
the only preparation which in the long run amounts to anything. (1963, p. 49)

Benware and Deci (1984) hypothesized that active-learning where the student is "...learning material
to teach it will lead to enhanced learning and to a more positive emotional tone than learning
material to be tested on it, even when the amount of exposure to the material being learned is the
same" (p. 756). Benware and Deci built their study upon the theoretical approaches of Bruner (1966)
and Rogers (1969) who both suggested that students learn better if the content of the instruction is
useful for a task they are undertaking. The "activity" would in turn result in a fuller engagement of the
material. The logic behind this line of thinking is fairly simple: students approach the material with the
anticipation of using it, so they become more fully involved.

This article presents an investigation into an active-learning approach to statistics through the use
of spreadsheet software. The focus of the intermediate-level statistics course used in this study was
on learning analysis of variance (ANOVA) techniques. In the past this course had been taught
through a combination of lecture, computer-based activities with SPSS, and structured activities with
Excel. The instructor (author) hypothesized that increasing the level of active-learning in the course
through problem-based challenges would result in greater learning and higher student motivation.
Indeed, previous research has indicated that active-learning approaches can be quite effective
(Benware & Deci, 1984; Brophy & Alleman, 1991; Kafai, 1995; Mitchell, 1993; Mitchell, 1997).

Excel is a powerful spreadsheet software program that allows individuals to do more than simply
crunch numbers. Through Excel's open-endedness, the ability to incorporate conceptual formulas,
the use of intuitive naming of cells and arrays, dual coding features, and Excel's design tools,
students can potentially create rich educational products. This full-featured software program
appeared to offer a great way to pragmatically implement an active-learning curriculum. Specifically,
students actively learned statistical concepts by creating learning playgrounds that would instruct a
novice about particular statistical techniques. This kind of learning challenge was hypothesized to
result in greater learning for the "constructors," or students, in the course.

In the present investigation, active-learning was used primarily for out-of-classroom activities that
students completed. The classroom experience itself used instructor-led multimedia presentations,
discussions, and computer-based examples. Indeed, the in-class experience could be described as
being teacher-led because there was relatively little opportunity for true active-learning to take place.
Instead, it was the time students spent engaging with the course material outside of the classroom
that incorporated the active-learning challenges.

Two studies previous to Benware and Deci tested the active-learning hypothesis and found positive
indications that active-learning is effective (Bargh & Schul, 1980; Zajonc, 1960). However both of
those studies used very short treatment periods. Benware and Deci's study represented the first
systematic attempt to test the "learning-to-teach" hypothesis using a reasonable treatment period of
three hours. Their results confirmed that students under the experimental "active" condition learned
both rote and conceptual material significantly better than the control group. Just as importantly, the
Benware and Deci study incorporated motivational variables and found that students in the
experimental condition found the process of learning more interesting and enjoyable than those in
the control group.

More recently, Mitchell (1997) described a classroom learning environment in which students
learned about a computer-based approach to learning statistics through Microsoft Excel. The
students in that study created educational worksheets. Mitchell highlighted five factors that helped
explain the advantages of spreadsheet software in creating an active-learning environment,
including creating multiple representations of statistical measures, making "number playgrounds,"
incorporating story lines, and providing opportunities for student creativity. Making such products
fostered a type of active-learning in which students' thinking about basic statistical concepts grew
much richer.
The present study sought to build upon the Benware and Deci (1984) and Mitchell (1997) studies
with three important enhancements. First, like Mitchell, the study covers a time period of one
semester as opposed to the three hour treatment used by Benware and Deci. Second, like Mitchell,
the study used statistics as the content area, a subject that many students find difficult to understand
and motivationally unappealing. However, unlike Mitchell, this study takes place in the context of a
regular course focusing on statistical concepts rather than a course that is primarily computer-based
in its content. Thus this study hoped to address the more realistic concern of integrating technology
into regular content courses. Finally, the focus of this study is on a content
analysis of students' products: did these products demonstrate significant understanding of the
course concepts?

The challenge faced by intermediate-level students in this study was to create learning playgrounds
that would instruct novice doctoral students about analysis of variance techniques. Students were
told by the instructor that the best products resulting from the class would be used as learning aids in
his beginning level statistics courses in the future. Since all the students were involved in education
at the K-12 or higher education levels, there was the additional incentive for them to take advantage
of their educational expertise to create a compelling set of learning playgrounds. This active-learning
project took place within the context of using spreadsheet software as their instructional creation
tool. The benefits of using spreadsheets, specifically Excel, to accomplish these goals were crucial.

THE POWER OF SPREADSHEETS

In this course spreadsheets were used as a pedagogical tool. For the purposes of statistical analysis
alone, a software program such as SPSS or SAS would be a much better choice. Excel is more akin to a
numbers-based LOGO program. Just as Kafai (1995) and Papert (1980) had taken advantage of the
relatively simple programming language called LOGO to teach elementary level kids about
mathematics, so in this study Excel was used to help students learn about statistical concepts in a
hands-on manner. Kafai's study was particularly relevant because she spent six months helping a
fourth-grade class learn how to use LOGO. Those students had a special challenge: to create a
learning environment that would teach younger students at their school about fractions. Kafai found
significant results in both learning and motivation when students were faced with such a concrete
and worthy challenge. Many students went to painstaking lengths to create LOGO products that
were "special." The final products created by these students w ere truly impressive.

A number of researchers have used spreadsheets as tools for exploring mathematical concepts. For
instance, Dugdale (1998) discussed a class in which spreadsheets proved to be a highly effective
tool for middle-grade students learning about sequences and series. Similarly Abramovich (1995),
Abramovich & Nabors (1998), and Sutherland & Rojano (1993) have provided several examples of
using spreadsheets to enhance mathematical understanding with regards to word problems and
algebra. In each of these cases the open-ended nature of spreadsheets was used to advantage so
novices could effectively explore a mathematical concept. Spreadsheets have also been promoted
as a valuable tool for learning statistical concepts (e.g., Arganbright, 1992; Bakeman, 1992; Piele,
1990).

On the other hand, not all researchers are enthralled with spreadsheets per se. For instance,
Connell (1998) wrote:

It is not surprising that technology's role has often been to replace the labors of computation while
offering little for developing student problem...spreadsheets...suffer from deficiencies when applied
to classroom instruction. The problem is that they remain a black box to the elementary student with
only the outcome being visible. The methods of solution which lead to this answer and rationale
remain invisible to the elementary student--thus weakening their potential applicability.
Connell's critique is right on the mark. Spreadsheets offer the potential to enhance a learner's
problem solving skills and conceptual understanding, but if not implemented wisely spreadsheets
can easily turn into a mechanical time-saving device that does nothing to challenge and enhance
student understanding. Towards this end, great care was taken to frame the instructional challenges
in such a manner that the pedagogical benefits of interacting with Excel would be emphasized, while
minimizing the mechanical aspects.

To the novice, spreadsheets can seem daunting. In fact spreadsheets can appear very similar to the
LOGO programming language. As Excel has developed over the years it has kept all the power of its
bare-bones formula-driven approach, but it has also incorporated features that make the product more
friendly by allowing users to easily incorporate color, buttons, sounds, and other graphical or
multimedia elements into their spreadsheets. From the perspective of the doctoral students in this
study, these user-friendly features were a decided bonus. However, the major pedagogical
improvement to Excel over the years has been in the "guts" of the program which now allows users
to name cells and arrays. The consequence of this improvement is that the resulting formulas can
look and read in a more intuitive manner. Naming, along with Excel's spreadsheet design tools,
makes the program much more approachable to novice users.

There are five primary features that contribute to Excel's power as a pedagogical tool. Those
features include: the value of spreadsheets as open-ended tools, the importance of conceptual
formulas, the usefulness of intuitive naming, the ability for dual-coding, and the variety of design
tools that a program like Excel provides users. Each of these five features is discussed briefly in the
next section.

OPEN-ENDED TOOLS

The power of a spreadsheet program as a pedagogical tool lies in its open-endedness. The program
itself gives the user nothing except a workbook filled with empty cells. This is much like the "look" of
opening a programming environment or facing a blank piece of paper. The onus, therefore, is on the
student to construct a worksheet which demonstrates their ability to develop a step-by-step way to
calculate a particular measure.

Excel offers a wide variety of built-in functions that can be easily accessed either from the "insert
function" command or from the "analysis toolpak" add-in. Using these two sources of support, an
individual can easily have Excel calculate means, standard deviations, t-tests, and fairly
complex ANOVA tests with the push of a button. However, students in the course were not allowed
to use any of these functions or add-ins beyond what were defined as six core functions. The
functions they were allowed to use included: Sum, Average, Count, CountA, DevSq, and Sqrt. All
other calculations had to be built on the foundation of these core functions. It also should be noted
that the DevSq function (which would automatically calculate the sum of squares for an array of
numbers) was not introduced until about 30% of the way through the course. Thus this function was
provided to students as a time-saving device for calculating complex ANOVA designs after they had
already demonstrated the ability to calculate sum of squares using the more basic commands.
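
To make the core-functions constraint concrete, here is a minimal sketch of how a sum of squares
might be built from scratch and then checked against DevSq; the layout (scores in A2:A10, with that
range named "scores") is hypothetical:

    B2:  =A2-AVERAGE(scores)     deviation of each score from the group mean
    C2:  =B2^2                   squared deviation (fill both columns down through row 10)
    C11: =SUM(C2:C10)            sum of squares assembled from the core functions
    C12: =DEVSQ(scores)          the same value via the DevSq shortcut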

While Excel had many advantages from the point-of-view of the instructor, this open-endedness
needed to be sold to students as a benefit because the program appears so ugly initially. Students
needed to understand that Excel doesn't "do" but it "makes." Towards this end students were
provided with examples of what Excel can do through some simple instructor-made learning
playgrounds. The basic learning playground structure they saw was similar in design to
the ANOVA playground in Figure 1. There are a few essential elements of this sample playground
that needed to be incorporated into the subsequent student-made learning playgrounds. First, the
purpose of a playground is to serve a pedagogical end so that the data set entered should not be
very large. A small data set allows novices to better understand the contribution of each score to the
final measure or test. Second, a learning playground provides the user with both an analytic pathline
and a graph. The analytic pathline is simply a way of making the steps involved in calculating a
measure transparent to the user. The graph is used as a complementary way to help students
visualize some aspect of the measure. Third, users were encouraged to engage in the learning
playground by changing the raw scores. In the case of Figure 1 there were challenges which were
included on a separate piece of paper. For example one challenge asked students if they could
create a set of scores involving three groups in which there is "a significant Between groups effect
with η² between 0.15 and 0.25 and the homogeneity of variances assumption is met." By
having a model of a basic learning playground students were better able to conceptualize how Excel
might be used to create meaningful products.
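
A user could check an answer to such a challenge with a few cells that compute eta-squared (η²)
directly. A sketch, assuming the three group arrays have been named group1, group2, and group3
and the combined scores all_scores (hypothetical names):

    E2: =DEVSQ(group1)+DEVSQ(group2)+DEVSQ(group3)     SS_within
    E3: =DEVSQ(all_scores)-E2                          SS_between
    E4: =E3/(E2+E3)                                    η²; adjust raw scores until E4 is 0.15-0.25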

Conceptual Formulas

One of the conundrums in basic statistics texts has been how to present formulas. Through
algebraic transformations the same formula can have very different looks. While these algebraic
transformations may rightly be seen as relatively trivial to an expert, the novice sees these different
formats as being entirely different concepts. More importantly, some equation formats are more
intuitive to students because they are directly connected to the concept. Many transformed formulas
have become standard use because they historically represented an easier way to calculate the
concept through the use of a calculator or by hand. While the days of hand calculation or basic
calculators have gone, some of these raw score formulas continue to be used. Shavelson (1996)
provided an excellent discussion of deviation versus raw score formulas as well as showing how the
two are algebraically equivalent (p. 108).

Let's consider an example. A fundamental concept in statistics (and especially central to the course
in this study) is that of sum of squares. This measure represents the sum of all the squared deviation
scores in a group. The idea of sum of squares (or SS) is used extensively in ANOVA tests, which
essentially take advantage of different methods for partitioning the total sum of squares into
separate components (e.g., SS_within and SS_between in a simple one-way ANOVA).
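
For the one-way case this partition can be written out explicitly. With group index j, case index i,
group sizes n_j, group means X̄_j, and grand mean X̄, the standard identity is:

$$SS_{\text{total}} = \sum_{j}\sum_{i}(X_{ij}-\bar{X})^2 = \underbrace{\sum_{j}n_j(\bar{X}_j-\bar{X})^2}_{SS_{\text{between}}} + \underbrace{\sum_{j}\sum_{i}(X_{ij}-\bar{X}_j)^2}_{SS_{\text{within}}}$$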

The raw score formula for SS (Shavelson, 1996) is not linked to the conceptual idea; in fact, this
formula does not include the mean. Yet beginning students understand that deviation scores are
central to calculating SS, so for a novice the representation of a deviation score should show up
somewhere in the SS formula. By contrast, the deviation score formula is much
more intuitive and meaningful to students. The formula says that SS is the result of summing up all
of the squared deviation scores in a group. The key is that the notion of "deviation scores" and their
relationship to SS are built into the formula itself.
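
The two algebraically equivalent forms that Shavelson contrasts can be placed side by side:

$$SS = \sum_{i}(X_i-\bar{X})^2 \qquad\text{versus}\qquad SS = \sum_{i}X_i^2 - \frac{\bigl(\sum_{i}X_i\bigr)^2}{n}$$

The deviation score form (left) keeps the mean and the deviation scores visible; the raw score form
(right) persists only because it was once easier to compute by hand.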

Admittedly if one were stranded on an island with only a seventies-style calculator and a large group
of numbers then the raw score approach would be easier to calculate than the deviation score
approach. However, in a spreadsheet program it is easy to incorporate the deviation score formula
that reinforces the conceptual structure of SS.

Intuitive Naming

The power of spreadsheets lies in the ability to write user-created formulas that can calculate a wide
variety of mathematical measures. For the novice, however, it can appear that they are learning two
languages at once: statistical formulas and spreadsheet language. The newer versions of
spreadsheets have helped to bridge this gap through the ability to name cells and arrays.

Only a few years ago the formula for the mean of a group of scores would have looked something
like "=average(A4:A12)." With the newer versions of Excel, however, a student could have named
the A4:A12 array as "girls" and so the subsequent formula for the mean of this group of scores would
be "=average(girls)". This ability to name individual cells or groups of cells (arrays) results in more
intuitive formula creation by students.
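
A short sketch of how naming combines with the conceptual formulas discussed earlier (the array
names girls, boys, and all_scores are hypothetical):

    =AVERAGE(girls)                                  mean of the girls' scores
    =DEVSQ(girls)+DEVSQ(boys)                        SS_within for a two-group design
    =DEVSQ(all_scores)-(DEVSQ(girls)+DEVSQ(boys))    SS_between, read directly as total minus within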

Dual Coding

Many novices have problems "seeing" or visualizing key statistical concepts. It's often helpful for
students to have a graph that allows them to get a better sense of what's going on with the data. A
typical example would be the use of a scatter plot to get a visual snapshot of whether there is a
relationship (linear or otherwise) between two variables. In this study the most useful graph for
students was a simple pie graph which could dynamically display the partitioned sum of squares in a
specific ANOVA design. While graphs themselves have been used productively for a long time,
spreadsheets offer the possibility of dynamic graphs which automatically change as the raw scores
are altered. This provides the user with quick visual information about how the change in a score
contributes to changes in the bottom-line measures of interest such as the partitioned sum of
squares.
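
In Excel terms the setup is small. A sketch, reusing the kind of hypothetical SS cells shown earlier
(exact menu paths vary by Excel version):

    E2: =DEVSQ(group1)+DEVSQ(group2)+DEVSQ(group3)    SS_within
    E3: =DEVSQ(all_scores)-E2                         SS_between
    Select E2:E3 and insert a pie chart.

Because Excel recalculates and redraws automatically, editing any raw score immediately updates
both the numbers and the pie slices.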

There is increasing research support for the idea that learners better understand material when there
is an effective combination of text (or numbers), sound, and images (e.g., Mayer, 2001 for an
extensive discussion). In a modest way the students in this course sought to take advantage of the
link between analytic numbers and a visual representation to create better learning.

Design Tools

Finally, the fifth area that contributes to the power of Excel lies in its design tools. At its simplest level
a spreadsheet program can consist only of cells and the ability to write formulas. One of the
advantages of Excel is the great variety of educationally-relevant design tools that it offers the user.
Those design tools include pop-up notes, buttons, URL links, inclusion of movies and sound,
accounting arrows, and other tools. Each of these tools provides flexibility and power to a student's
ability to create an educational product using Excel.

CONTEXT AND PARTICIPANTS

All of the participants in this study were students in an intermediate-level doctoral education course
in statistics. The students had already taken general courses in research methods and introductory
statistics. None of the students had any significant previous experience using the Excel program. Of
the eight students, three were K-6 grade level teachers, one a high school teacher, and four were
higher education instructors (in management, nursing, and the social sciences). Seven of the
students were female and one male.

The use of a spreadsheet program for creating learning playgrounds was new to all of the
participants. Two of the participants had seen instructor-made learning playgrounds in a previous
introductory statistics course. However, in the first class all participants were familiarized with the
instructor's set of prebuilt playgrounds and the ways in which the expectations of the course were
different from simply emulating the instructor's past work.

This study describes how students' understanding, as evidenced by their final learning playgrounds,
demonstrated conceptual growth. All students completed four basic learning playgrounds and two
special learning playgrounds over the period of one semester. The challenge of creating a learning
playground was straightforward: "imagine you are creating a learning product for first-year doctoral
students in a beginning level statistics course." The context for this challenge was given in their
syllabus as:

Statistics, like great philosophical storytelling, is essentially about high-powered exciting ideas. In a
beginning statistics course it is primarily the instructor that does the storytelling. In this course you
will be equally responsible for the storytelling through the creation and sharing of learning
playgrounds. The major course outcome is the development of a spreadsheet portfolio
demonstrating through story, graphs, and spreadsheet savvy your understanding of the
core ANOVA techniques. You will choose two of the four ANOVA spreadsheets to re-package as an
effective electronic learning tool for future students. For this outcome you can stretch your own
creativity, educational expertise, and attention to detail to create two compelling learning
experiences for future students. In the final class session you will have the opportunity to present
and explain one of your e-learning products to your colleagues.

While students in the course had many questions about the final products, they had a basic idea of
what could be accomplished through the instructor-developed model playgrounds. It was equally
clear to them that the expectation was that their products would be more fully-rounded than the "bare
bones" playgrounds developed by the instructor.

The biggest problem for the instructor was developing an instructional format that would allow
students to learn key skills in Excel without taking up an excessive amount of in-class time. Towards
that end the following components of the course were structured to support student learning of
Excel: classroom support, student guide and CD, and e-mail. Each of those components is briefly
described below.

Classroom. General motivation was not a problem with this small group of students. However, most
of them were not computer savvy, and none were Excel savvy. This provided a challenge in terms of
structuring the course: after all the course was about ANOVA designs, not about computers. The
first class meeting was used as a three-hour workshop on how to use Excel. In addition,
approximately 20-30 minutes was spent during each of the subsequent 4.5 hour class meetings to
address Excel questions and provide instruction on more refined Excel skills. In this way the
instructor hoped to balance the needs of students for in-class Excel support while maintaining a
primary focus on learning new statistical techniques.

Guide and CD. While many excellent books exist about how to learn Excel, none of them addressed
the key features of Excel that students needed to learn, aligned to the specific challenges students
faced in the course. To help remedy that problem the instructor created a 49-page guide to Excel
specifically designed to address the core spreadsheet skills they would need. In addition, students
were presented with a CD at the beginning of the course that contained QuickTime movies showing
students how to do specific procedures in Excel. QuickTime movies allowed the instructor to show a
procedure rather than write about it. The CD also contained additional support materials.

48 hours and e-mail. Students were told that if they e-mailed the instructor with a question they
would receive an answer within 48 hours, but that usually they would receive a response within 24
hours. While some Excel questions could be adequately addressed by way of text, others
necessitated creating additional QuickTime screen movies, which could be sent as e-mail
attachments to students. Through these support mechanisms it was hoped that complete computer
novices in the class would be provided with adequate support in learning Excel.

LEARNING PLAYGROUNDS

Students made four learning playgrounds and then selected two of these to "make special." The idea
behind the "special" playgrounds was that it gave students a culminating opportunity to bring to bear
the full scope of their statistical knowledge as well as their full abilities in working with Excel to create
an effective learning playground product. For instance, their original one-way ANOVA learning
playground was made early in the semester when their understanding of key ANOVA concepts and
their skills with Excel were at the lowest. By being able to "make special" their one-
way ANOVA playground later in the course, students were given the opportunity to demonstrate a
refined understanding of both the statistical technique and a more skilled use of employing Excel to
create a learning environment.

The four learning playgrounds they needed to create were a one-way ANOVA, factorial ANOVA,
repeated measures ANOVA, and split-plot (one within and one between factor) ANOVA. Student
choices for the special playgrounds were evenly distributed amongst the four techniques. Each
student presented one of their special learning playgrounds at the last class meeting. The
presentations were audio recorded. About three months later the instructor interviewed three of the
students to get a better idea of student perceptions of the course.

A content analysis of materials was conducted by reviewing students' earlier work in the course, then
looking at each student's two special learning playgrounds, re-listening to their verbal presentations,
and reviewing the three recorded interviews. Three key factors emerged as benefits of using an
active-learning approach in the statistics classroom. First, students' working knowledge of statistical
techniques was deeper than in previous classes. A second benefit was that students became active
reflective thinkers about how one can best learn statistical concepts. The third benefit was that
students incorporated multiple modes of learning into their final products. The last two benefits
(reflective thinking and multiple modes) are not directly beneficial to statistical understanding per se
but were highly motivating to the group of students in this study since they all shared an interest in
education and the educational process. Each of these three benefits of constructing learning
playgrounds is discussed in the following section.

Deeper Understanding

One of the key benefits of developing learning playground products was that student understanding
of key concepts appears to have been much more thorough. This was reflected in five specific ways:
(a) they showed a greater connection to research design, (b) their technical and conceptual
understanding was integrated, (c) their interpretive abilities increased, (d) they benefitted from dual
coding of sum of squares partitioning, and (e) they asked better questions in class.

Connection to research design. This was probably the outstanding difference between previous
classes and the active-learning one. Students were constantly trying to link new statistical
techniques with potential research designs. An unintended benefit of the learning playground
challenge was that students initially chose to create one context which they could use across the
various statistical techniques. Below are two sample contexts. Each student would then alter the
basic context, as appropriate, so that it fit the demands of the specific statistical technique under
study.

Example Context 1: Twelve students (six females & six males) were in a control group (received
lecture and reading material on Arterial Blood Gas (ABG) analysis) and twelve students (six females
& six males) were in an experimental (treatment) group (received lecture, reading material, and were
assigned to participate in an electronic ABG learning activity). These undergraduate nursing
students were enrolled in an Introduction to Pathophysiology course at a private University in
Northern California.

Example Context 2: You decide to investigate the effects of cooperative learning on your fifth grade
students' achievement in social studies. Three table groups worked cooperatively on reading the
textbook and answering questions. Three other table groups worked individually on the same
materials. At the end of the two-week unit you administered a test on the social studies unit to
determine which method had been most effective.

While adapting one "master context" to the various ANOVA techniques makes sense in terms of
saving time, it had the important pedagogical benefit of prompting students to think more deeply
about the relationship between context and statistical technique. The end result was that students
were more savvy in terms of thinking about design first, then statistical technique to match that
design. In Figure 2, part of a student's work is displayed that teaches a future user how to
think through creating an ANOVA analysis. From the start, this student begins with the theme of
thinking about research design.

Technical and conceptual merged. Among novices there tends to be a wide gulf between technical
and conceptual understanding of a statistical technique. Typically some students will be better at
retaining a technical understanding as reflected in an ability to remember and use appropriate
formulas. On the other hand, other students will be better at retaining knowledge of the conceptual
basis and implications of a statistical technique. For example, a student may be fairly good at
remembering what a basic regression analysis does and what it may tell us but not know how to
interpret the constant and slope numbers. In general there is a trend amongst faculty to emphasize
conceptual understanding over technical competence in many statistics courses because of the
pervasive influence of personal computers and statistical analysis packages. The intended benefit of
using the learning playground challenge in this course was that students' conceptual understanding
of ANOVA designs would be increased. The unintended result of the challenge, however, was that
students had a much better understanding of both the conceptual and technical underpinnings
of ANOVA. More importantly, they generally saw how one "type" of understanding increased and
supported the other "type." The end-of-semester student presentations emphasized this link and
how, for the students, being able to blend the technical with the conceptual made the ANOVA
concepts both stronger and easier to understand.
Figure 3 shows one example of how a student took the time to provide meticulous conceptual and
technical support in a learning playground. This figure captures the portion of the learning
playground which contains the ANOVA table for a repeated-measures design. Through text boxes
and hidden pop-up notes, the student was able to guide a user through how to think about both the
conceptual and technical information contained in just this one small portion of the product.

In Figure 4 another student approached this issue in a different way. Here they've chosen to make
explicit the technical underpinnings of the calculations in the ANOVA table. Although Figure 4 only
shows a portion of one sheet, you can see there is conceptual support (in terms of the "statistical
significance" and the "effect size guidelines" balloons) while if you scroll down the sheet you'll also
find more detailed support regarding specific calculations.

Interpretive abilities increased. Interpretation implies the "What does it mean?" question in statistics.
This was easier for students to address because students created their own context for a learning
playground. Interpretation was an area in which many students previously felt "shaky." Consequently
it was an issue everyone addressed in their products even though it was not an explicit standard for
a learning playground. Many students also chose to include questions in their final products, which
would prod the user to think more deeply about statistical analysis. In Figure 5 you see part of a
learning playground where the student used a steady stream of questions (with an accompanying
"answer" sheet) to keep users thinking more deeply about what was going on in terms of design and
analysis.

Dual coding of partitioning. Students were required to include a graph in their learning playground.
However, many students went beyond this simple requirement as they were acutely aware of the
benefits of dual coding. Specifically they spoke about the benefits of being able to enter data, or
change data, and immediately see the impact those changes had on a pie or interaction graph. A
number of students placed their graph right next to the data entry area. As a user enters, or
changes, data they get immediate feedback on how that new number impacts the resulting
partitioning of the sum of squares. Since all of the learning playgrounds used small sample sizes
(typically 5-10 subjects per cell of a design) it was also much easier for a user to see how changing
one score might impact the subsequent partitioning.

Better questions in class. Increased student attention to detail and "making connections" in the
learning playground products also resulted in better classroom questions. This was true even for
topics which were not the focus of any student product. Not only did students ask more questions
than in previous versions of the course, they also tended to ask better questions. Many of those
questions focused on the linkage between statistical analysis and research design or on the linkage
between conceptual importance and the technical structure of a particular ANOVA design.

Reflective Thinking

During their in-class presentations most students said, "When I thought about what it was like for me
when I was taking my first course in statistics...." These students drew on their expertise and
interest as educators to reflect on their own strengths and limitations as novice learners of
statistics and to develop products that would explicitly fill the gaps they felt existed in their own
knowledge base. Perhaps just as important they also paid attention to motivational factors. They
realized that for them, and others, that first course in statistics was often a struggle motivationally
and cognitively. As a result of their reflections on how we learn there were three general themes that
emerged in their products: (a) valuing the importance of context, (b) valuing the importance of
ongoing questions, and (c) linking the technical with the conceptual.

The importance of context. In reflecting on how they and others learn, students found it quite
important that their learning playground products include problems in contexts that their audience
would readily understand and relate to. They also realized the value of having a consistent context
and general problem, so that only specific questions changed with different statistical techniques but
an overarching general context was maintained.

The importance of questions. Students reflected on the relatively passive nature of most statistical
learning. Specifically, they could imagine their own learning playground products easily becoming
very passive unless they built in some form of interactivity. Students did this in two ways. First, most
built in a series of questions which the user was supposed to answer as they progressed through a
learning playground. A second general "questioning" device was the use of a challenge to get the
user to interact with the basic data entry area. All learning playground products had a way in which
they encouraged the user to change the raw data and make sense of the resulting changes in
statistical analysis. Some challenges were more specific than others because they asked the user to
try to create a raw data set that would result in a prespecified statistical result (e.g., resulting in a
nonsignificant difference for both independent variables but a significant finding for the interaction
effect in a 2-way ANOVA).

Linking technical with conceptual. As mentioned earlier, students reflected on the intimate link
between conceptual idea and technical procedure. Most learning playground products tried to
explicitly combine both factors. In Figure 6 part of a worksheet is displayed for a one-way ANOVA
learning playground. On this sheet the student is trying to communicate both the "big idea" of sum
of squares and help the user link this idea to the practical ability to calculate each SS.
Not seen in Figure 6 is how the sheet scrolls down and provides the user with a practical example of
calculating each SS, then it moves back towards the conceptual by linking the calculations with
substantive meaning.

Multiple Modes

Perhaps more than anything else, students thought about the nature of the learning process itself.
They were conscious that most people are not motivated to take statistics courses (especially
doctoral education students who may not have taken a mathematics course for over 20 years).
While paying attention to statistical learning by itself, students were also savvy about including
elements of instructional design that would be generally helpful regardless of the content being
learned.

Design sensitivity and color coding. All students were aware of the importance of creating a design
that was attractive for users. For some students this meant trying to simplify their learning
playground products as much as possible. Often simplification was achieved by "hiding" information
through devices such as pop-up notes so the user could get information and support on a need-to-
know basis but was not overwhelmed with information when first navigating a learning playground.
Others focused less on simplification by itself, but tried to make the layout of their product appealing
and organized so that a user would feel motivated to progress through the learning playground.
Figure 7 shows the beginning sheet for one such learning playground. While the design of this sheet
is relatively complex, it makes great use of organizational features that would help the user progress
through the sheet in a helpful manner: notice the step-boxes on the left side with arrows pointing to
relevant features. Even at the very beginning there is the use of a question card (far right). On the
far upper left there is a toolbox icon: if the user clicks on this icon they are hyper-linked back to the
"main page" of this student's learning playground product.

A number of students also made conscious use of color coding. While all students used colors to
make their products look better, some went further and used color coding to make their learning
playground visually intuitive to a user. For instance, one student always colored the cells for the first
independent variable with one color, for a second independent variable with another. A simple
device, but one that allowed the user to easily make sense of the contributions of each variable to
the statistical analysis.
Schema theory. Students also tried to take advantage of how people tend to organize categories of
objects in their minds. Specifically, several students took advantage of schema theory to help
learners better understand the relationships between concepts. Figure 8 shows the beginning of one
such learning playground. This particular student used colored figures to represent the different
"players" in their product. [SS.sub.within] was represented by a red figure in Figure 8, his "brother"
[SS.sub.between] was represented by a blue figure. Later the user meets their cousins, d(Cohen's d)
and [eta.sup.2]. Then there's the cranky old uncle, F-test, who "just sits around and judges things."
By presenting these various statistical entities as part of a connected "family" of concepts, and by
using an appropriate level of humor, the student created a memorable learning playground that
succeeded in emphasizing both the conceptual, and the technical.

Interactivity. Another way to engage a user was by incorporating elements that would make the
learning playground products more interactive. As mentioned earlier, one such device was the use of
questions within the products. Another way to add interactivity was through the use of buttons. One
student became particularly engaged with the potential benefits of buttons. A problem with Excel as
an educational tool for end-users is that most people are unfamiliar with spreadsheet software
and its various capabilities. The value of buttons is that they provide a way of incorporating some of the
more advanced features of spreadsheets without requiring the user to know how to access those
features: press a button and the event happens! For instance, in Figure 9 buttons are used to help a
user access Excel's ability to show "trace precedents" and "trace dependents" arrows. A user can
select any cell in the sheet and then press the "Show Before" (or trace precedents) button to see
which cells (or numbers) contributed to the calculation of the number in the cell selected. In Figure 9
the arrows are coming from four arrays of numbers (representing the four cells in a 2x2 factorial
design). Press the "Show After" (or trace dependents) button and the user sees what subsequent
calculations the selected cell contributes to. Again, in Figure 9 the SS_within cell is shown to
contribute to MS_within and the three observed F-test numbers.

Visual/analytic dual coding. This feature was discussed earlier, but students saw it as crucial that a
user see how key statistical measures change as the raw data are altered. This took advantage of
Excel's natural interactivity: the spreadsheet automatically and immediately updates a graph (and
any subsequent calculations) when a data point is changed.
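
Outside of Excel, the same recalculation behavior is easy to mimic. Below is a minimal Python
sketch, not from the article (which used Excel itself) and with invented numbers: edit one raw
"cell" and the derived statistics update together.

    # A rough analogue of spreadsheet recalculation: derived statistics are
    # recomputed from the raw data on every request, so editing one value
    # immediately changes every downstream number (and any attached graph).
    def describe(data):
        mean = sum(data) / len(data)
        ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
        return mean, ss

    scores = [4, 7, 6, 3, 5]
    print(describe(scores))  # (5.0, 10.0)

    scores[1] = 12           # the user types over one raw data point
    print(describe(scores))  # (6.0, 50.0) -- mean and sum of squares update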

Storyline as building block. The idea of storytelling was central to most learning playground products.
There were two basic techniques students used for storytelling, and most students used both
techniques simultaneously. The first technique was to create a context that the user could
understand. The second technique was to name worksheets within a workbook to create a
systematic way for the user to sensibly progress through a learning playground product. A few
students also used hyperlinks so the user could click on an icon and be taken to a relevant web site
or other worksheet. The organization used by students was typical of a story structure: introduction,
playground (or body), more details, and a conclusion.

Building their own products. A number of students went one step further by challenging the user to
create their own spreadsheet calculations. They predicted that a very active, hands-on mode of
learning was going to be most effective. Figure 10 shows the beginning instructions one student
provided to the user on how to name cells and arrays.

LIMITATIONS

Overall the learning activities worked very well, resulting in a higher level of student learning relative
to previous versions of the course given by the same instructor. Nonetheless, there were limitations
to the learning playground challenge that could be improved in the future. These limitations were
understood from informal student comments and from three student interviews conducted a few
months after the course had ended. The key areas for improvement included: a critical time zone for
learning Excel, using more of a "studio" approach, and making stronger connections between Excel
and statistical analysis software.

Critical Zone

The course was about ANOVA techniques, not about computers by themselves. Excel served a
powerful role as a pedagogical tool for helping students enhance their understanding of key
concepts. However for Excel to function in a supportive role, the basics of Excel need to be learned
by students quickly and well. Student comments indicated two key issues regarding learning Excel:
(a) there is a 2-3 week window in which they are willing to live with the ambiguity of not knowing how
to use Excel competently and (b) early on they need to have tangible evidence that they are making
good progress as novice Excel users. Put differently, learning "refined" Excel skills as the course
progresses is fine, but the core Excel skills need to be mastered by students in a relatively short
amount of time (three weeks or less). Most important, it is the students themselves who feel this
need for relatively quick mastery of the basics. Related to this critical learning zone of three weeks
are the connected issues of a hands-on workshop, debugging skills, and the student guide.

Hands-on workshop. Students were provided with an initial three hour workshop in a computer lab
about learning Excel. While the workshop was useful, student comments indicated that the workshop
was not as hands-on as needed. In short, there was too much looking at an overhead screen about
how to do it rather than the students themselves trying out the software. In the future the initial three
hour workshop will be revised so that it is more hands-on.

Debugging skills. For many students, learning Excel is like learning a programming language. One
implication is that learning Excel partly involves having good "debugging" skills. However, especially
at the beginning, students felt ill-equipped to deal with situations in which Excel did not work as
advertised.
Put differently, they learned from trial-and-error and from e-mailing the instructor, but they thought an
up-front explicit tutorial on debugging Excel would be very useful for future novice learners.

Student guide. Students were given an instructor-written 49-page guide called Excel to the Rescue!
The guide provided information on all the key spreadsheet skills they would need, in the order in
which they would use them, accompanied by lots of screen-shot pictures showing exactly what the
procedure would look like on the computer screen. The guide was supplemented by some instructor-
made QuickTime movies showing what to do in Excel. Despite the benefits of this guide and
accompanying movies, the guide did have some typos regarding specific statistical procedures.
These were easy to correct through e-mail and in class, but for novice learners (again in the first
three weeks) it was sometimes hard for them to discern whether a problem they were experiencing
was due to a typo or their perceived low level of competence in Excel.

Studio Approach

As the course progressed some student products were shared with the class to highlight various
ideas or approaches to designing a learning playground. However the only thorough opportunity
students had to see and hear about the thinking behind each other's playgrounds was at the last
class session. Many students commented that they would have loved to have seen other students'
work "in progress" as the course went along. They noted that other people had very good ideas and
by sharing ideas and experiments it would have spurred them to make even richer and better
learning playground products. In other words, students commented that something like a "studio
session" as a small part of each class where two students share their work-in-progress would be a
great addition to future courses. Such a studio session could be done in 20 minutes (out of a 4.5
hour class meeting) and seems to be a reasonable change to make.

Making Connections

While students made many connections between ANOVA concepts, statistical formulae, and
research design, some mentioned that they lacked the mechanical connection between the results
presented in an Excel learning playground and the output from SPSS. Since SPSS is prevalent at
many institutions, it was a reasonable request to integrate the Excel-SPSS connection into a future
version of the student guide.

CONCLUSIONS

Overall, using Excel as a pedagogical tool for students to create learning playground products
appeared to be a rich and powerful way for students to learn statistics at the intermediate level. One
student described the general experience succinctly:

Most meaningful was that I was able to understand the statistical concept behind the actual
assignment when I used the computer. It was probably the first time in statistics I was able to go,
"Aha, I understand." I used everything: the text, the class notes, the presentations in class. But then I
went to the computer and did the hands-on part. It was then that I was able to put it all together in a
way that I could understand.

In past versions of the course there were always clearly designed presentations, a useful text, class
notes, and the like. The unique added feature of this particular course was the inclusion of Excel as
a pedagogical tool. Another student aptly described the value of going back and forth between
conceptual understanding and mechanical implementation:

I think the best thing (about the course) was that in order to make the playgrounds I really had to
understand what ANOVA was in general, as well as the particular ANOVA we were working on.
Plugging in equations was busy work, but it was crucial that I had a solid conceptual understanding
before I started creating the playground. Then when I was working through it I understood the
concept more and more. So at the end, when we had to come up with a scenario that fit with the
design, that was Excellent because I really had to think about it. Then when we had to address what
it all really meant in practical terms--that really solidified everything!

This quote speaks not just to the general value of Excel as a learning tool, but specifically to the
importance of engaging in a process where the learner is going back and forth between the
conceptual and the technical, since each of these types of understanding reinforces and deepens
the overall learning experience of the student. One student highlighted another key feature
of the learning playground products:

I was able to integrate creativity and a great deal of fun into something I originally perceived as dry
and uninteresting. It made it a real exercise in design and how I present the material. Also the way
you framed it--that it would be an instructional product--that made me think carefully about novices
and wanting to help them.

So, in addition to the pure statistical learning features, the "fun" features of the learning playground
products were crucial in the long run. The course did challenge learners to use their educational
expertise and to use a modicum of design sensitivity to create learning playground products that
were appealing and effective. Perhaps the most important potential aspect of the course was
revealed in an e-mail message one student sent to the instructor after the end of the course:

I think the most important thing that I learned from the class was that I could be good at things I had
more or less written off. I considered myself pretty poor at statistics and a lot worse at Excel
(ignorant, to say the least)! At some point during the class I realized that I was quite good at
designing ANOVA playgrounds. How amazing!!! Learning the discrete skills was, of course,
important. But the most valuable thing was learning that I could be good at just about anything given
excellent instruction and the opportunity to approach the subject from a position of strength (my
creative pedagogic skills). The wonderful thing about teaching teachers is that there are also second
order benefits--I'll find ways to incorporate what I've learned into teaching others. So you've taught a
lot more people through me.

While it is unrealistic to hope that a course would have a similar impact on all students, it is
encouraging that some students see the long-term benefits of creating and making worthy products
as part of the learning process.

FUTURE DIRECTIONS

This investigation yielded positive results supporting the notion that active-learning environments
can be effective, but there is still much to learn. This study was different from the Mitchell (1997)
study where the course focused on learning computer technology. Instead this study more closely
resembled the needs of a typical instructor: using technology as a pedagogical tool which supports
learning but is not the focus of the learning process.

There are some areas of practical concern for instructors that this study does not address. All the
students in this intermediate level statistics course already had taken courses in introductory
statistics and introductory research design. How would such a course work with true novices in a
beginning level introductory course? In addition, the course had the luxury of containing only eight
students. What problems would be incurred in adapting this course to the more realistic situation of
an introductory doctoral-level statistics course where the enrollment is typically 25-30 students? In
the near future this approach will be tried out in an introductory statistics class. Hopefully the
lessons learned from this initial experiment, in which technology played a supportive role for active
learning, can be adapted to meet the even greater needs of novices grappling with introductory-level
statistical concepts.

IMPLICATIONS FOR OTHER FIELDS

This study looked at only one field of study: statistics. What implications are there for other fields--
either in mathematics or the sciences in general? One of the general problems in mathematics and
science education has been that so few students are attracted to these fields. In the sciences
especially there has been some attention paid to the development of Problem-Based Learning (PBL)
curricula that challenge students to be more active learners. With the advent of more intuitive
software such as Excel, some multimedia programs, and specialty mathematics/science software
(such as The Geometer's Sketchpad), there are even greater opportunities now for students to use
such computer-based tools to actively construct meaning from their learning experiences. The
essence of the
challenge used in this study ("Can you teach others about the concept you just learned?") is the kind
of challenge that could be used in many other fields of study. The specific advantage to instructors
in mathematics and science is that such an approach seems to help students learn the material
better while also allowing them to develop a deeper, more positive relationship with the content.
These positive cognitive and motivational benefits seem to indicate that pursuing this general "teach-
to-learn" approach may have a fairly wide applicability.

APPENDIX

STATISTICAL TERMS

Sum of Squares: This is a measure of all the squared deviation scores in a sample of numbers. The
statistical measures of variance and standard deviation are derived from this basic measure. The
general notion of sum of squares gained even greater importance in the early 20th century with
Ronald Fisher's insight that the total sum of squares for a sample could be partitioned into
independent pieces.
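
For a small, made-up sample the computation is just a couple of lines. A minimal Python sketch
(the numbers are invented for illustration, not taken from the article):

    # Sum of squares: the total squared deviation from the sample mean.
    data = [4, 7, 6, 3, 5]
    mean = sum(data) / len(data)             # sample mean = 5.0
    ss = sum((x - mean) ** 2 for x in data)  # 1 + 4 + 1 + 4 + 0
    print(ss)                                # 10.0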

ANOVA: A basic one-way ANOVA is one way of partitioning the total sum of squares into two
independent pieces: sum of squares within (variation we can't explain) versus sum of squares
between (variation which can be explained due to systematic differences between the levels of the
independent variable).

F-Test: The F-test (named after Ronald Fisher) is used to assess if there is a statistically significant
difference between the levels of the independent variable. The test essentially takes advantage of
the partitioned sum of squares to create a ratio that is then used to assess statistical significance.

Eta-squared: A measure of effect size. In a one-way ANOVA it is calculated by dividing the sum of
squares between by the sum of squares total. The resulting number gives us the percentage of the
total sum of squares that can be explained by between-group differences. In more complicated
ANOVA designs, eta-squared can be calculated by dividing the sum of squares for a specific effect
by the sum of squares total.
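
The last three entries (ANOVA, F-test, eta-squared) can be tied together with a few lines of
arithmetic. Below is a minimal Python sketch using invented numbers (a balanced one-way design
with three groups of three scores); it illustrates the definitions above and is not code from the
article's learning playgrounds:

    # A one-way ANOVA by hand: partition the total sum of squares, then
    # form the F ratio and eta-squared.
    groups = [[2, 3, 4], [5, 6, 7], [8, 9, 10]]

    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(g) / len(g) for g in groups]

    # Unexplained variation (within groups) vs. variation explained by
    # systematic differences between groups.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    assert abs(ss_total - (ss_between + ss_within)) < 1e-9  # Fisher's partition

    k, n_total = len(groups), len(all_scores)
    ms_between = ss_between / (k - 1)      # df between = k - 1
    ms_within = ss_within / (n_total - k)  # df within = N - k
    f_ratio = ms_between / ms_within       # compared against the F distribution
    eta_squared = ss_between / ss_total    # proportion of SS explained

    print(f_ratio, eta_squared)            # 27.0 0.9

The assertion makes Fisher's insight concrete: the partition is exact, so the F ratio and eta-squared
are two summaries of the same decomposition.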

References

Abramovich, S. (1995). Technology-motivated teaching of advanced topics in discrete mathematics.
Journal of Computers in Mathematics and Science Teaching, 14(3), 391-418.

Abramovich, S., & Nabors, W. (1998). Enactive approach to word problems in a computer
environment enhances mathematical learning for teachers. Journal of Computers in Mathematics
and Science Teaching, 17(2/3), 161-180.

Arganbright, D.E. (1992). Using spreadsheets in introductory statistics. Statistics for the twenty-first
century (pp. 226-242). Washington, DC: Mathematical Association of America.

Bakeman, R. (1992). Understanding social science statistics: A spreadsheet approach. Hillsdale, NJ:
Lawrence Erlbaum.

Bargh, J.A., & Schul, Y. (1980). On the cognitive benefits of teaching. Journal of Educational
Psychology, 72, 593-604.

Benware, C.A., & Deci, E.L. (1984). Quality of learning with an active versus passive motivational
set. American Educational Research Journal, 21, 755-765.

Brophy, J., & Alleman, J. (1991). Activities as instructional tools: A framework for analysis and
evaluation. Educational Researcher, 20(4), 9-23.
Bruner, J. (1966). Toward a theory of instruction. Cambridge, MA: Harvard University Press.

Connell, M. (1998). Technology in constructivist mathematics classrooms. Journal of Computers in
Mathematics and Science Teaching, 17(4), 311-338.

Dewey, J. (1963). Experience and education. New York: Collier Books.

Dugdale, S. (1998). A spreadsheet investigation of sequences and series for middle grades through
precalculus. Journal of Computers in Mathematics and Science Teaching, 17(2/3), 203-222.

Kafai, Y. (1995). Minds in play: Computer game design as a context for children's learning. Hillsdale,
NJ: Lawrence Erlbaum.

Mayer, R. (2001). Multimedia learning. Cambridge, England: Cambridge University Press.

Mitchell, M. (1993). Situational interest: Its multifaceted structure in the secondary school
mathematics classroom. Journal of Educational Psychology, 85, 427-439.

Mitchell, M. (1997). The use of spreadsheets for constructing statistical understanding. Journal of
Computers in Mathematics and Science Teaching, 16(2/3), 201-222.

Papert, S. (1980). Mindstorms: Children, computers, and powerful ideas. New York: Basic Books.

Piele, D. (1990). Introductory statistics with spreadsheets. New York: Addison-Wesley.

Rogers, C. (1969). Freedom to learn. Columbus, OH: Merrill.

Shavelson, R. (1996). Statistical reasoning for the behavioral sciences. Boston: Allyn and Bacon.

Sutherland, R., & Rojano, T. (1993). A spreadsheet approach to solving algebra problems. Journal
of Mathematical Behavior, 12, 353-383.

Whitehead, A.N. (1929). The aims of education and other essays. New York: Macmillan.

Zajonc, R.B. (1960). The process of cognitive tuning in communication. Journal of Abnormal and
Social Psychology, 61, 159-167.

RELATED ARTICLE: ADDITIONAL RESOURCES

An example of an instructor-made learning playground and a variety of student-made playgrounds
can be downloaded at the following web link: mitchellprion.com/constructanova

Mitchell, Mathew
