
A reprint from

American Scientist
the magazine of Sigma Xi, The Scientific Research Society

This reprint is provided for personal and noncommercial use. For any other use, please send a request to Permissions,
American Scientist, P.O. Box 13975, Research Triangle Park, NC, 27709, U.S.A., or by electronic mail to perms@amsci.org.
©Sigma Xi, The Scientific Research Society and other rightsholders
Macroscope

To Throw Away Data: Plagiarism as a Statistical Crime
Andrew Gelman and Thomas Basbøll

Andrew Gelman is a professor in the departments of statistics and political science at Columbia University, New York, and the author of Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (Princeton University Press, 2008). Thomas Basbøll is an independent writing coach and an external lecturer in the department of management, politics and philosophy at the Copenhagen Business School. Address for Gelman: Columbia University, 1016 Social Work Building, New York, NY 10027. E-mail: gelman@stat.columbia.edu

“The distortion of a text,” says Freud in Moses and Monotheism, “is not unlike a murder. The difficulty lies not in the execution of the deed but in doing away with the traces.”
James Wood

Whether data are numerical or narrative, removing them from their context represents an act of plagiarism

Much has been written on the ethics of plagiarism. One aspect that has received less notice is plagiarism’s role in corrupting our ability to learn from data: We propose that plagiarism is a statistical crime. It involves the hiding of important information regarding the source and context of the copied work in its original form. Such information can dramatically alter the statistical inferences made about the work.

In statistics, throwing away data is a no-no. From a classical perspective, inferences are determined by the sampling process: point estimates, confidence intervals and hypothesis tests all require knowledge of (or assumptions about) the probability distribution of the observed data. In a Bayesian analysis, it is necessary to include in the model all variables that are relevant to the data-collection process. In either case, we are generally led to faulty inferences if we are given data from urn A and told they came from urn B.

A statistical perspective on plagiarism might seem relevant only to cases in which raw data are unceremoniously and secretively transferred from one urn to another. But statistical consequences also result from plagiarism of a very different kind of material: stories. To underestimate the importance of contextual information, even when it does not concern numbers, is dangerous.

Perhaps the most prominent statistician to have repeatedly published material written by others without attribution is Edward Wegman, formerly of the Office of Naval Research and currently a professor at George Mason University. The case is especially interesting because Wegman has a distinguished record of public service and scholarship (he received the Founders Award in 2002 from the American Statistical Association) and because one of the plagiarized documents was part of a report on climate change delivered to the U.S. Congress. The ethical dimensions of this copying seem clear enough: By taking others’ work without giving credit, even copying from Wikipedia at one point (see the appendix to this essay at American Scientist’s website), Wegman and his research team were implicitly claiming expertise on subjects in which they were not experts. Wegman continues to deny having plagiarized, even in the face of direct evidence that several of his publications (on topics ranging from network analysis to color vision) include unattributed material previously published by others.

We shall avoid speculating about the motives for plagiarism here. Generally, however, the ethical dilemma seems to be analogous to that of the person who robs a store to feed his or her family, or the politician who lies to achieve a larger political goal. In all of these cases, the behavior in question is generally recognized to be unethical, so if the broader context in which the action takes place is deemed ethical, it can be so only because the unethical action serves some larger, more important goal. In Wegman’s case, no such argument about a larger context has been made (perhaps because that would require admitting the ethical violation in the first place).

The Wegman case came to public notice after the Canadian blog Deep Climate found the first few pages of material in the report to be plagiarized from a book by Ray Bradley, one of the authors whose work was attacked in that document. The blog post stirred others to study this and other documents written by Wegman and his students, at which point additional incidents of copying without attribution turned up. In 2011, a related article by Wegman and a collaborator in the journal Computational Statistics and Data Analysis was formally retracted by the publisher on grounds of plagiarism.

Despite the human and political drama of the Wegman case, it may not appear immediately interesting from the standpoint of statistics. Perhaps counterintuitively, a purely qualitative example reveals why this appearance is wrong.
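The remark about urn A and urn B in the opening section can be made concrete with a small simulation. What follows is a minimal sketch and not part of the original essay: the red-ball proportions (0.30 for urn A, 0.45 for urn B), the sample size, the number of trials and the function names are all hypothetical choices made for illustration. It draws samples from urn A, builds an approximate 95 percent interval from each sample, and then asks how often that interval contains the true proportion of the urn the data are said to come from.

```python
import math
import random


def wald_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a binomial proportion."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def coverage(true_source_p, claimed_p, n=200, trials=2000, seed=1):
    """Share of intervals, built from draws out of the true source urn,
    that contain the red-ball proportion of the claimed source urn."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = 0
    for _ in range(trials):
        # Draw n balls (with replacement) from the urn the data really come from.
        reds = sum(rng.random() < true_source_p for _ in range(n))
        low, high = wald_interval(reds, n)
        hits += low <= claimed_p <= high
    return hits / trials


P_URN_A = 0.30  # hypothetical share of red balls in urn A (the real source)
P_URN_B = 0.45  # hypothetical share of red balls in urn B (the claimed source)

honest = coverage(P_URN_A, P_URN_A)      # data labeled with their true source
mislabeled = coverage(P_URN_A, P_URN_B)  # same data, attributed to urn B

print(f"coverage with correct provenance: {honest:.2f}")
print(f"coverage with false provenance:   {mislabeled:.2f}")
```

Under these assumed numbers the interval procedure behaves roughly as advertised when the provenance is stated correctly, and almost never captures the truth about urn B when urn A's data are passed off as urn B's. The arithmetic applied to the sample is identical in both cases; only the label on the source differs.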

168 American Scientist, Volume 101 © 2013 Brian Hayes. Reproduction with permission only.
Contact bhayes@amsci.org.
Stephanie Freese

A poem by Miroslav Holub that appeared in the Times Literary Supplement in 1977 tells the story of a Hungarian reconnaissance unit caught in a snowstorm in the Alps. Holub recounts how the lieutenant who sent the unit out feared for their lives, but the unit returned after three days, saying that one of their number had a map. The map, however, turns out to be of the Pyrénées, not the Alps. The story has been widely retold in the field of organization studies. Whether and how its source is cited, the authors argue, is a matter of statistical concern.

Snowstorm, Map, Conundrum

An anecdote that has been widely circulated in the organization studies literature goes something like this: A group of soldiers are sent out by their leader and get lost in a snowstorm in the Alps. After discovering that one of their number has a map, they regain their confidence, wait out the storm and return to camp. Only afterward do they realize that the map was not of the Alps but of the Pyrénées.

This story has made the rounds in management circles, often accompanied by the slogan, “When you are lost, any old map will do.” It was even retold by noted psychologist Daniel Kahneman at the 2009 Digital Life Design conference as part of an account of the importance of confidence. Kahneman attributed the story to the “famous organizational psychologist Karl Weick.” Weick, like Wegman, is an award-winning and highly regarded scholar in his field, and he is the commonly cited source for the anecdote in the organization studies literature. But, as Kahneman noted in his talk, some irregularities in Weick’s referencing (or lack thereof) have emerged.

In 2006, one of us (Basbøll), and a Ph.D. student in his department, Henrik Graham, published a paper showing that Weick had simply transcribed the story from a poem by Miroslav Holub that had been published in 1977 in the Times Literary Supplement. The text has minor changes but is nearly identical to Holub’s, though without the line breaks, of course. (See the online appendix to

this essay.) In his earliest uses of the anecdote, Weick provided no reference to Holub whatsoever, despite the fact that his account was a nearly verbatim reproduction of the poem. In later versions, he mentioned Holub’s poem but continued to represent the story as his own prose, without enclosing it in quotation marks.

Importantly, Weick also began to alter Holub’s framing of the story. Like Holub, he invoked Albert Szent-Gyorgyi, the Nobel Prize–winning physiologist, as the original source of the story (though he did not clearly cite Holub as the source for this source). Holub described the anecdote as a “story from the war,” whereas Weick repeatedly called it “an incident that happened during military maneuvers in Switzerland.” With this phrasing, not only did he conceal the nature of his evidence from his readers (it is a poem with a unique author, not a story recounted aloud or included in some unspecified report), he also exaggerated the veracity of the account (and gave the war story an implausible Swiss setting, perhaps by associating the mention of the Alps with Switzerland).

The article set off a back-and-forth of publications. The journal that published Basbøll and Graham’s 2006 article, ephemera, printed a response from Weick in the same issue. In it, he dismissed the charge. In 2010, Basbøll published a response to Weick, along with further examples of plagiarism, which Weick again dismissed. In 2012 Basbøll published a rhetorical analysis of the exchanges so far.

Weick claimed that by the time he realized the anecdote had relevance to his work, he had forgotten where he first encountered it, and that he “reconstructed the story as best [he] could.” It seems unlikely that a scholar would add to his own writing a nearly word-for-word copy of a text whose citation he did not have, and this in the era before computer copy-and-paste. Beyond this, Weick’s reaction when the news came out also gives us reason to doubt his account. Instead of being embarrassed and bending over backward to add a clear, apologetic citation in subsequent appearances of the material, he seemed all too eager to explain the event away.

Gelman had never heard of any of the people involved in these incidents before Basbøll drew his attention to the case of plagiarism. What brought us together was a shared frustration with an especially slippery aspect of the case, and others like it: the denial or avoidance of the topic by colleagues of the offenders. Weick is influential in his field, known for his counterintuitive management advice. Often, when people who attain such stature misbehave, others find it hard to believe or don’t want to hear about it. The assumption, perhaps, is that any misbehavior was for the greater good.

In the wake of the paper’s publication in ephemera, Basbøll and Graham were mocked by organizational strategy professors Teppo Felin and Omar Lizardo (the latter referred to them as “what’s his name and watchumacallit”) on the orgtheory blog. When Basbøll tried to mention Weick’s plagiarism on the online correspondence site of the Journal of Management Studies, he was rejected on the grounds that Weick might sue the journal. And the American Statistical Association, which presented its Founders Award to Wegman in 2002, has not to our knowledge commented publicly on the issue.

Learning that part of a corpus of work is plagiarized can degrade one’s trust in the rest of the work. This is not just a moral or psychological argument of the sort that one might legitimately use against a scientist known to have fabricated or misrepresented data, such as Diederik Stapel or Marc Hauser: if the guy cheated with data in one place, you can’t trust his other statements either. Indeed, Basbøll found that the first four pages of one of Weick’s most widely cited books, Sensemaking in Organizations (1995), reproduce the work of several other scholars without adequate attribution. The book also includes an instance of the Holub plagiarism.

But we are saying something more: If Weick represented a story recounted in a poem as if it were a historical event, that casts doubt on his rules of evidence. It’s not that an unsourced anecdote has more authority than a published poem. Rather, obscuring the source makes the story free-floating, immune from any detail-based examination. Meanwhile, Weick’s reputation as an original thinker is threatened if it turns out that he was appropriating others’ ideas while concealing his debt to them. In a 2004 article in the journal Organization Studies, Weick explains his reputation in terms of the “hidden connections” that exist between his own work and that of his precursors. Acknowledging one’s precursors is good, but it’s better when their names are given and their work recognized as their own.

Similarly, if Wegman, a nonexpert in network analysis, plagiarizes a description of the field (and, as the blogger known as Deep Climate noted, in the process introduces a typo that wrecks one of the mathematical expressions), that casts doubt on any empirical studies he performs using network analysis. Ultimately, such analyses must be evaluated on their own terms, but without the nudge toward acceptance that might come from the knowledge that they were performed by an eminent statistician. In the Weick case, the copier was getting credit for an interesting story, as well as credit for Holub’s writing style, indeed for certain very specific turns of phrase. In addition, by obscuring the source, he became more free to alter its meaning in different tellings.

Some organization theorists, such as Barbara Czarniawska, have argued that the truth or falsity of the original story has no bearing on the reception of Weick’s theory. But we disagree. We believe, for example, that Weick’s argument would not have been so well received if he had presented the material as the poem it was rather than calling it “an incident that happened during military maneuvers in Switzerland.” In a sense, the vaguer attribution, by placing the story in the category of folklore, gives it an implication of broader significance, in the same way it can be disappointing to learn that a purported folk ballad was in fact the product of a forgotten songwriter.

Decoupling Story and Source

To see more clearly how plagiarism is a crime against statistics, we need to examine how it helps to decouple the story from the source. In Weick’s case, this distancing allowed him to convey a message that was virtually the opposite of the story’s original meaning. Weick first told the story in 1982 when, five years after the appearance of Holub’s poem, Robert Swieringa and he published an article in the Journal of Accounting Research including a nearly word-for-word transcription of the poem text, but not using quotation marks or acknowledging Holub at all. In a 1987 essay, Weick added a “twist” to the story that had resulted from a conversation with Robert Engel, a Wall Street executive. Engel, he relates, suggested the possibility that the leader who was out with the troops

might have known that the map was false and still used it effectively. Weick concurs with Engel and expounds on the implications as follows:

“What is interesting about Engel’s twist to the story is that he has described the basic situation that most leaders face. Followers are often lost and even the leader is not sure where to go. All the leader knows is that the plan or the map he has in front of him is not sufficient by itself to get them out. What he has to do, when faced with this situation, is instill some confidence in people, get them moving in some general direction, and be sure they look closely at what actually happens, so that they learn where they were and get some better idea of where they are and where they want to be.”

He goes on to suggest that the key in this kind of situation is to “get people moving.” But in Holub’s poem, Weick’s primary source material, the soldiers’ recounting stands in direct opposition to this interpretation: They say that the map “calmed us down” and that they “pitched camp, lasted out the snowstorm.”

Making speculations about what might have happened differently in a situation is not an invalid strategy in all settings; it’s just a nonempirical one. In this case the line between fact and supposition was blurred so badly that no such distinction could be made. But facts exist that can be adduced to determine whether Engel’s supposition was correct. Assuming that any such event actually occurred, then his notion about what happened is either right or wrong. As it turns out, versions of the story that predate Holub’s poem appear in reports given by medical researchers Oscar Hechter and Bernard Pullman at scientific symposia in the early 1970s. These versions suggest that the anecdote as told by Szent-Gyorgyi had the troops’ immediate leader thinking it was a map of the Alps, too. Those versions rule out Engel’s interpretation.

That interpretation may, of course, be more appealing to Wall Street executives. Given the evolution of modern finance since the mid-1980s, the fact that they appear to have thought that “any old map will do” is somewhat disturbing. But Engel’s idea was generated in a problematic context, one in which Weick had, in effect, taken ownership of the story and arrogated for himself the right to alter it at will. The act of plagiarism was the first step in a process that unmoored the story from its sources and removed its evidential value.

A Statistical Crime

Returning to the statistical language of probability and likelihood, to falsify the provenance of a story is to imply an incorrect likelihood function and thus to lose inferential validity. (Statistically speaking, systematically excluding data without revealing the exclusion is a misspecification of the model.) As one of us (Basbøll) eventually showed, any telling of the story is a selection from several possible versions of it. By not sourcing it properly, Weick hides the opportunism of his sampling and sets Engel up to propose a convenient (for top management) “truth” about corporate strategy. This is not to say that, had Weick cited Holub appropriately, he would not have ultimately used it to draw lessons about leadership, even ones that executives would find useful. But if he had done so, he would have had to justify his argument, rather than merely retell the story in his own way to suit his purposes.

Scholars in fields ranging from psychology to history to computer science have recognized that stories are part of how people understand the world. As statisticians, we can consider reasoning from stories as a form of approximate inference. From this perspective, statistical principles should provide some approximate guidance about the potential biases and precision of such inferences. One key principle is not to throw away information and, if discarding data is for some reason necessary, to describe as clearly as possible the mechanism by which the relevant information was excluded. Plagiarism violates both these rules and, as such, is a violation of statistical ethics, beyond any other considerations of moral behavior.

Acknowledgment

Parts of this essay are adapted from Gelman’s blog, Statistical Modeling, Causal Inference, and Social Science, at http://andrewgelman.com.

Bibliography

Basbøll, T. 2010. JMS suppresses scholarly debate. Research as a Second Language blog. May 25. http://secondlanguage.blogspot.dk/2010/05/jms-suppresses-scholarly-debate.html
Basbøll, T. 2010. Softly constrained imagination: Plagiarism and misprision in the theory of organizational sensemaking. Culture and Organization 16:163–178.
Basbøll, T. 2012. Any old map won’t do: Improving the credibility of storytelling in sensemaking scholarship. WMO Working Paper Series, Copenhagen Business School.
Basbøll, T. 2012. Legitimate peripheral irritations. Journal of Organizational Change Management 25:220–235.
Basbøll, T., and H. Graham. 2006. Substitutes for strategy research: Notes on the source of Karl Weick’s anecdote of the young lieutenant and the map. ephemera 6(2):194–204.
Czarniawska, B. 2005. Karl Weick: Concepts, style and reflection. Sociological Review 53:267–278.
Deep Climate. 2011. Wegman and Said 2011: Yet more dubious scholarship in full colour, part 1. Deep Climate blog. March 26. http://deepclimate.org/2011/03/26/wegman-and-said-2011-dubious-scholarship-in-full-colour/
Deep Climate. 2011. Said and Wegman 2009: Suboptimal scholarship. Deep Climate blog. Oct. 4. http://deepclimate.org/2011/10/04/said-and-wegman-2009-suboptimal-scholarship/
Felin, T. 2006. Charges of plagiarism in org theory. Orgtheory blog. July 22. http://orgtheory.wordpress.com/2006/07/22/charges-of-plagiarism-in-org-theory/
Hechter, O. 1972. Reflections on General Membrane Structure: The Conference in Review. Annals of the New York Academy of Sciences 195:506–519.
Holub, M. 1977. Brief thoughts on maps. Translated by J. and I. Milner. Times Literary Supplement. Issue 3908:118. February 4.
Mallon, T. 1989. Stolen Words: Forays into the Origins and Ravages of Plagiarism. New York: Ticknor & Fields.
Pullman, B. 1974. Summary of the chemical aspects of carcinogenesis. In Chemical Carcinogenesis. P. O. P. Ts’o and J. A. DiPaolo, eds. New York: Marcel Dekker.
Swieringa, R., and K. E. Weick. 1982. An assessment of laboratory experiments in accounting. Journal of Accounting Research 20 (supplement):56–101.
Vergano, D. 2011. Experts claim 2006 climate report plagiarized. USA Today. November 22.
Weick, K. E. 1987. Substitutes for strategy. In The Competitive Challenge: Strategies for Industrial Innovation and Renewal, ed. D. J. Teece. Cambridge, MA: Ballinger. pp. 222–233.
Weick, K. E. 1995. Sensemaking in Organizations. Thousand Oaks, CA: Sage Publications.
Weick, K. E. 2001. Making Sense of the Organization. Oxford: Blackwell Publishing.
Weick, K. E. 2004. Mundane poetics: Searching for wisdom in organization studies. Organization Studies 25:653–668.
Weick, K. E. 2006. Dear editor: A reply to Basbøll and Graham. ephemera 6(2):193.
Weick, K. E. 2010. Comment on “softly constrained imagination.” Culture and Organization 16:179.
Wood, James. 2009. James Wood writes about the manipulations of Ian McEwan. London Review of Books 31(8):14–16.

www.americanscientist.org 2013 May–June 171