
PSYCHOLOGICAL SCIENCE

General Article

SCIENCE AND ETHICS IN CONDUCTING, ANALYZING, AND REPORTING PSYCHOLOGICAL RESEARCH

Robert Rosenthal
The relationship between scientific quality and ethical quality is considered for three aspects of the research process: conduct of the research, data analysis, and reporting of results. In the area of conducting research, issues discussed involve design, recruitment, causism, scientific quality, and costs and utilities. The discussion of data analysis considers data dropping, data exploitation, and meta-analysis. Issues regarding reporting of results include misrepresentation of findings, misrepresentation of credit, and failure to report results as a result of self-censoring or external censoring.

Address correspondence to Robert Rosenthal, Department of Psychology, Harvard University, 33 Kirkland St., Cambridge, MA 02138.

The purpose of this article is to discuss a number of scientific and ethical issues relevant to conducting, analyzing, and reporting psychological research. A central theme is that ethics and scientific quality are very closely interrelated. Everything else being equal, research that is of higher scientific quality is likely to be more ethically defensible. The lower the quality of the research, the less justified we are ethically to waste research participants' time, funding agencies' money, and journals' space. The higher the quality of the research, the better invested have been the time of the research participants, the funds of the granting agency, the space of the journals, and, not least, the general investment that society has made in supporting science and its practitioners.

CONDUCTING PSYCHOLOGICAL RESEARCH

Let us turn first to considerations of research design, procedures employed in a study, and the recruitment of human participants. In evaluating the ethical employment of our participants, we can distinguish issues of safety from more subtle issues of research ethics. Obviously, research that is unsafe for participants is ethically questionable. However, I propose that perfectly safe research in which no participant will be put at risk may also be ethically questionable because of the shortcomings of the design.

Issues of Design

Imagine that a research proposal that comes before an institutional review board proposes the hypothesis that private schools improve children's intellectual functioning more than public schools do. Children from randomly selected private and public schools are to be tested extensively, and the research hypothesis is to be tested by comparing scores earned by students from private versus public schools. The safety of the children to be tested is certainly not an issue, yet it can be argued that this research raises ethical issues because of the inadequacy of its design. The goal of the research is to learn about the causal impact on performance of private versus public schooling, but the design of the research does not permit reasonable causal inference because of the absence of randomization or even some reasonable attempt to consider plausible rival hypotheses (Cook & Campbell, 1979).

How does the poor quality of the design raise ethical objections to the proposed research? Because students', teachers', and administrators' time will be taken from potentially more beneficial educational experiences. Because the poor quality of the design is likely to lead to unwarranted and inaccurate conclusions that may be damaging to the society that directly or indirectly pays for the research. In addition, allocating time and money to this poor-quality science will serve to keep those finite resources of time and money from better quality science in a world that is undeniably zero-sum.

It should be noted that had the research question addressed been appropriate to the research design, the ethical issues would have been less acute. If the investigators had set out only to learn whether there were performance differences between students in private versus public schools, their design would have been perfectly appropriate to their question.
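The design flaw here is selection bias: children are not randomly assigned to school types, so any hidden factor that influences both school choice and test scores can masquerade as a "school effect." A minimal simulation can make this concrete. The sketch below is illustrative only and is not from the article; `simulate_school_study` and every number in it are invented for the purpose of the demonstration:

```python
import math
import random
import statistics

def simulate_school_study(n=10_000, seed=1):
    """Toy simulation of the private-vs-public comparison.

    Family resources (a hidden confounder) raise both the chance of
    attending a private school and the test score; school type itself
    is given ZERO causal effect. All numbers are invented.
    """
    rng = random.Random(seed)
    private_scores, public_scores = [], []
    for _ in range(n):
        resources = rng.gauss(0.0, 1.0)                 # hidden confounder
        p_private = 1.0 / (1.0 + math.exp(-resources))  # richer -> more likely private
        attends_private = rng.random() < p_private
        score = 100.0 + 5.0 * resources + rng.gauss(0.0, 10.0)  # no school-type term
        (private_scores if attends_private else public_scores).append(score)
    # Naive between-group comparison, as in the proposal above
    return statistics.mean(private_scores) - statistics.mean(public_scores)

gap = simulate_school_study()
print(f"naive private-minus-public gap: {gap:.2f} points")
```

The naive comparison reports a gap of several points even though schooling was given no causal effect at all. Random assignment would sever the link between resources and school type, which is exactly why its absence in the proposed design undermines any causal conclusion.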

VOL. 5, NO. 3, MAY 1994 Copyright © 1994 American Psychological Society 127
Issues of Recruitment

The American Psychological Association's (APA) Committee for the Protection of Human Participants in Research and its new incarnation, the Committee on Standards in Research, and such pioneer scholars of the topic as Herbert Kelman have thoughtfully considered a variety of ethical issues in the selection and recruitment of human participants (APA, 1982; Blanck, Bellack, Rosnow, Rotheram-Borus, & Schooler, 1992; Grisso et al., 1991; Kelman, 1968). Only a few comments need be made here.

On the basis of several reviews of the literature, my friend and colleague Ralph Rosnow and I have proposed a number of procedures designed to reduce volunteer bias and therefore increase the generality of our research results (Rosenthal & Rosnow, 1975, 1991; Rosnow & Rosenthal, 1993). Employment of these procedures has led us to think of our human participants as another "granting agency"—which, we believe, they are, since they must decide whether to grant us their time, attention, and cooperation. Part of our treating them as such is to give them information about the long-term benefits of the research. In giving prospective participants this information, we have a special obligation to avoid hyperclaiming.

Hyperclaiming

Hyperclaiming is telling our prospective participants, our granting agencies, our colleagues, our administrators, and ourselves that our research is likely to achieve goals it is, in fact, unlikely to achieve. Presumably our granting agencies, our colleagues, and our administrators are able to evaluate our claims and hyperclaims fairly well. However, our prospective participants are not; therefore, we should tell them what our research can actually accomplish rather than that it will yield the cure for panic disorder, depression, schizophrenia, or cancer.

Causism

Closely related to hyperclaiming is the phenomenon of causism. Causism refers to the tendency to imply a causal relationship where none has been established (i.e., where the data do not support it).

Causism: Characteristics and Consequences

Characteristics of causism include (a) the absence of an appropriate evidential base; (b) the presence of language implying cause (e.g., "the effect of," "the impact of," "the consequence of," "as a result of") where the appropriate language would have been "was related to," "was predictable from," or "could be inferred from"; and (c) self-serving benefits to the causist. Causism is self-serving because it makes the causist's result appear more important or fundamental than it really is.

If a perpetrator of causism is unaware of the causism, its presence simply reflects poor scientific training. If the perpetrator is aware of the causism, it reflects blatantly unethical misrepresentation and deception.

Whereas well-trained colleagues can readily differentiate causist language from inferentially more accurate language, potential research participants or policymakers ordinarily cannot. When a description of a proposed research study is couched in causal language, that description represents an unfair recruitment device that is at best inaccurate, when it is employed out of ignorance, and at worst dishonest, when it is employed as hype to increase the participation rates of potential participants. As a member of an institutional review board, I regret that I have seen such use made of causist language in proposals brought before us.

Bad Science Makes for Bad Ethics

Causism is only one example of bad science. Poor quality of research design, poor quality of data analysis, and poor quality of reporting of the research all lessen the ethical justification of any type of research project. I believe this judgment applies not only when deception, discomfort, or embarrassment of participants is involved, but for even the most benign research experience for participants. If because of the poor quality of the science no good can come of a research study, how are we to justify the use of participants' time, attention, and effort and the money, space, supplies, and other resources that have been expended on the research project? When we add to the "no good can come of it" argument the inescapable zero-sum nature of time, attention, effort, money, space, supplies, and other resources, it becomes difficult to justify poor-quality research on any ethical basis. For this reason, I believe that institutional review boards must consider the technical scientific competence of the investigators whose proposals they are asked to evaluate. Yes, that will increase the work required of board members and change boards' compositions somewhat to include a certain degree of methodological expertise. No, it will not always be easy to come to a decision about the scientific competence of an investigator and of a particular proposal, but then it is not always easy to come to a decision about the more directly ethical aspects of a proposal either.

Poor quality of research makes for poor quality of education as well. Especially when participation is quasi-coercive, the use of participants is usually justified in part by the fact that they will benefit educationally. But if participants are required to participate in poor-quality research, they are likely to acquire only misconceptions
about the nature of science and of psychology. When the participants' scores on personality scales are correlated with their scores on standardized tests or course grades, and they are told that "this research is designed to learn the impact of personality on cognitive functioning," they have been poorly served educationally as part of having been misled scientifically.

Costs and Utilities

Payoffs for doing research

When individual investigators or institutional review boards are confronted with a questionable research proposal, they ordinarily employ a cost-utility analysis in which the costs of doing a study, including possible negative effects on participants, time, money, supplies, effort, and other resources, are evaluated simultaneously against such utilities as benefits to participants, to other people at other times, to science, to the world, or at least to the investigator. The potential benefits of higher quality studies and studies addressing more important topics are greater than the potential benefits of lower quality studies and studies addressing less important topics. Rosnow and I have often diagrammed this type of cost-utility analysis as a two-dimensional plane in which costs are one dimension and utilities the other (Rosenthal & Rosnow, 1984, 1991; Rosnow, 1990). Any study with high utility and low cost should be carried out forthwith. Any study with low utility and high cost should not be carried out. Studies in which costs equal utilities are very difficult to decide about.

Payoffs for failing to do research

However, Rosnow and I have become convinced that this cost-utility model is insufficient because it fails to consider the costs (and utilities) of not conducting a particular study (Rosenthal & Rosnow, 1984, 1991; Rosnow, 1990; Rosnow & Rosenthal, 1993).

The failure to conduct a study that could be conducted is as much an act to be evaluated on ethical grounds as is conducting a study. The oncology group that may have a good chance of finding a cancer preventive but feels the work is dull and a distraction from their real interest is making a decision that is to be evaluated on ethical grounds as surely as the decision of a researcher to investigate tumors with a procedure that carries a certain risk. The behavioral researcher whose study may have a good chance of reducing violence or racism or sexism, but who refuses to do the study simply because it involves deception, has not solved an ethical problem but only traded in one for another. The issues are, in principle, the same for the most basic as for the most applied research. In practice, however, it is more difficult to make even rough estimates of the probability of finding the cancer cure or the racism reducer for the more basic as compared with the more applied research.

This idea of lost opportunities has been applied with great eloquence by John Kaplan (1988), of the Stanford University Law School. The context of his remarks was the use of animals in research and the efforts of "animal rights" activists to chip away "at our ability to afford animal research. . . . [I]t is impossible to know the costs of experiments not done or research not undertaken. Who speaks for the sick, for those in pain, and for the future?" (p. 839).

In the examples considered so far, the costs of failing to conduct the research have accrued to future generations or to present generations not including the research participants themselves. But sometimes there are incidental benefits to research participants that are so important that they must be considered in the calculus of the good, as in the following example:

I was asked once to testify to an institutional review board (not my own) about the implications of my research for the ethics of a proposed project on karyotyping. The study was designed to test young children for the presence of the XYY chromosome, which had been hypothesized to be associated with criminal behavior. The youngsters would be followed up until adulthood so that the correlation between chromosome type and criminal behavior could be determined. I was asked to talk about my research on interpersonal expectancy effects because it was feared that if the research were not done double-blind, the parents' or researchers' expectations for increased criminal behavior by the XYY males might become a self-fulfilling prophecy (Rosenthal, 1966; Rosenthal & Jacobson, 1968, 1992). A double-blind design should have solved that problem, but the board decided not to permit the research anyway.

The enormous costs to the participants themselves of the study's not being done were not considered. What were those costs? The costs were the loss of 20 years of free, high-quality pediatric care to children whose parents could never have afforded any high-quality pediatric care. Was it an ethically defensible decision to deprive the scores or hundreds of children of medical care they would otherwise not have received in order to avoid having a double-blind design that had very little potential for actually harming the participants? At the very least, these costs of failing to do the research should have received full discussion. They did not.

DATA ANALYSIS AS AN ETHICAL ARENA

Data Dropping

Ethical issues in the analysis of data range from the very obvious to the very subtle. Probably the most obvious and most serious transgression is the analysis of data that never existed (i.e., that were fabricated). Perhaps more frequent is the dropping of data that contradict the data analyst's theory, prediction, or commitment.

Outlier rejection

There is a venerable tradition in data analysis of dealing with outliers, or extreme scores, a tradition going back over 200 years (Barnett & Lewis, 1978). Both technical and ethical issues are involved. The technical issues have to do with the best ways of dealing with outliers without reference to the implications for the tenability of the data analyst's theory. The ethical issues have to do with the relationship between the data analyst's theory and the choice of method for dealing with outliers. For example, there is some evidence to suggest that outliers are more likely to be rejected if they are bad for the data analyst's theory but treated less harshly if they are good for the data analyst's theory (Rosenthal, 1978; Rosenthal & Rubin, 1971). At the very least, when outliers are rejected, that fact should be reported. In addition, it would be useful to report in a footnote the results that would have been obtained had the outliers not been rejected.

Subject selection

A different type of data dropping is subject selection, in which a subset of the data is not included in the analysis. In this case, too, there are technical issues and ethical issues. There may be good technical reasons for setting aside a subset of the data—for example, because the subset's sample size is especially small or because dropping the subset would make the data more comparable to some other research. However, there are also ethical issues, as when just those subsets are dropped that do not support the data analyst's theory. When a subset is dropped, we should be informed of that fact and what the results were for that subset. Similar considerations apply when the results for one or more variables are not reported.

Exploitation Is Beautiful

That data dropping has ethical implications is fairly obvious. An issue that has more subtle ethical implications is exploitation. Exploiting research participants, students, postdoctoral fellows, staff, and colleagues is of course reprehensible. But there is a kind of exploitation to be cherished: the exploitation of data.

Many of us have been taught that it is technically improper and perhaps even immoral to analyze and reanalyze our data in many ways (i.e., to snoop around in the data). We were taught to test the prediction with one particular preplanned test and take a result significant at the .05 level as our reward for a life well-lived. Should the result not be significant at the .05 level, we were taught, we should bite our lips bravely, take our medicine, and definitely not look further at our data. Such a further look might turn up results significant at the .05 level, results to which we were not entitled. All this makes for a lovely morality play, and it reminds us of Robert Frost's poem about losing forever the road not taken, but it makes for bad science and for bad ethics.

It makes for bad science because while snooping does affect p values, it is likely to turn up something new, interesting, and important (Tukey, 1977). It makes for bad ethics because data are expensive in terms of time, effort, money, and other resources and because the anti-snooping dogma is wasteful of time, effort, money, and other resources. If the research was worth doing, the data are worth a thorough analysis, being held up to the light in many different ways so that our research participants, our funding agencies, our science, and society will all get their time and their money's worth.

Before leaving this topic, I should repeat that snooping in the data can indeed affect the p value obtained, depending on how the snooping is done. But statistical adjustments, for example, Bonferroni adjustments (Estes, 1991; Howell, 1992; Rosenthal & Rubin, 1984), can be helpful here. Most important, replications will be needed—whether the data were snooped or not!

Meta-Analysis as an Ethical Imperative

Meta-analysis is a set of concepts and procedures employed to summarize quantitatively any domain of research (Glass, McGaw, & Smith, 1981; Rosenthal, 1991). We know from both statistical and empirical research that, compared with traditional reviews of the literature, meta-analytic procedures are more accurate, comprehensive, systematic, and statistically powerful (Cooper & Rosenthal, 1980; Hedges & Olkin, 1985; Mosteller & Bush, 1954). Meta-analytic procedures use more of the information in the data, thereby yielding (a) more accurate estimates of the overall magnitude of the effect or relationship being investigated, (b) more accurate estimates of the overall level of significance of the entire research domain, and (c) more useful information about the variables moderating the magnitude of the effect or relationship being investigated.

Retroactive increase of utilities

Meta-analysis allows us to learn more from our data and therefore has a unique ability to increase retroactively the benefits of the studies being summarized. The costs of time, attention, and effort of the human participants employed in the individual studies entering into the meta-analysis are all more justified when their data enter into a meta-analysis. That is because the meta-analysis
increases the utility of all the individual studies being summarized. Other costs of individual studies—costs of funding, supplies, space, investigator time and effort, and other resources—are similarly more justified because the utility of individual studies is so increased by the borrowed strength obtained when information from more studies is combined in a sophisticated way.

The failure to employ meta-analytic procedures when they could be used thus has ethical implications because the opportunity to increase the benefits of past individual studies has been forgone. In addition, when public funds or other resources are employed by scientists to prepare reviews of literatures, it is fair to ask whether those resources are being used wisely or ethically. Now that we know how to summarize literatures meta-analytically, it seems hardly justified to review a quantitative literature in the pre-meta-analytic, prequantitative manner. Money that funds a traditional review is not available to fund a meta-analytic review.

It should be noted that a meta-analytic review is a good deal more than simply an overall estimate of the size of the basic effect. In particular, meta-analytic reviews try to explain the inevitable variation in the size of the effect obtained in different studies.

Finally, it no longer seems acceptable to fund research studies that claim to contribute to the resolution of controversy (e.g., does Treatment A work?) unless the investigator has already conducted a meta-analysis to determine whether there really is a controversy. A new experiment to learn whether psychotherapy works in general is manifestly not worth doing given the meta-analytic results of Glass (1976) and his colleagues (Smith, Glass, & Miller, 1980). Until their meta-analytic work resolved the issue, the question of whether psychotherapy worked in general was indeed controversial. It is controversial no longer.

Pseudocontroversies

Meta-analysis resolves controversies primarily because it eliminates two common problems in the evaluation of replications. The first problem is the belief that when one study obtains a significant effect and a replication does not, we have a failure to replicate. That belief often turns out to be unfounded. A failure to replicate is properly measured by the magnitude of difference between the effect sizes of the two studies. The second problem is the belief that if there is a real effect in a situation, each study of that situation will show a significant effect. Actually, if the effect is quite substantial, say, r = .24, and each study employs a sample size of, say, 64, the power level is .50 (Cohen, 1962, 1988; Rosenthal, 1994; Sedlmeier & Gigerenzer, 1989). Given this situation, which is typical in psychology, there is only one chance in four that two investigations will both get results significant at the .05 level. If three studies were carried out, there would be only one chance in eight that all three studies would yield significant effects, even though we know the effect in nature is both real and important in magnitude.

Significance testing

Meta-analytic procedures and the meta-analytic worldview increase the utility of the individual study by their implications for how and whether we do significance testing. Good meta-analytic practice shows little interest in whether the results of an individual study were significant or not at any particular critical level. Rather than recording for a study whether it reached such a level, say, p = .05, two-tailed, meta-analysts record the actual level of significance obtained. This is usually done not by recording the p value but by recording the standard normal deviate that corresponds to the p value. Thus, a result significant at the .05 level, one-tailed, in the predicted direction is recorded as Z = +1.645. If it had been significant at the .05 level, one-tailed, but in the wrong or unpredicted direction, it would be recorded as Z = -1.645 (i.e., with a minus sign to indicate that the result is in the unpredicted direction). Signed normal deviates are an informative characteristic of the result of a study presented in continuous rather than in dichotomous form. Their use (a) increases the information value of a study, which (b) increases the utility of the study and, therefore, (c) changes the cost-utility ratio and, hence, the ethical value of the study.

Small effects are not small

Another way in which meta-analysis increases research utility and, therefore, the ethical justification of research studies is by providing accurate estimates of effect sizes, effect sizes that can be of major importance even when they are so small as to have r² = .00. Especially when we have well-estimated effect sizes, it is valuable to assess their practical importance. The r² method of effect size estimation does a poor job of this because an r² of .00 can be associated with a treatment method that reduces death rates by as much as 7 per 100 lives lost (Rosenthal & Rubin, 1982). Once we are aware that effect size rs of .05, .10, and .20 (with r²s of .00, .01, and .04, respectively) may be associated with benefits equivalent to saving 5, 10, or 20 lives per 100 people, we can more accurately weigh the costs and utilities of undertaking any particular study.

REPORTING PSYCHOLOGICAL RESEARCH

Misrepresentation of Findings

Mother nature makes it hard enough to learn her secrets, without the additional difficulty of being misled by
the report of findings that were not found or by inferences that are unfounded. Although all misrepresentations of findings are damaging to the progress of our science, some are more obviously unethical than others.

Intentional misrepresentation

The most blatant intentional misrepresentation is the reporting of data that never were (Broad & Wade, 1982). That behavior, if detected, ends (or ought to end) the scientific career of the perpetrator. A somewhat more subtle form of intentional misrepresentation occurs when investigators knowingly allocate to experimental or control conditions those participants whose responses are more likely to support the investigators' hypothesis. Another potential form of intentional misrepresentation occurs when investigators record the participants' responses without being blind to the participants' treatment condition, or when research assistants record the participants' responses knowing both the research hypothesis and the participants' treatment condition. Of course, if the research specifically notes the failure to run blind, there is no misrepresentation, but the design is unwise if it could have been avoided.

Unintentional misrepresentation

Various errors in the process of data collection can lead to unintentional misrepresentation. Recording errors, computational errors, and data analytic errors can all lead to inaccurate results that are inadvertent misrepresentations (Broad & Wade, 1982; Rosenthal, 1966). We would not normally even think of them as constituting ethical issues except for the fact that errors in the data decrease the utility of the research and thereby move the cost-utility ratio (which is used to justify the research on ethical grounds) in the unfavorable direction.

Some cases of misrepresentation (usually unintentional) are more subtle. The use of causist language, discussed earlier, is one example. Even more subtle is the case of questionable generalizability.

Questionable generalizability

Suppose we want to compare the rapport-creating ability of female and male psychotherapists, as defined by their patients' ratings. We have available three female and three male therapists, to each of whom 10 patients were assigned at random. An analysis of variance yields three sources of variance: sex of therapist (df = 1), therapists nested within sex (df = 4), and patients nested within therapists (df = 54). A common way to analyze such data would be to divide the MS for sex by the MS for patients to get an F test. In such a case, we have treated therapists as fixed effects. When, in our report of the research, we describe our results as, say, F(1, 54) = 7.13, p = .01, we have done a study that is generalizable only to other patients treated by these six therapists, but not to any other therapists (Estes, 1991; Snedecor & Cochran, 1989).

Misrepresentation of Credit

I have been discussing misrepresentation of findings, or the issue of "what was really found?" In the present section, the focus is on the issue of "who really found it?"

Problems of authorship

Because so many papers in psychology, and the sciences generally, are multiauthored, it seems inevitable that there will be difficult problems of allocation of authorship credit. Who becomes a coauthor and who becomes a footnote? Among the coauthors, who is assigned first, last, or any other serial position in the listing? Such questions have been discussed in depth, and very general guidelines have been offered (APA, 1981, 1987; see also Costa & Gatz, 1992), but it seems that we could profit from further empirical studies in which authors, editors, referees, students, practitioners, and professors were asked to allocate authorship credit to people performing various functions in a scholarly enterprise.

Problems of priority

Problems of authorship are usually problems existing within research groups. Problems of priority are usually problems existing between research groups. A current example of a priority problem is the evaluation of the degree to which Robert C. Gallo and his colleagues were guilty of "intellectual appropriation" of a French research group's virus that was used to develop a blood test for HIV, the virus that is believed to cause AIDS (Palca, 1992). Priority problems also occur in psychology, where the question is likely to be not who first produced a virus but rather who first produced a particular idea.

Failing to Report or Publish

Sometimes the ethical question is not about the accuracy of what was reported or how credit should be allocated for what was reported, but rather about what was not reported and why it was not reported. The two major forms of failure to report, or censoring, are self-censoring and external censoring.

Self-censoring

Some self-censoring is admirable. When a study has been really badly done, it may be a service to the science and to society to simply start over. Some self-censoring is done for admirable motives but seems wasteful of information. For example, some researchers feel they should not cite their own (or other people's) unpublished data
because the data have not gone through peer review. I would argue that such data should indeed be cited and employed in meta-analytic computations as long as the data were well collected.

There are also less admirable reasons for self-censoring. Failing to report data that contradict one's earlier research, or one's theory or one's values, is poor science and poor ethics. One can always find or invent reasons why a study that came out unfavorably should not be reported: The subjects were just starting the course; the subjects were about to have an exam; the subjects had just had an exam; the subjects were just finishing the course; and so on. A good general policy, good for science and for its integrity, is to report all results shedding light on the original hypothesis or providing data that might be of use to other investigators.

There is no denying that some results are more thrilling than others. If our new treatment procedure prevents or cures mental illness or physical illness, that fact may be worth more journal space or space in more prestigious journals than the result that our new treatment procedure does no good whatever. But that less thrilling finding should also be reported and made retrievable by other researchers who may need to know that finding.

External censoring

Both the progress and the slowing of progress in science depend on external censoring. It seems likely that sciences would be more chaotic than they are were it not for the censorship exercised by peers: by editors, by reviewers, and by program committees. All these gatekeepers help to keep the really bad science from clogging the pipelines of mainstream journals.

There are two major bases for external censorship. The first is evaluation of the methodology employed in a research study. I strongly favor such external censorship. If the study is truly terrible, it probably should not be reported.

The second major basis for external censorship is evaluation of the results. In my 35 years in psychology, I have often seen or heard it said of a study that "those results aren't possible" or "those results make no sense." Often when I have looked at such studies, I have agreed that the results are indeed implausible. However, that is a poor basis on which to censor the results. Censoring or suppressing results we do not like or do not believe to have high prior probability is bad science and bad ethics (Rosenthal, 1975, 1994).

CONCLUSION

The purpose of this article has been to discuss some scientific and ethical issues in conducting, analyzing, and reporting psychological research. A central theme has been that the ethical quality of our research is not independent of the scientific quality of our research. Detailing some of the specifics of this general theme has, I hope, served two functions. First, I hope it has comforted the afflicted by showing how we can simultaneously improve the quality of our science and the quality of our ethics. Second, and finally, I hope it has afflicted the comfortable by reminding us that in the matter of improving our science and our ethics, there are miles to go before we sleep.

Acknowledgments: This article is based on an address invited by the Board of Scientific Affairs of the American Psychological Association (APA) and presented at the annual meeting of APA, Washington, D.C., August 15, 1992. Preparation of this paper was supported in part by the Spencer Foundation; the content is solely the responsibility of the author. I thank Elizabeth Baldwin, Peter Blanck, and Ralph Rosnow for their encouragement and support.

REFERENCES

American Psychological Association. (1981). Ethical principles of psychologists. American Psychologist, 36, 633-638.
American Psychological Association. (1982). Ethical principles in the conduct of research with human participants. Washington, DC: Author.
American Psychological Association. (1987). Casebook on ethical principles of psychologists. Washington, DC: Author.
Barnett, V., & Lewis, T. (1978). Outliers in statistical data. New York: Wiley.
Blanck, P.D., Bellack, A.S., Rosnow, R.L., Rotheram-Borus, M.J., & Schooler, N.R. (1992). Scientific rewards and conflicts of ethical choices in human subjects research. American Psychologist, 47, 959-965.
Broad, W., & Wade, N. (1982). Betrayers of the truth. New York: Simon and Schuster.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145-153.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cook, T.D., & Campbell, D.T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.
Cooper, H.M., & Rosenthal, R. (1980). Statistical versus traditional procedures for summarizing research findings. Psychological Bulletin, 87, 442-449.
Costa, M.M., & Gatz, M. (1992). Determination of authorship credit in published dissertations. Psychological Science, 3, 354-357.
Estes, W.K. (1991). Statistical models in behavioral research. Hillsdale, NJ: Erlbaum.
Glass, G.V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5, 3-8.
Glass, G.V., McGaw, B., & Smith, M.L. (1981). Meta-analysis in social research. Beverly Hills, CA: Sage.
Grisso, T., Baldwin, E., Blanck, P.D., Rotheram-Borus, M.J., Schooler, N.R., & Thompson, T. (1991). Standards in research: APA's mechanism for monitoring the challenges. American Psychologist, 46, 758-766.
Hedges, L.V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.
Howell, D.C. (1992). Statistical methods for psychology (3rd ed.). Boston: PWS-Kent.
Kaplan, J. (1988). The use of animals in research. Science, 242, 839-840.
Kelman, H.C. (1968). A time to speak: On human values and social research. San Francisco: Jossey-Bass.
Mosteller, F., & Bush, R.R. (1954). Selected quantitative techniques. In G. Lindzey (Ed.), Handbook of social psychology: Vol. 1. Theory and method (pp. 289-334). Cambridge, MA: Addison-Wesley.
Palca, J. (1992). "Verdicts" are in on the Gallo probe. Science, 256, 735-738.
Rosenthal, R. (1966). Experimenter effects in behavioral research. New York: Appleton-Century-Crofts.
Rosenthal, R. (1975). On balanced presentation of controversy. American Psychologist, 30, 937-938.
Rosenthal, R. (1978). How often are our numbers wrong? American Psychologist, 33, 1005-1008.

Rosenthal, R. (1991). Meta-analytic procedures for social research (rev. ed.). Newbury Park, CA: Sage.
Rosenthal, R. (1994). On being one's own case study: Experimenter effects in behavioral research, 30 years later. In W.R. Shadish & S. Fuller (Eds.), The social psychology of science (pp. 214-229). New York: Guilford Press.
Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom. New York: Holt, Rinehart & Winston.
Rosenthal, R., & Jacobson, L. (1992). Pygmalion in the classroom (expanded ed.). New York: Irvington.
Rosenthal, R., & Rosnow, R.L. (1975). The volunteer subject. New York: Wiley.
Rosenthal, R., & Rosnow, R.L. (1984). Applying Hamlet's question to the ethical conduct of research: A conceptual addendum. American Psychologist, 39, 561-563.
Rosenthal, R., & Rosnow, R.L. (1991). Essentials of behavioral research: Methods and data analysis (2nd ed.). New York: McGraw-Hill.
Rosenthal, R., & Rubin, D.B. (1971). Pygmalion reaffirmed. In J.D. Elashoff & R.E. Snow, Pygmalion reconsidered (pp. 139-155). Worthington, OH: C.A. Jones.
Rosenthal, R., & Rubin, D.B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of Educational Psychology, 74, 166-169.
Rosenthal, R., & Rubin, D.B. (1984). Multiple contrasts and ordered Bonferroni procedures. Journal of Educational Psychology, 76, 1028-1034.
Rosnow, R.L. (1990). Teaching research ethics through role-play and discussion. Teaching of Psychology, 17, 179-181.
Rosnow, R.L., & Rosenthal, R. (1993). Beginning behavioral research: A conceptual primer. New York: Macmillan.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309-316.
Smith, M.L., Glass, G.V., & Miller, T.I. (1980). The benefits of psychotherapy. Baltimore: Johns Hopkins University Press.
Snedecor, G.W., & Cochran, W.G. (1989). Statistical methods (8th ed.). Ames: Iowa State University Press.
Tukey, J.W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.