
Dear Reviewer 2: Go F’ Yourself

David A. M. Peterson, Iowa State University

Objectives. The objective of this study was to empirically test the wide belief that Reviewer #2 is a
uniquely poor reviewer. Methods. The test involved analyzing the reviewer database from Political
Behavior. There are two main tests. First, the reviewer’s categorical evaluation of the manuscript
was compared by reviewer number. Second, the data were analyzed to test if Reviewer #2 was
disproportionately likely to be more than one category below the mean of the other reviewers of
the manuscript. Results. There is no evidence that Reviewer #2 is either more negative about the
manuscript or out of line with the other reviewers. There is, however, evidence that Reviewer #3
is more likely to be more than one category below the other reviewers. Conclusions. Reviewer #2
is not the problem. Reviewer #3 is. In fact, he is such a bad actor that he even gets the unwitting
Reviewer #2 blamed for his bad behavior.

Anyone who has ever submitted a paper to a peer-reviewed outlet knows the reviewers
can, occasionally, be unpleasant. While rejection always stings, the belief that a reviewer
has either completely missed the point of the manuscript, been overtly hostile in his or her
review, or simply held the author to an impossible standard is vexing. The source of this
frustration has seemingly become personified in the identity of a single person—Reviewer
2. He (and it is always assumed to be a he) is the embodiment of all that we hate about other
scholars. Reviewer 2 is dismissive of other people’s work, lazy, belligerent, and smug.
The purpose of this article is to test a very specific claim about Reviewer 2: he is the
reviewer who holds us back. Using the database of reviewer responses from four years of
Political Behavior, I empirically test if reviewers who are assigned number 2 are systematically more negative and more likely to be out of line with the other reviews a manuscript
received. I assess this hypothesis in two ways. First, I compare the ordinal score each re-
viewer gives the manuscript. Second, I develop an original measure of “being Reviewer 2.”
In this specific case, being Reviewer 2 is defined as giving a score to the manuscript that is
more than one category below the mean ranking of the other two reviewers. The results
suggest that Reviewer 2 is no more likely to give a negative review of a manuscript or to be
uniquely critical.

Why Reviewer 2?

The academic study of the review process has flourished in recent years. Not surprisingly,
a large number of scholars whose careers depend on peer review have examined the
patterns of the review process. Scholars have examined the transparency of the review
process (Wicherts, 2016), interrater reliability of journal peer reviews (Bornmann, Mutz,
and Daniel, 2010), and the validity of the reviews compared to the eventual citation counts

Direct correspondence to David A.M. Peterson, Department of Political Science, Iowa State University,
527 Farm House Lane, Ames, IA 50011; daveamp@iastate.edu.
SOCIAL SCIENCE QUARTERLY

© 2020 by the Southwestern Social Science Association
DOI: 10.1111/ssqu.12824
of published articles (Baethge, Franklin, and Mertens, 2013). Scholars have sent spoof
articles to open-access journals to question their review practices (Bohannon, 2013). In
one survey, less than half of the researchers felt that the peer-review process is fair, scientific,
or transparent (Ho et al., 2013). Thankfully, political scientists tend to have a much more
positive view of peer review (Djupe, 2015).
There is one notable omission in this literature: reviewer number. To the best of my
knowledge, there has been no systematic study to determine if reviewers are likely to pro-
vide more positive or negative reviews based on the number they are assigned. On the
one hand, the number of the reviewer seems like a trivial difference. Most people are not
aware of their reviewer number when they review. Additionally, the process of assigning
reviewer numbers is not what one might think. For Political Behavior, the numbers are not
the order that the editor thinks of the reviewer or enters the name in the system. Instead,
the number is assigned by the order in which the reviewer responds to the request. Because
the Editorial Manager system sends out multiple review requests simultaneously, there is
nothing in the editor’s decision making that influences the assignment of the reviewer
number.
If there is little reason to expect Reviewer 2 to be uniquely terrible, why explore this em-
pirically? The main motivation for this article is that the broader community has decided
that Reviewer 2 is a monster. A Google search for “Reviewer 2” produces the interdisci-
plinary Facebook group “Reviewer 2 Must Be Stopped!” (which has over 9,000 members),
a blog entry entitled “How Not to Be Reviewer #2,” and countless images combining al-
most every visual meme imaginable. In academia, it is fair to say that Reviewer 2 is the
ultimate boogeyman. He is Pennywise the Clown, combined with el chupacabra, wrapped
in the Blair Witch. Sjoberg (2016) summarizes the advice on avoiding being Reviewer
2 as: “But when in doubt, use the ’don’t be an asshole’ rule—either when reviewing, or
more generally in professional interaction. Thinking of it in those terms might make less
’Reviewer 2’ like-sorts in the world.”
In sum, even if there is no reason to expect the person assigned the number 2 to be terri-
ble, it is widely believed to be the case. While many will suggest that the use of the number
2 is arbitrary to describe Reviewer 2, it is worth empirically investigating if Reviewer 2 is,
in fact, the problem.

Data

To test the hypothesis that Reviewer 2 is systematically more negative than the other
reviewers, I collected the data from 1,323 manuscripts submitted to Political Behavior.1 For
this journal, reviewers are asked to provide comments to the editor and the author. They
are also asked to place the manuscript into one of four categories: Accept as Is, Needs
Minor Revisions, Needs Major Revisions, and Reject. These summary judgments were
used to measure the assessment of the reviewer. Each of the evaluations was converted to a
categorical score. “Accept” was coded as a 1, “Minor Revisions” as a 2, “Major Revisions”
as a 3, and “Reject” as a 4. This score serves as the first dependent variable in the analyses.
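As an illustration of this coding scheme (a sketch, not the journal's actual data-processing pipeline), the four categories map onto the ordinal score as a simple lookup:

```python
# Illustrative sketch of the coding described in the text: each categorical
# recommendation is converted to an ordinal score, where higher = more negative.
SCORE = {
    "Accept as Is": 1,
    "Needs Minor Revisions": 2,
    "Needs Major Revisions": 3,
    "Reject": 4,
}

def code_review(recommendation: str) -> int:
    """Convert a reviewer's categorical recommendation to its 1-4 score."""
    return SCORE[recommendation]
```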
The overall score given to a manuscript, however, does not fully capture the concept
of “Reviewer 2.” If every review is negative, then authors will be less likely to single out
one of the reviewers for their malice. The result that really galls authors is when two of

1 Unfortunately, because the data are from a publisher’s reviewer database, they are proprietary and cannot be made public.
TABLE 1
Reviewer Evaluation Predicted by Reviewer Number (Ordered Logit, Standard Errors Clustered by Manuscript)

Reviewer Number    Coefficient    Standard Error
Two                  −0.14            0.08
Three                −0.09            0.08
Four                 −0.31            0.17
Five                 −0.28            0.46
τ1                   −4.49            0.16
τ2                   −1.79            0.07
τ3                   −0.10            0.06

NOTE: ∗p < 0.05; N = 3,426.

the reviewers are positive and one is quite negative. It is in this case that a
reviewer is really Reviewer 2. If it were not for that review, we believe, the editor would
have given the manuscript a revise and resubmit.
I measure “being Reviewer 2” as a dichotomy. A reviewer is considered to exhibit this
behavior when he or she is at least one category more negative than the average of the other
two reviews. It is most frustrating when two reviewers score the manuscript at “minor revi-
sions” and the other reviewer says “reject.” In the second set of analyses, this dichotomous
variable, coded a “1” if the reviewer is one or more categories below the average of the
others and "0" otherwise, is the dependent variable.
The tests of the hypothesis are relatively straightforward. The first analysis, the one
focusing only on the category given to the manuscript, uses an ordered logit model where
the reviewer numbers are the independent variables and the standard errors are clustered
based on the manuscript reviewed. In the second analysis, the data are modeled as a logit,
also with the reviewer number as the independent variable and the standard errors clustered
based on the manuscript.

Results

The majority of reviews in the data are negative, with slightly over half recommending
rejection. The question addressed in this article, however, is if there is something specific
about Reviewer 2 that makes him worse for authors. Table 1 presents the results of an
ordered logit model predicting the reviewer ranking as a function of the reviewer number.
As a reminder, higher scores in these data are more negative evaluations.
The results presented in Table 1 test if the evaluation given by the reviewer is more
negative than the average review from the omitted reviewer (Reviewer 1). There are no
significant differences. The p-values on the differences between Reviewer 1 and Reviewers 2
and 4 are each between 0.05 and 0.10, so some may interpret this as support for differences
between these reviewers. In each case, the effects are negative, meaning that Reviewer 2
and Reviewer 4 are less negative than Reviewer 1. Post hoc tests on the equality of the
coefficients in Table 1 indicate that there are no statistically discernable differences in the
evaluations of these reviewers.
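To see what the Table 1 estimates imply on the probability scale, the cutpoints can be converted into predicted probabilities. The following back-of-the-envelope sketch assumes the standard ordered-logit parameterization, P(Y ≤ j) = logistic(τ_j − xβ), with the cutpoint τ3 and the Reviewer 2 coefficient taken from Table 1 (it is not the author's post-estimation output):

```python
import math

def logistic(z: float) -> float:
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-z))

# Estimates from Table 1 (ordered logit).
TAU3 = -0.10      # cutpoint between "Major Revisions" (3) and "Reject" (4)
BETA_TWO = -0.14  # Reviewer 2, relative to the omitted Reviewer 1

def p_reject(xb: float) -> float:
    """P(score = 4, 'Reject') = 1 - P(Y <= 3) under the ordered logit."""
    return 1.0 - logistic(TAU3 - xb)

p1 = p_reject(0.0)       # Reviewer 1 (baseline): roughly 0.52
p2 = p_reject(BETA_TWO)  # Reviewer 2: roughly 0.49, slightly *less* negative
```

The baseline value of just over one-half matches the observation that slightly more than half of all reviews recommend rejection, and the Reviewer 2 value is marginally lower, consistent with the (insignificant) negative coefficient.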
The simple reviewer evaluation may not fully capture the negative behavior of Reviewer
2. It is possible, for instance, that there are enough low-quality manuscripts that result in Reviewers 1 and 3 also giving very negative evaluations. Perhaps the real problem
TABLE 2
Reviewer 2 Behavior Predicted by Reviewer Number (Logit, Standard Errors Clustered by Manuscript)

Reviewer Number    Coefficient    Standard Error
Two                   0.02            0.09
Three                 0.20            0.09∗
Constant             −0.85            0.06

NOTE: ∗p < 0.05; N = 3,272.

of Reviewer 2 is that he is an outlier and that can only be seen when the manuscript
is strong enough to get positive evaluations from the other reviewers. This is when
Reviewer 2 crushes your hopes. Table 2 presents the model that predicts this behavior
based on reviewer number. Given the small number of reviews from reviewers assigned numbers larger than 3, these results only compare Reviewers 1 through 3.
Surprisingly, there is an effect of reviewer number on the probability that the reviewer’s
evaluation is more than one category worse than the other reviewers. Even more surprising
is that the reviewer who is most likely to exhibit this behavior is Reviewer 3. A test compar-
ing the coefficients for Reviewers 2 and 3 indicates that this is also a significant difference.
Reviewer 3, at least for this journal, is the reviewer most likely to be the one who stymies
authors.

Conclusion

The hatred of Reviewer 2 is nearly universal. If an author is able to solve the Reviewer
2 problem, we tend to believe, he or she will be successful in getting his or her work
published. The results presented here, however, suggest that all of this malice is misdirected
at Reviewer 2. Reviewer 2 appears to be no more likely than other reviewers to give a
negative evaluation or to deviate substantially from the other reviewers. I do not want to
say that Reviewer 2 is blameless. There may be significant differences in the content of the
reviews that would justify Reviewer 2’s reputation. He may still exhibit dismissiveness and
a nasty tone. This tone, however, does not seem to translate to the overall evaluation.
The same cannot be said of Reviewer 3. The results from the logit model presented above
suggest that Reviewer 3 is the reviewer who is likely to be the negative outlier. He or she
is statistically significantly more likely to be the reviewer, based on the overall evaluation,
who dooms a manuscript. What is worse is that Reviewer 3 seems to be a crafty cretin. He
or she is able to do this while Reviewer 2 takes the blame. This seems like it is the ultimate
jerk move. Hopefully, the results presented here will begin to deflect some of the possibly
undeserved anger from poor Reviewer 2 and focus it where it seems to belong.
One final note on these results. As noted above, the reviewer number assigned is based
on the order in which the reviewers accept the review request. Why might this produce the result presented here? There are likely two rival mechanisms behind this effect. The first
option is that the delay in accepting the review request is a signal of low compliance with
professional expectations. The reviewer does not accept the request in a timely fashion and
potentially waits until the editor sends an annoying reminder of the review request. Given
this process, should anyone be surprised that the most negative and critical reviewer is the
one who answers the request most slowly?
The alternative mechanism is that Reviewer 3 is busier than Reviewers 1 and 2. He or
she may have additional professional responsibilities and be a more active researcher. As
a result, he or she does not provide a more critical review out of malice, but instead is
likely to find the flaws in the research that the other reviewers missed. Given the data used
here, it is impossible to separate the mechanisms. I presume that most authors, when they
receive this review pattern, will trust that the negative reviewer is the “better” reviewer.

REFERENCES

Baethge, Christopher, Jeremy Franklin, and Stephan Mertens. 2013. “Substantial Agreement of Referee Recommendations at a General Medical Journal—A Peer Review Evaluation at Deutsches Ärzteblatt International.” PLoS One 8(5):e61401.
Bohannon, John. 2013. “Who’s Afraid of Peer Review?” Science 342(6154):60–65.
Bornmann, Lutz, Rüdiger Mutz, and Hans-Dieter Daniel. 2010. “A Reliability-Generalization Study of Journal Peer Reviews: A Multilevel Meta-Analysis of Inter-Rater Reliability and its Determinants.” PLoS One 5(1):e14331.
Djupe, Paul A. 2015. “Peer Reviewing in Political Science: New Survey Results.” Political Science & Politics
48(2):346–52.
Ho, Roger Chun-Man, Kwok-Kei Mak, Ren Tao, Yanxia Lu, Jeffrey R. Day, and Fang Pan. 2013. “Views
on the Peer Review System of Biomedical Journals: An Online Survey of Academics from High-Ranking
Universities.” BMC Medical Research Methodology 13(1):1.
Sjoberg, Laura. 2016. “Don’t Be Reviewer #2.” Available at http://relationsinternational.com/dont-be-reviewer-2/.
Wicherts, Jelte M. 2016. “Peer Review Quality and Transparency of the Peer-Review Process in Open Access and Subscription Journals.” PLoS One 11(1):e0147913.
