
A commentary on some articles by Dr. Nicolas Guéguen
Nicholas J. L. Brown, University of Groningen
James A. J. Heathers, Poznań University of Medical Sciences¹

Introduction
In this commentary, we review a number of published articles by Dr. Nicolas Guéguen. We
conclude that they contain a number of methodological and statistical deficiencies that cast doubt
on the validity of the results. In many cases, we can find no reasonable explanation for how the
data might have been collected in order to give rise to the published results.
Dr. Guéguen’s work was first brought to our attention when a colleague on Twitter
mentioned Guéguen’s (2015a) study on hairstyle and willingness to help. We examined the
article and discovered a number of puzzling patterns in the data, as well as an implausibly large
effect size. We therefore started to look at other articles by Dr. Guéguen, and found a number of
other troubling issues.
Here, we comment on 10 articles from Dr. Guéguen’s laboratory. Our selection criteria
were:
1. Recent publication: We have not included any articles published before 2012 in this
commentary. This should not be taken to mean that Dr. Guéguen’s earlier articles are free of
problems; we consider that several of them raise further issues of concern.
2. Sole authorship: We have included only articles where Dr. Guéguen is named as the sole
author. Again, this does not mean that we consider the articles that Dr. Guéguen has
published with one or more co-authors to be free of problems.
3. Variety of problems demonstrated: We could probably have included at least another
eight problematic articles on the basis of the above two criteria, and at least 10 more if we had
gone further back in time before 2012. However, we believe that the selection of articles in this
commentary is more than adequate to demonstrate the variety of problems that we found; to add
more would have increased the length of this already lengthy document (and delayed its
distribution) without adding much more to the overall conclusions to be drawn.

The problems we have identified with this selection of articles fall into the following categories:

¹ Now at Northeastern University, Boston, MA

1. Impossible means and/or standard deviations (SDs). In several of the articles that we
have reviewed, Dr. Guéguen reported means and SDs (typically of scores from Likert-type
scales) that cannot be obtained by any combination of the possible responses on the scale being
used, given the sample size. Such errors often affect multiple means and SDs in the same table,
making simple transcription errors unlikely as an explanation.
2. Implausible score distributions. Continuing from the previous point, in several articles
the SDs reported for scores resulting from Likert-type scales are so small that the majority of
participants must have replied with exactly the same score, even though this score lies towards
the middle of the scale. As a minimum, this implies a very large degree of leptokurtosis, but in
practical terms, it suggests that almost every participant had the same (moderate) opinion about
the subject being tested. This effect is usually seen in both the control and intervention
conditions; in the intervention condition, the mean is typically shifted (relative to baseline) by a
large amount, but the small SDs still mean that almost nobody can have endorsed the
highest (or lowest) items on the scale.
3. Implausible fieldwork scenarios. Several of Dr. Guéguen’s articles describe extensive
field experiments that took place over several days. While there is typically an extensive
discussion of the ways in which possible extraneous variables were eliminated, many practical
aspects of recruiting participants by stopping them as they went about their daily business were
apparently ignored.
4. Implausible response rates. Dr. Guéguen’s unnamed confederates seem to have
experienced very little difficulty in stopping people at random in public and getting them to
agree to take part in psychological research. In several field studies, it was reported that 100% of
the participants agreed to answer the confederate’s questions, or to take part in a debriefing.
5. Implausible effect sizes. The average effect size found in social psychology is around
r = .21 (Richard, Bond, & Stokes-Zoota, 2003), corresponding to a Cohen’s d of .43. Because of
publication bias, even this is probably an overestimate of the true effect size of most
experimental manipulations (Ioannidis, 2005). Yet, in several of the articles reviewed here, Dr.
Guéguen found effect sizes in excess of d = 1.0 for what appear to be very minor manipulations.²
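
The correspondence between r = .21 and d = .43 quoted in point 5 can be checked with a short sketch. It assumes the standard conversion d = 2r / sqrt(1 − r²) for two groups of equal size; the source does not say which formula was used, and the function name `r_to_d` is purely illustrative.

```python
import math


def r_to_d(r: float) -> float:
    """Convert a correlation r to Cohen's d, assuming two equal-sized groups."""
    return 2 * r / math.sqrt(1 - r ** 2)


# Average effect size in social psychology quoted in the text
d_typical = r_to_d(0.21)
print(f"r = .21 corresponds to d = {d_typical:.2f}")
```

Under this assumption, an effect of d = 1.0 or more is over twice the size of the average effect in the field.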

² Where necessary, we converted chi-square statistics to Cohen’s d values with the online effect size
calculator at http://www.campbellcollaboration.org/escalc/html/EffectSizeCalculator-SMD17.php.

All except one of the articles reviewed in this commentary have as their principal subject
matter some aspect of sexual behavior, generally to do with attempted sexual contact, or other
matters related to sexual attraction, between men (of various ages) and young women. This is, of
course, a legitimate area of scientific study; thus, while we have our own views on some of Dr.
Guéguen’s apparent presumptions about human sexuality, in this commentary we take no
position on the general plausibility of his principal hypotheses, and we have avoided
commenting on the introductory and discussion sections of his articles wherever possible.
As far as possible, we have also tried to avoid commenting on the reported sexual and/or
“courtship” behaviors shown, at least to the extent that these are not the result of the
experimental manipulation. If, on occasion, we have failed in this aim, it is because a claim about
the (non-manipulated) behaviors strikes us as sufficiently implausible as to require comment.
In writing this commentary, we have generally tried to keep the reviews of each article as
self-contained as possible, although there are some cross-references. Sometimes this might give
the impression that certain points have been covered before, perhaps more than once. Where we
had to choose between rigor in our analysis and readability, we opted for the first of these
wherever possible.
We first documented our concerns with these articles in December 2015, at which point
we sent our findings and questions to the French Psychological Society (SFP). The SFP’s
research department decided that there was a clear case for Dr. Guéguen to answer, and for the
next 18 months they repeatedly tried to get him to provide information about these studies.
However, they were unsuccessful: Dr. Guéguen repeatedly stalled, invoking family
circumstances as the reason for his inability to respond (although these circumstances did not
apparently prevent him publishing numerous articles throughout the entire period in question).
At one point Dr. Guéguen sent a thick envelope containing details of 25 field studies that had
been carried out by his undergraduates, none of which had any relevance to our questions. In the
summer of 2017, the SFP informed us that they were unable to make any progress and that they
were stepping down from their role as intermediaries. We have therefore decided to make our
findings and questions public, in order to bring this matter to the attention of the scientific
community.
We would be delighted to discover that all the studies took place exactly as described,
that any statistical errors are due to problems with transcription or our own oversights, and that
Dr. Guéguen’s demonstrations of large measurable effects as consequences of small behavioral
manipulations are replicable.

Nick Brown
James Heathers
December 2015–December 2017

1. Guéguen, N. (2015a). Women’s hairstyle and men’s behavior: A field experiment.
Scandinavian Journal of Psychology, 56, 637–640. http://dx.doi.org/10.1111/sjop.12253

Article summary: A female confederate with long hair in one of three hairstyles (“natural,”
“ponytail,” or “bun”) dropped an object while walking in the street. Participants were male or
female passers-by. Male (but not female) participants were more likely to help the confederate to
retrieve her lost object when her hair was in the “natural” condition.

When we first read this article, our immediate focus of concern was the very large effect
sizes that were reported. For example, the difference between men and women participants in the
“Natural” hairstyle condition corresponds to a Cohen’s d of 2.44, which would constitute a
remarkably large effect in any form of science, let alone social psychology.
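
As a rough check on the size of this effect, the sketch below computes Cohen's d from summary statistics using a pooled standard deviation. The inputs (means of 2.80 and 1.80, an SD of 0.41 in both cells, and 30 participants per cell) are the values discussed in this commentary; the function name `cohens_d` is an illustrative choice, and the pooled-SD formula is an assumption about how the value was obtained.

```python
import math


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


# Men vs. women in the "Natural" hairstyle condition
d_natural = cohens_d(2.80, 0.41, 30, 1.80, 0.41, 30)
print(f"d = {d_natural:.2f}")
```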
On inspecting Guéguen’s (2015a) Table 1 more closely, we observed a curious pattern.
The last digit of all of the means is zero, whereas it might be expected that there would be some
3s and 7s in the second decimal place because the total scores (all integers) were divided by the
number of participants per cell, namely 30. (The article did not explicitly mention that there were
exactly 30 participants per cell, but Dr. Guéguen confirmed this fact in an e-mail when he
supplied us with the dataset.) At first we thought that this pattern might be due to a numerical
formatting problem—for example, perhaps the numbers had been rounded to one decimal place,
then expanded to two decimal places for display purposes—but the non-zero final digits of the
standard deviations (SDs) and row totals are not consistent with this. Assuming a uniform
distribution of scores, the chance of all six means ending in zero in this way is (1/3)⁶ = .0014.
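
The reasoning about final digits can be illustrated with a small sketch. It enumerates every possible cell total for 30 participants scoring 1, 2, or 3, and lists the second decimal digit of the resulting mean. The variable names are illustrative, and the probability at the end rests on the same simplifying assumption as the text, namely that each of the three possible final digits is equally likely.

```python
from fractions import Fraction

N_PER_CELL = 30
# Possible cell totals when 30 participants each score 1, 2, or 3
totals = range(N_PER_CELL * 1, N_PER_CELL * 3 + 1)


def second_decimal(total: int) -> int:
    """Second decimal digit of total / 30 after rounding to two places."""
    hundredths = round(Fraction(total * 100, N_PER_CELL))
    return hundredths % 10


# Distinct final digits that a mean of 30 integer scores can show
digits = sorted({second_decimal(t) for t in totals})

# Chance that six independent means all end in zero, if each digit is equally likely
p_all_zero = Fraction(1, 3) ** 6

print(digits, float(p_all_zero))
```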
We next examined the means and SDs in each cell. We discovered that in all five distinct
cases (the cells for the “Men—Ponytail” and “Men—Bun” conditions are identical, reducing the
number of unique combinations of mean and SD to five), there is only one possible combination
of scores of 1, 2, and 3 that can give the means and SDs shown in Guéguen’s (2015a) Table 1.
These unique combinations are shown in our Table 1:

Table 1. Per-cell scores for participants in Guéguen’s (2015a) study.
Participant Sex–Hairstyle Condition | Scores | Mean | SD
Men–Natural | 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,2,2,2,2,2,2 | 2.80 | 0.41
Men–Ponytail/Bun | 3,3,3,3,3,3,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1 | 1.80 | 0.76
Women–Natural | 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1 | 1.80 | 0.41
Women–Ponytail | 3,3,3,3,3,3,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 | 1.60 | 0.81
Women–Bun | 2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1 | 1.60 | 0.50

For each condition (participant sex–hairstyle), we exhaustively verified that no other
combination of scores gives the same mean and SD. We further verified our analysis by using
these derived data to reproduce the ANOVA results reported by Guéguen (2015a). We found
exactly the same results as were reported in the article, with only one exception: On p. 638, the
article reports that “the overall effect of hairstyle on female participants appeared non significant
[sic] F(2, 87) = 1.11, p = 0.33),” whereas we obtained a rounded value of 1.12 for this F-statistic.
Since the exact value is 1.115385, we presume that the discrepancy is due to a different choice of
rounding method. Otherwise, we reproduced all the other F, p, and eta-squared values exactly,
which suggested to us that we had correctly reconstructed Guéguen’s dataset.
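
The exhaustive check described above can be sketched as follows. This is an illustration under stated assumptions rather than the code actually used: it assumes 30 participants per cell, scores restricted to 1, 2, and 3, the sample (n − 1) standard deviation, and means and SDs rounded to two decimal places. The names `matching_counts` and `one_way_f` are hypothetical helpers.

```python
from statistics import mean, stdev

N = 30  # participants per cell

# (mean, SD) pairs as listed in our Table 1, kept as two-decimal strings
targets = {
    "Men-Natural": ("2.80", "0.41"),
    "Men-Ponytail/Bun": ("1.80", "0.76"),
    "Women-Natural": ("1.80", "0.41"),
    "Women-Ponytail": ("1.60", "0.81"),
    "Women-Bun": ("1.60", "0.50"),
}


def matching_counts(target_mean, target_sd):
    """Return every (n1, n2, n3) whose rounded mean and sample SD match."""
    matches = []
    for n1 in range(N + 1):
        for n2 in range(N + 1 - n1):
            n3 = N - n1 - n2
            scores = [1] * n1 + [2] * n2 + [3] * n3
            if (f"{mean(scores):.2f}" == target_mean
                    and f"{stdev(scores):.2f}" == target_sd):
                matches.append((n1, n2, n3))
    return matches


def one_way_f(groups):
    """F statistic of a one-way ANOVA on a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


solutions = {cell: matching_counts(*t) for cell, t in targets.items()}

# Rebuild the three female-participant cells and recompute the hairstyle effect
women = [
    [1] * n1 + [2] * n2 + [3] * n3
    for n1, n2, n3 in (solutions[c][0] for c in
                       ("Women-Natural", "Women-Ponytail", "Women-Bun"))
]
f_women = one_way_f(women)
print(solutions)
print(f"F(2, 87) = {f_women:.6f}")
```

Each cell yields exactly one combination of counts, and the recomputed F statistic for female participants agrees with the exact value quoted above.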
On November 17, 2015, we wrote to Dr. Guéguen to ask if he would send us his dataset,
and also to confirm that the sample size was exactly 30 per cell. He replied, enclosing the dataset
in Excel format, and confirmed the sample size. An examination of the dataset that he supplied
showed that it was exactly as we had predicted (and shown in our Table 1). Thus, we had been
able to reconstruct the entire dataset from the table of summary statistics in the article.
It is not difficult to see that these data contain multiple examples of a remarkably regular
distribution of scores. Specifically, in every condition (participant sex–hairstyle), each possible
individual score (1, 2, or 3) occurs exactly 0, 6, 12, 18, or 24 times. No other counts of individual
scores are present. Assuming for simplicity a uniform distribution of scores, we used Monte
Carlo simulations to determine that the fraction of cases in which the counts of all three scores
are divisible by 6 is approximately 0.0246. If we assume that the distribution of remainders in the
first condition is arbitrary (i.e., we are only calculating the chances of that pattern occurring
again by chance in the five subsequent combinations of participant sex and hairstyle), the chances
of this happening randomly are (0.0246)⁵, or 9.01 × 10⁻⁹ (1 chance in 110 million).
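
The figure of approximately 0.0246 can be cross-checked without simulation. The sketch below replaces the Monte Carlo approach with an exact multinomial enumeration, under the same simplifying assumption that each of the 30 scores in a cell is drawn uniformly from 1, 2, and 3. The function name is illustrative.

```python
from math import comb

N = 30    # participants per cell
STEP = 6  # every count in the dataset is a multiple of 6


def prob_all_counts_divisible(n=N, step=STEP):
    """Exact probability that the counts of scores 1, 2, and 3 are all
    multiples of `step`, with each of n scores uniform on {1, 2, 3}.
    Assumes n itself is a multiple of step, so the third count is too."""
    favourable = 0
    for n1 in range(0, n + 1, step):
        for n2 in range(0, n - n1 + 1, step):
            # Number of ways to place n1 ones and n2 twos; the rest are threes
            favourable += comb(n, n1) * comb(n - n1, n2)
    return favourable / 3 ** n


p_one_cell = prob_all_counts_divisible()
p_five_cells = p_one_cell ** 5
print(f"one cell: {p_one_cell:.4f}, five further cells: {p_five_cells:.2e}")
```

The exact per-cell value rounds to the simulated 0.0246; raising the unrounded value to the fifth power gives a result marginally smaller than the figure obtained from the rounded one, which does not change the conclusion.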
As noted earlier, the apparent effect size of this study is extremely large; the numbers in
the dataset allow us to reconstruct exactly what is supposed to have happened in the field study.
The 24 scores of 3 and six scores of 2 in the “Men–Natural” condition mean that, when walking
behind the confederate with a “natural” hairstyle who dropped her glove, 100% of men
intervened; of those, 80% picked the glove up, and the remaining 20% pointed out to the
confederate that she had dropped it. In contrast, the 24 scores of 2 and six scores of 1 in the
“Women–Natural” condition mean that, when the person walking behind the confederate with a
“natural” hairstyle was a woman, 80% of them reacted to the glove being dropped, but every
single one of those women chose merely to point this fact out to the glove’s owner; not one
woman out of 24 picked up the dropped glove. Meanwhile, in the other hairstyle conditions, men
and women hardly differed in their behavior; for example, when the confederate wore her hair in
a ponytail, six men and six women returned the glove to her. The idea that a simple change
in the confederate’s hairstyle could, by itself, cause such extreme differences between men’s and
women’s reactions to the dropped glove seems to us to defy common sense.

We asked Dr. Guéguen to provide us with the following:

• Contact details of the confederate who can, first, confirm that the results stated in the
article were those actually observed during the experiment; second, confirm the accuracy
of the methodological description; and third, describe any problems that occurred during
data collection, for example whether any participants were excluded because the
process of “walking in the same direction as the participant about three meters away”
(p. 638) proved to be non-trivial.
• Any other evidence that the study took place as described, such as contemporaneous
e-mail exchanges, planning documents, copies of the forms on which the confederate
recorded the events that took place, a written field report from the confederate, etc.

2. Guéguen, N. (2015b). High heels increase women’s attractiveness. Archives of Sexual
Behavior, 44, 2227–2235. http://dx.doi.org/10.1007/s10508-014-0422-z

Article summary: Participants were males (and, in Studies 2 and 3, females) from the general
population. A female confederate was dressed identically in all conditions except that she wore
shoes that had high, medium, or no heels. Male (but not, where relevant, female) participants’
willingness to accept the confederate’s request to fill in a questionnaire (Studies 1 and 2),
willingness to help her to recover a dropped glove (Study 3), or quickness to approach her as she
sat alone at a bar (Study 4) were all greatest when the confederate was wearing high heels, lower
when she was wearing medium heels, and lowest when she was wearing flat shoes.

1. Guéguen (2015b) reported that, in Study 2, “[t]o prevent possible multiple solicitations of
the same pedestrian, the study was conducted at the same time in each town by the different
confederates, and there was a minimum distance of 1km between each confederate” (pp. 2229–
2230). However, the repeated solicitation of the same participant would appear to pose only a
minor danger to the integrity of the study; it seems very likely that a person who was asked to
participate twice would point this out on the second occasion, thus enabling his or her second
response to be eliminated from consideration. In several other studies that we discuss in this
commentary (Guéguen, 2012b, 2012c, 2013a), no mention was made of any precautions to
prevent duplicate recruitment of participants, nor were any cases of such duplicate recruitment
reported, although our analysis of those studies shows that multiple solicitations of the same
participant were almost inevitable in those studies. The description of the precautions taken in
the study considered here (Guéguen, 2015b) to avoid such an eventuality suggests that simply
dropping duplicates would be highly inconvenient, although the reasons for that are not clear to
us. However, assuming that such duplicates are, indeed, a problem for Guéguen’s preferred
methods of analysis, we wonder why he apparently only started to address this problem in this
more recent study.
2. In Study 3, the confederates were instructed to change their shoes after testing 20
participants (10 men and 10 women). However, given that the total number of participants was
360 and there were four confederates, this implies that each confederate was instructed to test
exactly 90 participants (Guéguen, 2015b, did not report how the confederates were to coordinate
their individual numbers of recruited participants otherwise). Given that 90 is not divisible by 20,
this means that for the last series of 10 participants per confederate, two of the four
confederates would be wearing shoes of one type and the other two would each be wearing one
of the remaining two types. Thus, the number of participants tested for the three heel conditions
would not be (120, 120, 120) but (130, 120, 110). This is illustrated in our Table 2. However,
Guéguen reported (in his Table 2) that there were 120 participants per heel condition; he did not
appear to notice the problem which would necessarily arise if his confederates followed his
instructions precisely, assuming that these instructions were correctly reported.

Table 2. Possible cycle of heel heights in Guéguen’s (2015b) Study 3. The low-heel counts (blue
in the original) add up to 110, the medium-heel counts (black) to 120, and the high-heel counts
(red) to 130.
Confederate | 1 | 2 | 3 | 4
Round 1 | 20 high | 20 medium | 20 low | 20 high
Round 2 | 20 medium | 20 low | 20 high | 20 medium
Round 3 | 20 low | 20 high | 20 medium | 20 low
Round 4 | 20 high | 20 medium | 20 low | 20 high
Round 5 | 10 medium | 10 low | 10 high | 10 medium
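
The totals in Table 2 can be reproduced with a short simulation. This is a sketch under assumptions: the rotation order (high, medium, low) and the starting shoe of each confederate are taken from the possible cycle shown in Table 2, not from the article itself, and all names are illustrative.

```python
from collections import Counter

HEELS = ["high", "medium", "low"]  # rotation order assumed from Table 2
START = [0, 1, 2, 0]               # assumed starting heel index, confederates 1 to 4
PER_CONFEDERATE = 90               # 360 participants / 4 confederates
BLOCK = 20                         # shoes changed after every 20 participants


def heel_totals():
    """Count participants per heel height when each confederate rotates shoes."""
    totals = Counter()
    for start in START:
        tested = 0
        block_index = 0
        while tested < PER_CONFEDERATE:
            # The final block is cut short when 90 is reached
            size = min(BLOCK, PER_CONFEDERATE - tested)
            heel = HEELS[(start + block_index) % len(HEELS)]
            totals[heel] += size
            tested += size
            block_index += 1
    return dict(totals)


totals = heel_totals()
print(totals)
```

Other starting assignments permute which heel height receives which total, but with blocks of 20 and 90 participants per confederate the three totals cannot all equal 120.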

3. Also in Study 3, Guéguen (2015b) reported that “[r]esponses were recorded as help if the
participant warned the confederate within 10 s after losing the object” (p. 2230). However, he
provided no description of how this precise period of 10 seconds was measured. There was no
mention of the presence of other confederates as observers, or of the female confederate being
equipped with a stopwatch.
4. Study 4 took place in three different bars on twelve different nights. The same female
confederate (and, presumably, the same two male accomplices) thus spent a total of six hours in
each bar (twelve nights, 30 minutes per bar). It might be imagined that the staff and/or the
regular customers of the bar might have noticed what was going on, as a succession of men
attempted to make contact with the female confederate, only to be told that her friend would be
arriving shortly (which, indeed, transpired). Yet, Guéguen (2015b) described no precautions that
might have been taken to deal with this issue. Even if the management of the bar had given their
approval for the study, and the staff had been made aware of what was happening—neither of
which was mentioned by Guéguen—there would seem to be no way to control for the behavior,
or alertness, of multiple single young men in a bar, especially at 11:30 pm on a Saturday. The
whole scenario sounds contrived, to say the least.
5. In the description for Study 4, Guéguen (2015b) stated that “[i]f there was no male
contact after 30min, the female confederate was instructed to leave the bar” (p. 2232). However,
there was no report of the number of cases where no contact was made. Such information would
presumably be of crucial importance, depending on the height of the heels being worn by the
confederate. We tentatively assume that contact was made in every case, but why was this not
reported?
6. We calculated the effect sizes of the main effects reported by Guéguen (2015b) from his
chi-square statistics (Studies 1–3) and his table of means and SDs (Study 4, using a technique
recommended by Cohen, 1988, p. 273). The effect sizes we found were: d = 0.66 (Study 1),
d = 0.37 (Study 2), d = 0.30 (Study 3), and d = 1.71 (Study 4). This last value strikes us as
extremely unlikely, given that the only difference between the conditions was the height of the
heels of the woman’s shoes.
7. The ANOVA statistics for Study 4 reported by Guéguen (2015b) do not correspond to the
means and SDs reported in his Table 4. Our Table 3 shows the correct values, assuming 12
participants in each condition. We also show the p values of the LSD tests, which are again
different to those reported by Guéguen. We are unable to determine whether this error is due to a
systematic misreporting of the means and SDs, or of the ANOVA statistics themselves.

Table 3. Recalculated ANOVA statistics from Guéguen (2015b, p. 2232).


Condition | M | SD
Flat | 13.54 | 4.87
Medium | 11.46 | 3.67
High | 7.49 | 2.18

Statistic | Original | Recalculated
F | 7.18 | 8.11
p | .003 | .001
LSD Flat–Medium (p) | .26 | .18
LSD Flat–High (p) | .001 | <.001
LSD Medium–High (p) | .015 | .014
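
The recalculated F and p values in Table 3 can be reproduced from the summary statistics alone. The sketch below assumes 12 participants per condition, as stated above, and uses the closed-form tail probability of the F distribution that holds when the numerator has 2 degrees of freedom, so it needs no statistics library. The function name is illustrative, and the LSD comparisons are not reproduced here.

```python
def anova_three_groups(means, sds, n):
    """One-way ANOVA F, within-groups df, and p from cell means and SDs,
    for exactly three groups of equal size n."""
    k = len(means)
    if k != 3:
        raise ValueError("the closed-form p-value below needs exactly 3 groups")
    grand_mean = sum(means) / k
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(sd ** 2 for sd in sds) / k
    df_within = k * (n - 1)
    f_stat = ms_between / ms_within
    # With 2 numerator df, P(F > f) = (1 + 2f / df_within) ** (-df_within / 2)
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, df_within, p_value


f_stat, df_within, p_value = anova_three_groups(
    means=[13.54, 11.46, 7.49], sds=[4.87, 3.67, 2.18], n=12
)
print(f"F(2, {df_within}) = {f_stat:.2f}, p = {p_value:.3f}")
```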

We asked Dr. Guéguen to provide us with the following:

• An answer to our demonstration above that four people, changing shoes every 20
encounters, could not have produced the distribution of data from exactly 360 encounters
described in the report of Study 3.
• An indication of the ethical approval procedure that was presumably undertaken before
performing Study 4 (which placed the female confederate in a potentially sexualized
situation), including a copy of the report of the ethics committee.
• A copy of the dataset from each study in the article.
• Contact details of any confederates who can, first, confirm that the results stated in the
article were those actually observed during the experiment; second, confirm the accuracy
of the methodological description; and third, describe any problems that occurred during
data collection.
• Any other evidence that the study took place as described, such as contemporaneous
e-mail exchanges, planning documents, copies of the forms on which the confederates
recorded the events that took place, a written field report from the confederates, etc.

3. Guéguen, N. (2013a). Weather and courtship behavior: A quasi-experiment with the flirty
sunshine. Social Influence, 8, 312–319. http://dx.doi.org/10.1080/15534510.2012.752401

Article summary: Participants (female pedestrians aged 18–25, walking alone) were more likely
to give their telephone number to a male confederate who stopped them and asked them for a
date if the sun was shining, independent of the ambient temperature.

1. We find Guéguen’s (2013a) explanation of the weather conditions to be unconvincing.
He used nearly 300 words—more than 10% of the total length of the article—to explain the
apparently extensive measures that were taken to ensure that the weather was either cloudy or
sunny, and the temperature was neither too cold nor too warm, presumably because it was crucial
to his hypotheses that the determining factor in the participants’ behavior was the amount of
sunshine and not the ambient temperature. But he did not explain how the temperature was
measured, or how the average temperature was calculated over the course of the data collection
periods. Nor did he describe the time sequence by which information about the temperature, and
the degree of cloudiness as evaluated by passers-by, was transmitted to the laboratory so that
confederates could either be told to go out and solicit participants for the study, or alternatively
“participate in a phone survey on food habits conducted at the same time” (Guéguen, 2013a,
p. 315).
Furthermore, it seems that no allowance was made for variations in the weather, once
data collection had been started for the morning. Anyone with personal experience of the
weather conditions in the late spring in Brittany knows that a clear sky at 9:00 am is no guarantee
that it will not be cloudy, or indeed raining hard, an hour later. Not for nothing do the people of
Brittany have an expression, “En Bretagne, chaque jour on a les quatre saisons” (“In Brittany,
you get all four seasons every day”). Yet, Guéguen (2013a) did not report any measures to deal
with this problem, which would need to be handled very carefully. Recall that the confederates in
this study were blind to the experimenter’s hypothesis. Thus, to tell them to stop asking for
phone numbers if the sky went from, say, 1 to 5 on the “cloudiness” scale in the course of a
morning would presumably involve the risk that they might work out what was going on.
It also seems incorrect to imply that any temperature changes during each day’s data
collection period (9:00 am–1:00 pm) were minimal. Even if the sea were to play the role in
reducing fluctuations in the shade temperature claimed by Guéguen (2013a), the winds (which
also bring changes in the cloud cover) can cause a substantial chilling effect. Our experience is
that a light sweater is often required on the Brittany coast in late May and early June. Thus,
controlling for temperature merely by noting the nominal reading on a thermometer would be
unlikely to take into account the participants’ perception of the ambient temperature.
Our skepticism about the above claims by Guéguen (2013a) is confirmed by an
examination of the actual weather data for the period in question. The article did not report in
which year the field work was conducted, but since it was published in the fall of 2013, we
assume that the data were collected in 2013 itself or, perhaps, in one of the preceding three years.
The names of the two towns in which the research was conducted were not mentioned either, but
we assume that they were Vannes and Lorient, which are home to the campuses of the Université
de Bretagne-Sud, with which Dr. Guéguen is affiliated. These towns are located about 45 km apart
and would normally be expected to have very similar weather to each other on any given day.
We therefore collected the recorded weather data for Vannes for the months of May and June in
the years 2010 through 2013, from the website of the “Association Météo Bretagne” at
http://www.meteo-bretagne.fr/climatologie-mensuelle-Vannes-Sene. We selected the dates from
May 21–31 and June 1–10 in each year, corresponding to the description of the field work
having been carried out at the “end of May and beginning of June” (Guéguen, 2013a, p. 314).
Our analysis of these meteorological data shows that the number of days on which the mean
temperature reported by the Association Météo Bretagne website exceeded the 20ºC reported by
Guéguen (2013a) during the 21-day period in question was 3 (2010), 2 (2011), 2 (2012), and 2
(2013).
However, since this “mean” temperature appears merely to be the average of the
minimum and maximum temperatures for the day, it might be objected that this does not reflect
the true mean temperature for the daily data collection period of 9:00 am to 1:00 pm reported by
Guéguen (2013a). We therefore examined those days on which the maximum temperature
reached 20ºC to see what the likely mean temperature for the data collection period on those
days would have been. The number of those days in each year for the 21-day period in question
was 13 (2010), 8 (2011), 11 (2012), and 8 (2013). To calculate a reasonable mean temperature
between 9:00 am and 1:00 pm on each of those days, we started with the observation that sunrise
(usually the point at which the lowest temperature for the day would be measured) occurs in
Vannes on June 1 at around 6:15 am local time; thus, when data collection began each day at
9:00 am in Guéguen’s (2013a) study, it was less than three hours after the coldest point of the
day. Furthermore, Vannes is situated slightly west of the Greenwich meridian, and its local time
from April to October is two hours ahead of UTC. This means that the highest point of the sun in
the sky is at around 2:10 pm local time, more than one hour after data collection stopped in
Guéguen’s study; one would normally expect the maximum daily temperature to be reached at
some point in the afternoon after this solar high point (cf. Wagner, n.d., from which our Figure 1
is taken). However, in order to be as charitable as possible to Guéguen’s position, we assumed a
linear rise in temperature from a minimum at 6:00 am to the reported daily maximum at 2:00 pm.
Even using this model, the number of days on which we calculated that the mean temperature
from 9:00 am to 1:00 pm could possibly have been 20ºC or more was 6 (2010), 2 (2011), 5
(2012), and 3 (2013).
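
The linear temperature model described above can be sketched as follows. The minimum and maximum temperatures used in the example are hypothetical values chosen for illustration, not figures from the weather records, and the function names are ours. Because the assumed rise is linear, the mean over the 9:00 am to 1:00 pm window equals the temperature at its midpoint, 11:00 am.

```python
def temp_at(hour, t_min, t_max, hour_min=6.0, hour_max=14.0):
    """Temperature at `hour`, assuming a linear rise from t_min at 6:00 am
    to t_max at 2:00 pm."""
    fraction = (hour - hour_min) / (hour_max - hour_min)
    return t_min + fraction * (t_max - t_min)


def mean_temp(start, end, t_min, t_max):
    """Mean temperature over [start, end]; for a linear rise this is the
    value at the midpoint of the interval."""
    return temp_at((start + end) / 2, t_min, t_max)


# Hypothetical day (illustrative values only)
example_min, example_max = 12.0, 22.0
mean_9_to_13 = mean_temp(9, 13, example_min, example_max)
temp_at_9 = temp_at(9, example_min, example_max)
print(mean_9_to_13, temp_at_9)
```

In this hypothetical case, a day with a maximum of 22ºC would still have a mean of only 18.25ºC during the data collection window, which shows why relatively few days can reach a 20ºC mean even under this generous model.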
We do not believe that these weather data are compatible with the scenario described by
Guéguen (2013a), even with the use of very generous assumptions. First, it would mean that
most days during the period in question did not meet his criteria for inclusion; assuming that he
conducted his research in 2013, for example, it would require 500 different women aged 18–25
to have walked alone past Guéguen’s confederates in a period of just 12 hours spread over three
mornings (see also our point 5, below). Second, given the range of temperatures specified on
p. 314, the study scenario requires that the temperature at 9:00 am was already 18ºC, which,
using our optimistic linear temperature model, reduces the number of possible days even further,
to 4 (2010), 2 (2011), 4 (2012), and 2 (2013). Third, it would appear likely that this small
number of days on which the maximum temperature was sufficiently high to allow the average to
reach 20ºC were almost entirely sunny (Guéguen’s claims about the warming effect of the sea
notwithstanding), meaning that there would be little or no opportunity to collect data for the
“mostly cloudy” condition. Yet, Guéguen did not mention any difficulties in finding appropriate
conditions for his data collection to take place.
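
The recruitment rate implied by the first point can be made explicit with a few lines of arithmetic. The sketch assumes the 2013 figure of three eligible mornings from the linear model and a four-hour collection period per morning; the variable names are illustrative.

```python
participants = 500
eligible_mornings = 3   # days in 2013 meeting the temperature criterion
hours_per_morning = 4   # 9:00 am to 1:00 pm

total_hours = eligible_mornings * hours_per_morning
per_hour = participants / total_hours
minutes_between = 60 / per_hour
print(f"{per_hour:.1f} women per hour, one every {minutes_between:.2f} minutes")
```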

Figure 1. Typical evolution of daytime temperature as a function of solar time. The green vertical
line (added by us) indicates solar time in Vannes at the end of Guéguen’s (2013a) daily data
collection period. The original image is available at http://rowdy.msudenver.edu/~wagnerri/ch3-14.jpg.

2. The state of the sky (i.e., the amount of cloud cover, or its absence, on a scale of 1 to 9)
was obtained by having separate confederates ask random members of the public, walking
elsewhere in the town, to give their opinion of the state of the sky. The use of these pedestrians
seems not only to be superfluous (the assistants could, presumably, have evaluated the degree of
cloud cover themselves) but also to introduce a potential problem of inter-rater reliability, since
presumably different pedestrians were asked to help each day. It would seem to be both easier
and more reliable to have trained the assistants and asked them to evaluate the sky; indeed, we
would expect intelligent assistants to propose such a modification to the protocol themselves. We
wonder why a needlessly complicated process was described instead.
3. The number of participants was reported to be exactly 500. It is not clear how multiple
confederates acting simultaneously could have coordinated their solicitations to ensure that
exactly this number was chosen,³ so we presume that they were deployed sequentially, although
Guéguen (2013a), in contrast to some other articles with similar experimental protocols, did not
discuss this. Even if each confederate had been instructed to approach exactly 100 participants,
however, it seems surprising that all five of them hit this target exactly, or that no participant was
apparently excluded for any reason. In a study of this type, one would normally expect the
confederates to report back to the experimenter on their interactions with their participants, with
the experimenter taking decisions as to which participants should be excluded (for example,
because of the way in which they reacted to the confederate’s request to give their phone
number). No trace of any such discussions is to be found in Guéguen’s article (or, indeed, in
most of his other articles involving field-based recruitment of participants).
4. All participants were reported as being aged 18–25. That is, of the 500 women stopped
by the confederates, not one turned out to be aged 17 or 26. The article makes no mention of any
elimination of participants whose age, when revealed, turned out to be outside the specified
range. This would appear to require quite remarkable judgment on the part of all five
confederates.

³ Of course, another possibility is that exactly 500 participants had been recruited when data collection
was stopped for another coincidental reason, such as the planned end time of the Nth day’s collection
period being reached. However, the pattern of round numbers of participants in many of Guéguen’s other
studies leads us to conclude that these precise sample sizes were probably fixed in advance. Determining
one’s ideal sample size before starting to collect data is, of course, generally to be encouraged, but we
question how likely a researcher is to hit that target number exactly in practice. For example, suppose that
after ten full days of recruiting participants, the researcher were to find that his confederates had solicited
phone numbers from 499 women. Would he really go through all the steps described by Guéguen (2013a)
to verify the weather conditions the next day just to add one more participant, when the loss of statistical
power incurred by accepting a sample size of 499 would be negligible?

5. The combined population in 2012 of the towns of Vannes and Lorient, together with their
associated suburban periphery, was 191,758 (Insée, 2015a, 2015b). From a weighted average of
the population statistics by five-year age group for the Brittany region in 2013 (Insée, 2015c), we
estimate that about 4.5% of these people are women aged between 18 and 25. (We have assumed
that the number “25” here refers to completed years, so that the age range is eight years wide,
including all women who had celebrated their 18th birthday but not their 26th.) That gives a pool
of 8,640 potential participants in the two towns. Let us further assume that when walking in
town, none of these women was displaying any visible signs of belonging to a cultural or
religious group whose attitudes to casual proposals from a random male to have a drink might be
likely to skew their response patterns; however, Guéguen (2013a) did not mention any particular
instructions to his confederates in this regard.
The size of the potential participant pool raises the important possibility of multiple
solicitations of the same woman. Even under the most optimistic assumptions (notably that all of
the 8,640 women aged between 18 and 25 in these two towns spend an equal amount of time
walking around on their own), Guéguen’s (2013a) confederates were, in effect, sampling with
replacement 500 times from a pool of 8,640. We calculate the chances of there being no
duplicates in such a situation to be approximately 4.03 × 10⁻⁷. Yet, Guéguen made no mention of
any strategy to avoid duplicate solicitations of the same woman, nor of the action to be taken if a
woman were to reveal (as, we presume, she almost certainly would do) that the confederate was
the second man named Antoine to ask her for her phone number recently. This seems strange,
since—as discussed in our analysis of Guéguen (2015b)—simply dropping the data from any
duplicate cases would appear to be a simple matter.
6. Guéguen (2013a) reported that “[a]fter making his request, the confederate was instructed
to wait 10 seconds, and to gaze and smile at the participant” (p. 315). This strikes us as a very
strange instruction. A number of the women could be expected to keep on walking; if the
confederate stood still for this time, he would potentially have to make up 30 meters or so in
order to ask the participant for her age. Even if the woman stopped, to have the confederate
standing “gazing” at her for 10 seconds might be considered a potentially aggressive act by at
least some women. Would every single woman out of 500, without exceptions, when asked by a
man for her phone number, really stop to think for 10 seconds? (Indeed, how was this period
measured?) We find it extremely implausible that any such instruction to the confederate could
be enacted in practice. Yet, Guéguen reported no problems with the implementation of this
protocol.
7. Perhaps the most improbable part of the study is the claim that 100% of participants—
that is, all 500 women who were approached—revealed their age, after either having accepted or
refused the request to give their phone number. This implies that not one woman, when
approached by the confederate who asked for her number, decided to walk away and ignore him.
Our own experience in this area is somewhat limited, and we are surely less attractive (and
young) than Guéguen’s (2013a) carefully-selected physical specimens, but we find the idea that
every woman in a sample of 500 would reveal her age to a stranger, especially immediately after
rebuffing his romantic advances, to be completely unrealistic.
8. The debriefing procedure seems extremely incomplete. Only the 90 women who gave
their phone number were contacted for debriefing. The other 410 women, who refused to give
their number, were presumably left to wonder why a strange man had approached them in the
street, introduced himself by name, asked them for their phone number, stood looking at them for
10 seconds, and then asked them their age. We are surprised that an ethics committee would
approve such a procedure, when it might be expected that the women who refused to give their
phone number might be more likely to be negatively affected by the confederate’s behavior than
those who revealed their number. We would have expected that either the male confederate (cf.
Guéguen, 2012b; but see also our comments on that article) or, perhaps, a female confederate
stationed a few meters away, would provide the debriefing immediately.
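
The duplicate-solicitation probability quoted in point 5 above is a standard "birthday problem" calculation, and it can be checked with a few lines of code. The following is a minimal sketch under the same optimistic assumptions as the text (500 independent, uniformly random solicitations from a pool of 8,640 women); the function name prob_no_duplicates is ours and purely illustrative.

```python
import math


def prob_no_duplicates(pool_size: int, draws: int) -> float:
    """Probability that `draws` uniform draws with replacement from a pool
    of `pool_size` people never select the same person twice."""
    if draws > pool_size:
        return 0.0  # pigeonhole principle: a repeat is certain
    # P = product over k = 0 .. draws-1 of (1 - k / pool_size).
    # Summing logarithms avoids underflow for very small probabilities.
    log_p = sum(math.log1p(-k / pool_size) for k in range(draws))
    return math.exp(log_p)


# Figures taken from point 5: 500 solicitations, pool of 8,640 women.
p_2013a = prob_no_duplicates(8640, 500)
print(f"P(no duplicate solicitations) = {p_2013a:.3g}")
```

Under these assumptions the result should agree closely with the value of approximately 4.03 × 10⁻⁷ given in point 5. Any departure from uniform sampling (for example, some women walking in town more often than others) would make duplicates more likely still.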

We requested Dr. Guéguen to provide us with the following:

• A copy of the report of the ethics committee, which is mentioned in the article (p. 314).
• An answer to our demonstration above that the temperature could almost never have
  reached the daily mean level reported in the article, especially on cloudy days.
• A list of the dates and times at which data were collected.
• An explanation of why the degree of cloud cover was estimated by passers-by, instead of
  being obtained from the local weather bureau, thus introducing needless unreliability and
  complexity.
• A copy of the dataset.
• Contact details of any confederates who can, first, confirm that the results stated in the
  article were those actually observed during the experiment; second, confirm the accuracy
  of the methodological description; and third, describe any problems that occurred during
  data collection (for example, duplicate solicitations of the same participant, or refusal by
  the participant to cooperate in any way).
• Any other evidence that the study took place as described, such as contemporaneous
  e-mail exchanges, planning documents, copies of the forms on which the confederates
  recorded the events that took place, a written field report from the confederates, etc.

4. Guéguen, N. (2013b). Effects of a tattoo on men’s behavior and attitudes towards women: An
experimental field study. Archives of Sexual Behavior, 42, 1517–1524.
http://dx.doi.org/10.1007/s10508-013-0104-2

Article summary: A female confederate lay on a beach in a bikini, either with or without a
tattoo on her back. In the “tattoo” condition, participants (young male beachgoers) were more
likely to approach (Experiment 1) or estimate, when asked by a male confederate, that the female
confederate would go on a date and have sex with them (Experiment 2).

1. In Experiment 1, the logistical challenges involved in transporting 11 male and 11 female
confederates to 20 different beaches must have been considerable.
The majority of the confederates’ sessions on the beach towel (183 out of 220) lasted for
the full hour that was allowed. Including time to set up and leave the beach, travel from one
beach to the next, and have lunch, and given the implied need for the experiment to take place at
times when the beaches were busy, we estimate that a maximum of three sessions could have
been planned per confederate/observer couple per day. This would seem to require seven full
days’ work from each volunteer, which is quite a lot of time for an undergraduate to dedicate to a
study for which he or she received no acknowledgement.
Additionally, 11 vehicles would presumably have been needed to transport the volunteers
between the beaches. Assuming that the confederates’ or observers’ own vehicles were used, we
presume that appropriate insurance for their use was arranged by the laboratory. Doubtless this
information can be found in the record of the meetings of the ethics committee that approved
these experiments.
2. In Experiment 2, 100% of 440 young men who were approached by the confederate
agreed to answer his questions. This strikes us as extremely implausible, especially since these
participants were requested to provide answers to potentially embarrassing personal questions,
notably whether the woman confederate—who was lying no more than 10 meters away—would
be prepared to have sex with them on a first date. Apparently, not one participant changed the
subject, or was concerned that the woman confederate might overhear the conversation, or was in
any way suspicious of any other possible traps in the experimental setup, or expressed any form
of objection to answering the male confederate’s question.

3. The effect sizes from both experiments (d = 0.87 for the principal tattoo/no tattoo effect
in Experiment 1; 0.80 (“date”) and 1.37 (“sex”) in Experiment 2) seem highly implausible. For
comparison, the effect of sildenafil (Viagra) on sexual function is about d = 0.82 (Meyer et al.,
2001). Apart from any other considerations, it is unclear to us how many men would even notice
a 10 × 5 cm tattoo on the lower back of a woman who was lying down on a beach towel in a red
bikini, before deciding to approach her from several meters away.
4. In the “no tattoo” condition, the item measuring the participants’ estimate of the
probability of having sex with the female confederate on a first date is shown with a mean of
4.53 and an SD of 1.23, measured on a scale with possible responses in the range of 1 to 9. Our
analysis shows that, in order for this mean and SD to be attained, the maximum possible number
of responses of 1 (“no probability”) is 11, and the maximum possible number of responses of 9
(“high probability”) is nine. (These extremes cannot occur simultaneously; with 11 responses of
1, the maximum number of responses of 9 is just six.) Additionally, for either of these maximum
numbers of extreme responses to be possible, it is necessary that all but two or three of the other
200+ participants responded with exactly 4 or 5. We will allow ourselves to break here for a
moment from the position that we took, in the introduction to this commentary, of not
commenting on the baseline behavior of participants: Out of over two hundred young men
encountered at random on a beach, when interviewed by a male researcher about their chances of
having sex on a first date with a woman, would no more than a handful of them really shake their
head sadly and say “None,” or smile broadly and say “No problem”? Would around 90% of them
give responses exactly in the middle of the scale?

We requested Dr. Guéguen to provide us with the following:

• A copy of the report of the ethics committee, which is mentioned in the article (p. 1519).
• A copy of the dataset.
• Contact details of any confederates who can, first, confirm that the results stated in the
  article were those actually observed during the experiment; second, confirm the accuracy
  of the methodological description; and third, describe any problems that occurred during
  data collection (for example, duplicate solicitations of the same participant, or refusal by
  the participant to cooperate in any way).
• Any other evidence that the study took place as described, such as contemporaneous
  e-mail exchanges, planning documents, copies of the forms on which the confederates
  recorded the events that took place, a written field report from the confederates, etc.

5. Guéguen, N. (2012a). Dead indoor plants strengthen belief in global warming. Journal of
Environmental Psychology, 32, 173–177. http://dx.doi.org/10.1016/j.jenvp.2011.12.002

Article summary: Participants (undergraduates) were asked about their attitudes and beliefs
regarding climate change in a room that contained a yucca plant that was either alive or dead.
They were more likely to endorse beliefs that climate change is real in the presence of the dead
plant.

Our concerns with this article are principally statistical in nature.

Study 1
Guéguen (2012a) reported in considerable detail the procedure by which the states (alive or
dead) of the plants to be used in the rest of the study were to be evaluated. Twenty participants,
exactly 10 of each sex, were used to determine that a brown ficus with no leaves was dead,
whereas one with “luxuriant green foliage” was alive. Means and standard deviations (SDs) of
their ratings were provided. It appears, from the presence of reports of W statistics, that Guéguen
also performed a Wilcoxon signed-rank test on these ratings and concluded that they were
significantly different. It is unclear why any such test was considered necessary in the first place,
given the overwhelming differences in ratings of the dead and living plants; indeed, it is unclear
why this rating procedure was followed at all. We think that most readers of the article would
have simply taken it on trust that the investigator was able to provide examples of plants that
were clearly dead or alive.
Guéguen’s (2012a) Table 1 shows the means and SDs of the answers given by
participants to the four questions about global warming. We observe that the SDs are rather
small, suggesting that many individual scores close to the mean values were dominant in
determining those means. Accordingly, we performed exhaustive enumerations of all possible
combinations of responses in the range 1–7 by groups of 30 people. These analyses revealed that
the numbers of each possible response are constrained to quite narrow ranges. Our Table 4 shows
the minimum and maximum possible numbers of each response, on the scale of 1–7, to the
questions concerning participants’ perceptions of global warming, given the means and SDs
reported for each question and condition by Guéguen.

Table 4. Minimum and maximum possible number of occurrences of each response in the range
3–7 in Guéguen’s (2012a) Study 1.

                                  Count of 3  Count of 4  Count of 5  Count of 6  Count of 7
Item  Plant condition  Mean  SD     Min  Max    Min  Max    Min  Max    Min  Max    Min  Max
1     Foliage          5.3   0.7      0    1      0    4     13   25      1   13      0    4
2     Foliage          5.1   0.6      0    1      0    4     19   25      1    7      0    2
3     Foliage          5.7   0.7      0    0      0    4      2   14     12   24      0    4
4     Foliage          5.4   0.7      0    0      0    4      9   21      5   17      0    4
1     No foliage       5.9   0.5      0    1      0    2      0    6     21   29      0    3
2     No foliage       5.6   0.5      0    0      0    0     12   12     18   18      0    0
3     No foliage       5.8   0.5      0    0      0    2      1    7     21   27      0    2
4     No foliage       5.9   0.6      0    1      0    2      1    8     19   27      1    4

Notes:
1. We generated these tables using custom R code (available on request) before the development
of SPRITE (see text).
2. Counts for responses of 1 and 2 are not shown here, as these cannot occur in any combination
of 30 responses with the means and SDs reported by Guéguen.

We observe some very puzzling patterns in these data. While the means of the responses to each
question seem to show a general consensus in favor of belief in global warming, with this belief
being higher among participants in the “No foliage” condition, the distribution of the responses
is very strange. First, as evidenced by the absence of any response of 1 or 2, and the very small
possible number of responses of 3 and 4 (including zero), it appears that none of the 60 students
tested expressed any doubt that global warming is real and they have personally noticed signs of
it occurring. While acceptance of the scientific consensus around global warming is strong
among French people, especially the young, with only 7.9% of people aged 18–24 expressing
skepticism (Ipsos, 2010), the fact that out of 60 people, nobody apparently expressed any doubts
at all about the reality of the phenomenon, or any of the aspects of their personal experience of it
that were measured by the items in this study, seems slightly surprising. Second—and in contrast
to the previous point—although belief in global warming is clearly strong on average among
participants in both conditions, very few participants gave a response of 7 (i.e., the maximum of
the scale). In the “Foliage” condition, the most prevalent response by far was 5, and in the “No
foliage” condition, the most prevalent response was 6. These figures are perhaps even more
surprising than the absence of skeptics; in the research by Ipsos (2010) just cited, 53.5% of adults
aged 18–24 surveyed replied “Oui, tout à fait” (“Yes, absolutely”) to a question asking if they
personally thought that climate change was real. Yet, when given the proposition “It seems to me
that the temperature is warmer now than in previous years,” apparently not one participant gave a
response of 7 out of 7, even in the presence of a dead plant.
An analytical exposition of the above is possible. Using SPRITE (Sample Parameter
Reconstruction via Iterative TEchniques; Heathers, Anaya, van der Zee, & Brown, 2018), we can
verify the possible sample sets for the values provided. Take, for instance, the mean/SD pair 5.3
and 0.7 in response to the item “I have already noticed some signs of global warming” in the
“Foliage” condition, as shown in our Figure 2. It can be seen that no solution has more than 3
reported values of 7, or more than a single reported value of 3, and none has any occurrences of
the values of 1 or 2. In short, no more than 10% of the participants expressed a strong belief in
global warming, while 0% expressed a strong disbelief. This does not seem to us to be a very
likely pattern of responses among young educated French people.
It should be noted that the example just given probably represents the mean/SD pair with
the greatest potential variability in Guéguen’s (2012a) article. Other items and conditions have
even more restricted ranges of possible individual responses. For the item “It seems to me that
the temperature is warmer now than in previous years” in the “No foliage” condition, with a
reported mean of 5.6 and SD of 0.5, SPRITE suggests that there are only three solutions, all
consisting of only responses with the values 5 and 6. Subsequent analysis with CORVIDS
(Wilner, Wood, & Simons, 2018) confirmed this to be the case.
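
The kind of exhaustive check described above can be sketched in a few lines. The code below is neither SPRITE nor CORVIDS, nor the R code used for Table 4; it is a minimal brute-force illustration that assumes 30 participants per condition, integer responses on a 1–7 scale, a sample SD computed with n − 1 in the denominator, and reported values rounded to one decimal place. The function name consistent_count_vectors is hypothetical.

```python
import math


def consistent_count_vectors(n, mean, sd, lo=1, hi=7, decimals=1):
    """Return every vector of response counts (one count per scale point)
    whose mean and sample SD round to the reported values."""
    half = 0.5 * 10 ** (-decimals)      # half-width of the rounding interval
    values = list(range(lo, hi + 1))
    min_total = n * (mean - half)       # smallest admissible sum of responses
    max_total = n * (mean + half)       # (exclusive) largest admissible sum
    found = []

    def extend(idx, remaining, total, sumsq, counts):
        v = values[idx]
        # Prune branches that can no longer reach an admissible total.
        if total + remaining * hi < min_total or total + remaining * v >= max_total:
            return
        if idx == len(values) - 1:
            # All remaining responses must take the highest scale value.
            total += remaining * v
            sumsq += remaining * v * v
            if not (min_total <= total < max_total):
                return
            variance = (n * sumsq - total * total) / (n * (n - 1))
            if sd - half <= math.sqrt(variance) < sd + half:
                found.append(dict(zip(values, counts + [remaining])))
            return
        for c in range(remaining + 1):
            extend(idx + 1, remaining - c, total + c * v,
                   sumsq + c * v * v, counts + [c])

    extend(0, n, 0, 0, [])
    return found


# Mean 5.6, SD 0.5: the "No foliage" temperature item discussed above.
solutions = consistent_count_vectors(30, 5.6, 0.5)
for s in solutions:
    print({value: count for value, count in s.items() if count})
```

Under these assumptions the enumeration finds three count vectors, each made up solely of responses of 5 and 6, in line with the SPRITE and CORVIDS results described above. Different rounding conventions could change the set of admissible solutions slightly.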

Figure 2. Possible distributions of responses to the item “I have already noticed some signs of
global warming” in the “Foliage” condition in Guéguen’s (2012a) Study 1, as generated by
SPRITE (Heathers et al., 2018).

Figure 3. Possible distributions of responses to the item “It seems to me that the temperature is
warmer now than in previous years” in the “No foliage” condition in Guéguen’s (2012a) Study 1.

We also note that this study has a very large effect size. Guéguen (2012a) reported an
effect of d = 1.50 for the t test comparing the aggregated scale scores from all four items relating
to belief in global warming. This is a remarkably large effect, given the subtle nature of the
manipulation.

Study 2
All of our concerns about Guéguen’s (2012a) Study 1 apply at least equally, if not to a greater
extent, to his Study 2. Once again, elaborate and gender-neutral measures were taken to ensure
that the plants being used were evaluated as dead or alive. Once again, the means of participants’
responses to the questions on global warming appeared plausible, while the small SDs meant that
the responses were constrained to the upper-middle range of possibilities. Our Table 5 shows the
possible responses for each item. The same patterns as in Study 1 are readily apparent. We can
be sure that nobody (out of 150 participants) responded with 1 or 2. Only a very small number
can possibly have responded with 3 or 4, and it is possible for both of these responses to have
never been given. But as with Study 1, remarkably few participants responded with 7, giving full
endorsement to a belief in the reality of global warming. The remarkable conclusion in this study
is that, whereas around three-quarters of participants gave a response of 5 (out of 7) in the
presence of no plant, or one or three live plants, the same proportion gave a response of 6 in the
presence of one dead plant. For so many participants to so consistently give the same (per
condition) positive endorsements of a belief in global warming, while almost never giving the
maximum possible endorsement, strikes us as truly remarkable.
Study 2 also has a very large effect size. Although it is not easy to convert from partial
eta-squared to Cohen’s d directly, Cohen (1988) suggested that a partial eta-squared value of .14
should be considered a large effect. Guéguen’s (2012a, p. 176) value of .53 therefore presumably
represents something several times larger than “large.”
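
One rough way to put the reported value and Cohen’s benchmark on a common scale is Cohen’s f, defined as f = sqrt(η² / (1 − η²)). The short sketch below is an illustration only and is not a calculation taken from the article; it does not convert to Cohen’s d, and the variable names are ours.

```python
import math


def cohens_f(partial_eta_squared: float) -> float:
    """Convert a (partial) eta-squared value to Cohen's f."""
    return math.sqrt(partial_eta_squared / (1.0 - partial_eta_squared))


benchmark_large = cohens_f(0.14)  # Cohen's (1988) "large" benchmark
reported = cohens_f(0.53)         # value reported by Guéguen (2012a, p. 176)
ratio = reported / benchmark_large

print(f"f (large benchmark) = {benchmark_large:.2f}")
print(f"f (reported)        = {reported:.2f}")
print(f"ratio               = {ratio:.1f}")
```

On this scale the benchmark corresponds to f of about 0.40, so the reported effect is more than two and a half times the size conventionally labeled “large.”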

Table 5. Minimum and maximum possible number of occurrences of each response in the range
3–7 in Guéguen’s (2012a) Study 2.

                                       Count of 3  Count of 4  Count of 5  Count of 6  Count of 7
Item  # Plants / Condition  Mean  SD     Min  Max    Min  Max    Min  Max    Min  Max    Min  Max
1     No plant              5.3   0.6      0    1      0    3     15   25      2   12      0    3
2     No plant              5.2   0.6      0    1      0    4     17   27      0   10      0    3
3     No plant              5.2   0.5      0    0      0    2     21   27      1    8      0    2
4     No plant              5.3   0.5      0    0      0    1     18   23      6   11      0    1
1     1 / Foliage           5.2   0.6      0    1      0    4     17   27      0   10      0    3
2     1 / Foliage           5.1   0.6      0    1      0    5     18   27      0    8      0    2
3     1 / Foliage           5.3   0.6      0    1      0    3     15   25      2   12      0    3
4     1 / Foliage           5.4   0.5      0    0      0    1     17   20      9   13      0    1
1     3 / Foliage           5.3   0.5      0    0      0    1     18   23      6   11      0    1
2     3 / Foliage           5.2   0.5      0    0      0    2     21   27      1    8      0    2
3     3 / Foliage           5.2   0.6      0    1      0    4     17   27      0   10      0    3
4     3 / Foliage           5.3   0.5      0    0      0    1     18   23      6   11      0    1
1     1 / No foliage        5.9   0.5      0    0      0    2      0    6     21   29      0    3
2     1 / No foliage        5.7   0.5      0    0      0    1      6   11     18   23      0    1
3     1 / No foliage        5.9   0.6      0    1      0    2      0    8     18   28      0    5
4     1 / No foliage        5.9   0.5      0    0      0    2      0    6     21   29      0    3
1     3 / No foliage        6.3   0.5      0    0      0    0      0    1     19   21      9   10
2     3 / No foliage        6.3   0.6      0    0      0    1      0    3     15   18     11   12
3     3 / No foliage        6.0   0.6      0    1      0    2      0    6     18   27      2    6
4     3 / No foliage        6.3   0.5      0    0      0    0      0    1     19   21      9   10

We requested Dr. Guéguen to provide us with the following:

• A copy of the dataset.
• Any other evidence that the study took place as described, such as contemporaneous
  e-mail exchanges, planning documents, copies of the forms on which the research assistant
  noted the participants’ responses, etc.

6. Guéguen, N. (2012b). The sweet smell of.... courtship: Effects of pleasant ambient fragrance
on women’s receptivity to a man’s courtship request. Journal of Environmental
Psychology, 32, 123–125. http://dx.doi.org/10.1016/j.jenvp.2012.01.004

Article summary: Participants (women aged 18–25 walking along in a shopping center) were
more likely to give their phone number to a male confederate who asked them for a date if the
location where they were walking was directly in front of a business that produced pleasant
smells.

1. We find the reasons given to the confederates for the choice of location within the mall
(presumably this was a typical “centre commercial” or shopping center, rather than an American-
style mall) to be implausible. This is especially the case with the claim that “we said that
changing the location was a good method to prevent to be notice [sic] by the vigilantes [sic] who
worked in the shopping mall” (Guéguen, 2012b, p. 124). This suggests that the confederates
were told that the research was being conducted without the permission of the shopping center’s
operators. Aside from the legal and ethical questions that this raises, the belief that a security
guard might intervene at any moment and eject them from the shopping center might well have
affected the confederates’ interactions with the participants.
We also wonder exactly how far from a retail outlet its characteristic odors are detectable.
For example, in our experience, the ovens of a French shopping-center bakery tend to be
installed at the back of the store, and almost no odor of baking is typically detectable in the
pedestrian area. In order to obtain any noticeable odor, the confederates would have to stand (and
the participants would have to be walking) rather close to the outlet, at which point the staff
might start to raise questions about what was going on.
2. The number of participants was reported as exactly 400. (Cf. our equivalent concern
about Guéguen, 2013a.)
3. Let us assume that the shopping center where the study was conducted serves a
population of around 40,000 people. We base this figure on the town of Vannes, which has two
such centers based around large hypermarkets (Carrefour, opened in 1967, and E. Leclerc,
opened in 1999) and had a population in 2012 of 77,150 (Insée, 2015b). On that basis, we
wonder where these 400 young women were to be found. If we assume the same percentage of
women in the relevant age group, as well as the other conditions noted in point 5 of our
discussion of Guéguen (2013a), the number of possible participants within the population of
approximately 40,000 served by the shopping center is around 1,800. In order for Guéguen
(2012b) to have recruited 400 of these 1,800 women, at least 22% of them must have chosen to
visit the shopping center, on their own, during the hours when one of Guéguen’s confederates
was present. Even without the constraint of these women being unaccompanied during their visit,
this seems a remarkably high percentage, given that these two shopping centers are by no means
the only retail outlets in the town. Furthermore, as with Guéguen’s (2013a) study of women
walking alone in the streets of two towns, no mention is made of any of these young women
being solicited more than once, or of any precautions being taken either to ensure that such an
event did not occur, or to mitigate its effects. Yet, we estimate the probability of no duplicate
solicitations being made to be approximately 1.38 × 10⁻²¹. See also our discussion of the issue of
duplicate participants in Guéguen (2013a).
4. Guéguen (2012b) reported that “[a]fter making his request, the confederate was instructed
to wait 10 s, and to gaze and smile at the participant” (p. 124). It strikes us as very unlikely that
anything like this scenario would be enacted in practice. (Cf. our equivalent concern about
Guéguen, 2013a.) When intercepted by the confederate, the participant was walking along in the
shopping center, presumably with some purpose in mind. Yet, for the protocol of the study, it
was essential that the participant remain in the close vicinity of the confederate for at least 10
seconds, in order for the confederate to initiate the debriefing procedure. Did all 400 women stop
walking and hang around out of curiosity when asked for their phone number, to see what would
happen next? Did they all believe the confederate’s subsequent story that he was indeed just
conducting a psychological survey, and acquiesce to the debriefing? Did not one woman say
anything to the effect of “Please, just leave me alone, I’m in a hurry / not interested / about to
meet up with my boyfriend?”
5. Guéguen (2012b, p. 124) stated that “Just before leaving the participant, the confederate
asked her for her age whether she had responded positively or not to his request.” Although there
is no mention of the percentage of women who responded positively to this request, we presume
from the absence of any indications to the contrary that this rate was 100%; there is no indication
that the mean and SD of the participants’ ages are estimates based on a smaller number of
positive responses to the request for their age. While, in this case (compared to Guéguen, 2013a),
the participant had at least been debriefed by the confederate before he asked her age, we still
find this percentage implausible. First, we believe that many women would not have stopped and
waited for 10 seconds while the confederate gazed and smiled at them, so that they would simply
not have been present for the debriefing (cf. point 4, above). Second, we consider it unlikely that
all of the women, without exception, who listened to the debriefing would agree to give out their
age.
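
The arithmetic behind point 3 above (the size of the participant pool, the share of that pool that would have had to be recruited, and the probability of no duplicate solicitations) can be reproduced as follows. This is a sketch under the assumptions stated in the text: a catchment population of about 40,000, the proportion of about 4.5% of women aged 18–25 estimated in our discussion of Guéguen (2013a), and uniform sampling with replacement. It uses the closed form N! / ((N − n)! Nⁿ), evaluated with log-gamma functions; all names are illustrative.

```python
import math

catchment_population = 40_000   # assumed population served by the center
share_women_18_25 = 0.045       # proportion estimated for Guéguen (2013a)
recruited = 400                 # sample size reported by Guéguen (2012b)

pool = round(catchment_population * share_women_18_25)
share_needed = recruited / pool


def log_prob_all_distinct(pool_size: int, draws: int) -> float:
    """Natural log of pool_size! / ((pool_size - draws)! * pool_size**draws)."""
    return (math.lgamma(pool_size + 1)
            - math.lgamma(pool_size - draws + 1)
            - draws * math.log(pool_size))


p_no_duplicates = math.exp(log_prob_all_distinct(pool, recruited))

print(f"pool of potential participants: {pool}")
print(f"share of pool recruited:        {share_needed:.1%}")
print(f"P(no duplicate solicitations):  {p_no_duplicates:.3g}")
```

With these inputs the pool is 1,800 women, about 22% of whom would need to have been recruited, and the probability of no duplicates should agree closely with the figure of approximately 1.38 × 10⁻²¹ given in point 3.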

We requested Dr. Guéguen to provide us with the following:

• A copy of the report of the ethics committee, which is mentioned in the article (p. 123).
• A copy of the dataset.
• Contact details of any confederates who can, first, confirm that the results stated in the
  article were those actually observed during the experiment; second, confirm the accuracy
  of the methodological description; and third, describe any problems that occurred during
  data collection (for example, duplicate solicitations of the same participant, or refusal by
  the participant to cooperate in any way).
• Any other evidence that the study took place as described, such as contemporaneous
  e-mail exchanges, planning documents, copies of the forms on which the confederates
  recorded the events that took place, a written field report from the confederates, etc.

7. Guéguen, N. (2012c). Color and women hitchhikers’ attractiveness: Gentlemen drivers prefer
red. Color Research & Application, 37, 76–78. http://dx.doi.org/10.1002/col.20651

Article summary: Participants were motorists driving along a road. Male (but not female)
participants were more likely to offer a lift to a female hitchhiker if she was wearing a red (but
not black, white, blue, green, or yellow) t-shirt.

1. Guéguen (2012c) stated that his five female confederates were chosen by a group of men
who rated the facial attractiveness of “18 female volunteers with the same height, with the same
breast size (95 cm of bust measurement and bra with a ‘B’ size cup), and same hair color”
(p. 77). While we admire this level of attention to detail, we wonder how many volunteers were
rejected because they were slightly too tall, or had slightly too large or too small breasts, or hair
that was slightly too dark or too light. It seems that Dr. Guéguen’s laboratory must have
access to a very large number of women who are prepared to stand for hours on end to stop
passing motorists. In contrast, many psychological researchers have difficulty in getting
participants to spend an hour filling out questionnaires even when offered a monetary reward.
2. We note a curious error in Guéguen’s (2012c) Table 1. The numbers of male and female
motorists are listed as 3,474 and 1,776, respectively. However, these two numbers sum to 5,250,
rather than 4,800 (which was the sample size reported elsewhere in the article by Guéguen, with
3,024 male and 1,776 female motorists). We question how such an error might creep into a table
by accident, since it requires two very different digits (4 instead of 0 and 7 instead of 2) to be
mistyped.
3. Guéguen (2012c) reported the colors of the T-shirts used in very precise terms, even
going so far as to give their HSL (Hue, Saturation, Luminance) values. However, in several
cases, those HSL values do not correspond to the reported T-shirt colors. As our Table 6 shows,
the color described by the HSL values corresponding to “red” is best described as a medium
orange-ochre color, while “yellow” is a very pale pink, and “blue” is pure white.
We also question exactly how Guéguen (2012c) arrived at these HSL numbers. For
example, was an example of each T-shirt placed in a scanner and its color matched by successive
approximation against a reference palette? Indeed, why was such precision considered necessary,
when it would surely have been sufficient to report that bright, unambiguous examples of each
color had been selected?
Table 6. HSL values claimed by Guéguen (2012c) for the colors of the T-shirts used in the study,
converted to hexadecimal RGB values and displayed as colored patches.
Hue (°)   Saturation (%)   Luminance (%)   Claimed color   Hex color
0         0                0               Black           #000000
0         0                100             White           #ffffff
16        92               68              Red             #f88a62
19        100              94              Yellow          #ffeae0
210       100              100             Blue            #ffffff
99        66               87              Green           #d7f4c8
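
The conversions in Table 6 can be checked with a short script. The following is a minimal sketch, not the procedure used to build the table: it assumes that hue is expressed in degrees and that saturation and luminance are percentages (an interpretation that reproduces the hex values shown), and the helper name hsl_to_hex and the round-to-nearest channel values are our own choices.

```python
import colorsys


def hsl_to_hex(hue_deg, sat_pct, lum_pct):
    """Convert HSL (degrees, percent, percent) to a #rrggbb string."""
    # colorsys uses HLS argument order and expects every value in [0, 1].
    rgb = colorsys.hls_to_rgb(hue_deg / 360.0, lum_pct / 100.0, sat_pct / 100.0)
    return "#{:02x}{:02x}{:02x}".format(*(round(c * 255) for c in rgb))


# HSL triples as listed in Table 6, keyed by the claimed color.
claimed = {
    "Black": (0, 0, 0),
    "White": (0, 0, 100),
    "Red": (16, 92, 68),
    "Yellow": (19, 100, 94),
    "Blue": (210, 100, 100),
    "Green": (99, 66, 87),
}

for name, hsl in claimed.items():
    print(f"{name:<6} {hsl} -> {hsl_to_hex(*hsl)}")
```

Under these assumptions, the triple listed for “blue” has a luminance of 100, which maps to white whatever the hue and saturation.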

4. Guéguen (2012c, p. 77) reported that “Each hitchhiker was instructed to test 960 drivers.
After the passage of 240 drivers, the confederate stopped and was replaced by another
confederate.” No indication was given of how long it took for 240 drivers to pass. However,
assuming that the “famous peninsula” mentioned here is the “Presqu'Île de Rhuys,” also
described as a “famous peninsula” in a similar study by Guéguen (2007b), and assuming similar
traffic flows to those reported in that previous study, in which the passage of 100 cars took
“about 40 to 50 minutes” (Guéguen, 2007b, p. 1296), we estimate that it took around two hours
for 240 cars to pass. In order to test 4,800 motorists, then, a total of 40 hours of testing would be
required. Guéguen (2012c, p. 77) stated that the experiment “took place during summer
weekends on clear sunny afternoons between 2 and 5 PM.” With three hours being available on
Saturday and three more on Sunday, the experiment would have lasted for at least seven
weekends, assuming that every day was sunny (a contingency that is far from guaranteed in
Brittany, even if “summer” is taken to mean the months of July and August—a time when one
might expect even the most dedicated undergraduates to be reluctant to participate in
psychological research every weekend). We consider this to be a quite extraordinary amount of
effort, especially in view of the fact that none of the confederates were listed as co-authors, or
even acknowledged, in the resulting article (a common factor in many of the articles that we
examine in the present commentary).
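
The estimate in point 4 reduces to a few lines of arithmetic. This is a minimal sketch using only the figures quoted above; the constant and function names are illustrative, and the two-hour block is the rounded value adopted in the text (the traffic rate from Guéguen, 2007b, implies 96 to 120 minutes per block).

```python
import math

TOTAL_DRIVERS = 4800              # sample size reported in the text of Guéguen (2012c)
DRIVERS_PER_BLOCK = 240           # drivers tested before the confederate was replaced
MINUTES_PER_100_CARS = (40, 50)   # traffic flow reported by Guéguen (2007b)
HOURS_PER_WEEKEND = 3 + 3         # 2 to 5 PM on Saturday and again on Sunday


def weekends_needed(hours_per_block):
    """Return (total testing hours, whole weekends required)."""
    blocks = TOTAL_DRIVERS // DRIVERS_PER_BLOCK
    total_hours = blocks * hours_per_block
    return total_hours, math.ceil(total_hours / HOURS_PER_WEEKEND)


# Minutes needed for one 240-car block at each of the two traffic rates.
block_minutes = tuple(DRIVERS_PER_BLOCK * m / 100 for m in MINUTES_PER_100_CARS)
print(block_minutes)
print(weekends_needed(2.0))   # using the rounded two-hour block
```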
5. In view of the considerable logistical effort required for this study, and the author’s
hypothesis that an effect would be observed only for red T-shirts with male drivers, why was it
considered necessary to test five other colors of T-shirt, rather than, say, just three? For example,
Guéguen (2012e) tested only four colors of T-shirt (white, red, blue, and green) in a far less
onerous scenario.
6. The permanent population of the “Presqu'Île de Rhuys” in 2011 was 13,513 people
(CCPRhuys, 2015). Assuming that the number of cars per 1,000 people in the area is 600, which
is at the upper bound of the range for the Brittany region (OECD, 2009), the number of “local”
cars would be about 8,100. Of course, this area is a popular vacation destination, and many of the
cars passing would have been driven by tourists; however, those cars would typically be
expected to contain at least two people, and quite possibly an entire family (cf. also our point 7,
below), and would presumably be unlikely to stop for hitchhikers of any kind. Even if we assume
that only one-third of the 4,800 cars counted were local, and that all of the 8,100 local cars were
equally likely to be included in the study—despite the fact that the study was conducted on
weekend afternoons only, thus excluding, for example, people who use their cars to commute to
work and prefer to cycle at the weekends—the chance of there being no duplicates when
sampling 1,600 cars with replacement from a pool of 8,100 is approximately 2.47 × 10^-74. Yet,
Guéguen (2012c) did not describe any precautions taken to avoid such duplicates, despite the
potential biases that these might introduce into a model that appears to assume the independence
of all 4,800 (or 5,250) observations. This problem is even more severe than the previous
examples we have discussed where multiple solicitations of participants were extremely likely
(Guéguen, 2012b, 2013a), because the duplicates would be unlikely to make themselves known.
For example, a motorist who had previously offered a lift to the confederate, only to be told that
he or she was taking part in an experiment, might well be unlikely to stop for her (or another
similarly-attractive, similarly-dressed confederate) at the same spot a week later, regardless of
the color of her T-shirt.
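
The probability quoted in point 6 is an instance of the classic birthday problem. The sketch below recomputes it under the same assumptions as the text (1,600 local cars drawn uniformly, with replacement, from a pool of 8,100); the function name is ours, and the logarithmic summation is simply one numerically convenient way to evaluate the product.

```python
import math


def prob_no_duplicates(draws, pool):
    """P(all draws are distinct) when sampling uniformly with replacement."""
    # The k-th draw must avoid the k items already seen: factor (1 - k / pool).
    # Logs are summed so that the very small product stays numerically stable.
    log_p = sum(math.log1p(-k / pool) for k in range(draws))
    return math.exp(log_p)


local_cars = 8100          # 13,513 residents at 600 cars per 1,000, rounded as in the text
local_draws = 4800 // 3    # one-third of the counted cars assumed to be local
p = prob_no_duplicates(local_draws, local_cars)
print(f"{p:.3g}")
```

Under these assumptions the result agrees with the figure of roughly 2.47 × 10^-74 given above.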
7. Guéguen’s (2012c) experimental scenario seems to assume that every car contained
exactly one occupant (the driver). This seems strange, since one would expect many drivers—
especially on a sunny afternoon in summer—to be accompanied by their spouse, children, or
friends. Information about any other occupants of the vehicle would appear to be of considerable
interest in evaluating the motivations for the driver to stop (or not). Perhaps a male driver who
might otherwise have stopped to offer a lift to a single woman was accompanied by his wife, or a
female driver might have been willing to help the hitchhiker had all the seats in her car not been
full with children. Guéguen’s description of his method does not address the question of how
vehicles with multiple occupants were to be treated, nor was any mention made of a method to
detect such cars and eliminate them from consideration. For example, given the possibility that a
small child might have been sitting in the back, it is not clear how a confederate sitting in a car
by the side of the road could determine if a car that passed them was only occupied by the driver.
A requirement to consider only cars with unaccompanied drivers would, of course, also
considerably extend the number of hours needed to observe 4,800 such cars, and exacerbate the
problem of duplicate solicitations.
8. Guéguen (2012c) stated that the two observers whose job was to count the number of
passing motorists and identify them as male or female were stationed 500 meters from the
hitchhiker (whether this was before or after the cars being observed passed the hitchhiker was
not specified). He reported that “[t]he convergence between the two observers’ evaluation was
high (r = 0.97)” (p. 77). In order for such a “convergence” (expressed as a correlation
coefficient) to be calculated, more information than the simple total numbers of male and female
drivers would seem to be required. Specifically, in order to establish a meaningful correlation
coefficient, the two observers would need to independently record both the sex of each driver
and the sequence in which those drivers were observed. However, Guéguen reported that each of
the observers “used two hand-held counters, one to count the female motorists and the other to
count the male motorists” (p. 77). The term “hand-held counters” seems to imply that these were
simple mechanical devices, such as those used to count attendees at sporting events (Amazon,
2015). Without synchronized timestamps across all four of these counters, however, it does not
seem to be possible to establish a correlation coefficient for the evaluations of the drivers’ sex
between the two observers. More sophisticated methods of collecting these data can easily be
imagined (e.g., a laptop computer per observer, with software that would record, with a
timestamp, each press of the “M” or “F” keys), but these were apparently not used. We are
therefore curious as to how Guéguen established the correlation coefficient for the
“convergence” between the observers, given only a total count of male and female drivers from
each observer.

We asked Dr. Guéguen to provide us with the following:

• A copy of the report of the ethics committee, which is mentioned in the article (p. 77).
• An explanation of how the problem of vehicles with multiple occupants was handled.
• An explanation of the discrepancy between the sample sizes in Table 1 and the text
  (5,250 vs. 4,800).
• An explanation of how the HSL values for the T-shirt colors were determined, and why
  these bear very little relation to the colors described in the text.
• An explanation of how the use of a “hand-held counter” provided values capable of being
  subjected to correlation analyses.
• A copy of the dataset.
• Contact details of any confederates who can, first, confirm that the results stated in the
  article were those actually observed during the experiment; second, confirm the accuracy
  of the methodological description; and third, describe any problems that occurred during
  data collection.
• Any other evidence that the study took place as described, such as contemporaneous
  e-mail exchanges, planning documents, copies of the forms on which the confederates
  recorded the events that took place, a written field report from the confederates, etc.

8. Guéguen, N. (2012d). Gait and menstrual cycle: Ovulating women use sexier gaits and walk
slowly ahead of men. Gait & Posture, 35, 621–624.
http://dx.doi.org/10.1016/j.gaitpost.2011.12.011

Article summary: Participants were heterosexual women who were ostensibly recruited to take
part in a study on computerized lexical decision making. They were surreptitiously filmed while
walking down a corridor from the waiting area to the laboratory, while aware that a male
confederate was walking behind them. They also provided saliva samples that were tested for a
hormone to determine whether they were ovulating. Observers coded the “sexiness” of each
woman’s gait as she walked along the corridor in front of the man, and this was correlated with
her “fertility probability.”

Our principal concern when we first read this study was about the ethical issues involved.
First, women were recruited to take part in a study on “computerized lexical decisions”
(Guéguen, 2012d, p. 622), but after having agreed to participate they were subjected to a number
of intimate questions. While we presume that they were informed that they could leave the study
at any time, such information does not absolve the experimenter of the need to avoid possible
distress to participants, who might well have felt morally or socially obligated (for example, to
avoid putting their participation in future research in danger) to continue. Second, the secret
filming of the women, especially with the subsequent intention of evaluating the “sexiness” of
their gait, strikes us as voyeuristic, to say the very least. We would be interested to see the report
of the meeting of the ethics committee where this experiment was approved, and in particular
exactly what precautions were taken to prevent these secretly-recorded films from being
disseminated beyond their intended audience. We are also rather surprised that, having been
informed that they had been filmed from behind at a distance of one meter in order for their
“sexiness” to be evaluated by two unknown males, every one of the women apparently gave her
consent.
There are also problems with the reported data for the evaluation of the “sexiness” of the
participants’ gait by the evaluators. In the “High risk” condition, neither the reported mean of
2.96 nor its associated SD of 0.57 can be produced by any combination of 14 numbers from 1 to
5. The closest possibilities for the mean are 3.00 with an SD of 0.55 (produced by the sequence
2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4), or 2.93 with an SD of 0.62 (produced by the sequences 2, 2,
2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4 and 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4), while the closest possibility
for the SD is 0.58, with associated means of either 2.21 (produced by the sequences 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 3, 4 and 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3) or 2.79 (produced by the sequences
2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4 and 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3). In the “Medium risk”
condition, neither the reported mean of 2.54 nor its associated SD of 0.48 can be produced by
any combination of 22 numbers from 1 to 5; the nearest mean is 2.55 with an SD of 0.51
(produced by the sequence 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3). The reported
SD of 0.48 can be produced, but only with a mean of either 2.68 (produced by the sequence 2, 2,
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3) or 2.32 (produced by the sequence 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3). Thus, of the six reported numbers for means and SDs
of the “sexiness” scores, four are necessarily incorrect, and of these, at least two differ from the
published values in two separate digits.
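
The search described in this paragraph can be reproduced by brute force, since there are only a few thousand ways to fill 14 (or 22) slots with integers from 1 to 5. The following is a minimal sketch under the same assumptions as the analysis above: every rating is an integer, and the SD is the sample SD with an n - 1 denominator. The function name reachable_stats is ours.

```python
from itertools import combinations_with_replacement
from math import sqrt


def reachable_stats(n, low, high):
    """Map each reachable (mean, SD) pair, as 2-decimal strings, to one example."""
    found = {}
    for scores in combinations_with_replacement(range(low, high + 1), n):
        total = sum(scores)
        sum_sq = sum(x * x for x in scores)
        # Sample variance (n - 1 denominator); clamp tiny negative float error.
        variance = max(sum_sq - total * total / n, 0.0) / (n - 1)
        key = (f"{total / n:.2f}", f"{sqrt(variance):.2f}")
        found.setdefault(key, scores)
    return found


high_risk = reachable_stats(14, 1, 5)      # n = 14, scale 1 to 5
medium_risk = reachable_stats(22, 1, 5)    # n = 22, scale 1 to 5

print(("2.96", "0.57") in high_risk)       # reported pair, "High risk"
print(("2.54", "0.48") in medium_risk)     # reported pair, "Medium risk"
print(high_risk.get(("3.00", "0.55")))     # an example sequence, if one exists
```

Each dictionary value is one example sequence for that pair of statistics, so the same function can be used to list the nearest achievable alternatives to a reported pair.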
Guéguen (2012d) also reported very large effect sizes in this article. The main effect of
fertility on “sexiness” of gait had a partial eta-squared value of .15, which is considered a large
effect (Cohen, 1988). If it were true, one might expect that women themselves may have noticed
or intuited the existence of such an effect previously.

We asked Dr. Guéguen to provide us with the following:

• A copy of the report of the ethics committee, which is mentioned in the article (p. 622).
• A copy of the dataset.
• Contact details of any confederates who can, first, confirm that the results stated in the
  article were those actually observed during the experiment; second, confirm the accuracy
  of the methodological description; and third, describe any problems that occurred during
  data collection.
• Any other evidence that the study took place as described, such as contemporaneous
  e-mail exchanges, planning documents, copies of the forms on which the confederates
  recorded the events that took place, invoices or other documentation from the laboratory
  that analyzed the participants’ saliva samples, a written field report from the
  confederates, etc.

In his reply of July 2017, Dr. Guéguen pointed out that the ratings of “Sexiness of the women’s
gait” reported in Table 1 were constructed from the mean ratings of two raters. As we discovered
during our subsequent development of GRIM (Brown & Heathers, 2017), it is easy to overlook
such details when reading articles; nevertheless, we should have been more careful. However, we
note that, even if the values that are used to construct these means can be fractions ending in 0.5
rather than integers, the mean value for “Moderate” risk (2.54) is still not possible with N = 22.

9. Guéguen, N. (2012e). Color and women attractiveness: When red clothed women are
perceived to have more intense sexual intent. Journal of Social Psychology, 152, 261–
265. http://dx.doi.org/10.1080/00224545.2011.605398

Article summary: Participants were male undergraduates. They were asked to rate the
attractiveness of a photo of a woman, whose clothing was digitally altered to be red, blue, green,
or white. Participants who saw the photo of the woman with red clothing rated her as more
attractive.

In this study, three main problems can be observed from the table of means and standard
deviations.
First, as with another study discussed above (Guéguen, 2012a), the small standard
deviations suggest that the responses in each case must have been closely clustered around the
mean. That is, very few of the participants can have rated the woman’s attractiveness or sexual
intent with a 1, 2, 8, or 9. Once again, Guéguen’s participants consistently revealed themselves
to be intensely moderate in their views.
A second and more substantial problem, however, is that four of the means, as reported to
two decimal places, are clearly impossible. There were exactly 30 participants in each condition,
giving categorical responses on an integer scale from 1 to 9. Thus, the total of these responses is
itself an integer. However, when any integer is divided by 30, the second and subsequent decimal
places must all either be 0, 3, or 6; when rounded, these give numbers of the form x.y0, x.y3, or
x.y7 (cf. our analysis of Guéguen, 2015a). Thus, only means ending in 0, 3, or 7 ought to be
reported. However, four of the means reported by Guéguen (2012e), namely those for
attractiveness and sexual intent in the “White” and “Red” conditions, do not conform to this
requirement; these values are 5.12, 5.95, 4.34, and 6.28. These appear to be impossible, given that
the mean of 30 integer responses must be a multiple of 1/30 (Brown & Heathers, 2017).
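
This granularity argument is easy to automate. The sketch below is a simplified check in the spirit of the GRIM test, not the authors' own implementation: it assumes integer responses, means reported to two decimal places, and round-half-up rounding; the function name grim_consistent is ours.

```python
from decimal import Decimal, ROUND_HALF_UP


def grim_consistent(reported_mean, n, decimals=2):
    """Return True if some integer total of n integer scores rounds to the mean."""
    target = Decimal(reported_mean)
    quantum = Decimal(1).scaleb(-decimals)     # 0.01 for two decimal places
    nearest_total = int(target * n)            # truncated; neighbours are tried too
    for total in (nearest_total - 1, nearest_total, nearest_total + 1):
        mean = (Decimal(total) / Decimal(n)).quantize(quantum, rounding=ROUND_HALF_UP)
        if mean == target:
            return True
    return False


# The seven means discussed in the text, each with n = 30 per condition.
reported = ["5.12", "5.95", "4.34", "6.28", "4.40", "4.93", "5.07"]
for m in reported:
    print(m, grim_consistent(m, 30))
```

Means are passed as strings so that the reported digits are compared exactly rather than through binary floating point.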
A third problem, related to the second, is that of the four remaining means that are
possible (i.e., those for the blue and green T-shirts, ending in 0, 3, or 7), three (M = 4.40,
SD = 1.32; M = 4.93, SD = 1.37; M = 5.07, SD = 1.19) are associated with standard deviations
that are impossible. We enumerated all possible combinations of the numbers 1 through 9 that
gave those means, and determined that no combination gave a standard deviation equal to that
reported by Guéguen (2012e) when rounded to two decimal places. This situation arises because,
with the per-condition sample size of 30, the increment in standard deviation from one
combination of scores to the next (e.g., with a mean of 5.00, replacing two scores of 5 with one
of 4 and one of 6), in the region of means between 4.0 and 6.5 reported by Guéguen (2012e), is
between 0.025 and 0.030. Thus, less than half of all possible numbers with two decimal places
(i.e., with a precision of 0.01) are candidates to be valid standard deviations for any given mean.
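
A full enumeration over 30 responses on a nine-point scale is large, so the sketch below swaps in a cheaper necessary condition rather than the enumeration described above. It assumes integer responses and a sample SD (n - 1 denominator). For a given integer total, the sum of squares must be an integer with the same parity as the total, and it must fall inside the window implied by the rounded SD. Passing this check does not prove that an SD is achievable, but failing it shows that it is not. The function name is ours.

```python
import math


def sd_could_be_valid(total, n, reported_sd, decimals=2):
    """Necessary (not sufficient) test that an SD can accompany an integer total."""
    sd = float(reported_sd)
    half_unit = 0.5 * 10 ** -decimals
    # Sum of squares = (n - 1) * variance + total**2 / n; bound it using the
    # interval of SDs that would round to the reported value.
    offset = total * total / n
    lowest = (sd - half_unit) ** 2 * (n - 1) + offset
    highest = (sd + half_unit) ** 2 * (n - 1) + offset
    for sum_sq in range(math.ceil(lowest), math.floor(highest) + 1):
        # x and x * x share parity, so the sum and the sum of squares must too.
        if sum_sq % 2 == total % 2:
            return True
    return False


# Totals implied by the three possible means with n = 30 (e.g., 4.40 * 30 = 132).
for total, sd in [(132, "1.32"), (148, "1.37"), (152, "1.19")]:
    print(total, sd, sd_could_be_valid(total, 30, sd))
```

As a positive control, fifteen scores of 4 and fifteen of 6 give a total of 150 and a sample SD that rounds to 1.02, which the check accepts.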
Taken together, these analyses mean either that Guéguen (2012e) made at least seven
transcription errors in his table of eight means and eight SDs, or that some other explanation
must be sought for the origin of the numbers in his Table 1.

We asked Dr. Guéguen to provide us with the following:

• A copy of the dataset.

In his reply of July 2017, Dr. Guéguen stated that the ratings were made by drawing a line on
paper, which was subsequently measured with an electronic caliper. No mention of this was
made in the article; nor is it clear why one would use such a method given the relative simplicity
of the question (and the fact that the rating was stated as being “on a scale of 1 to 9”, whereas
one would expect a scale that was based on the length of a line to start at zero).

10. Guéguen, N. (2012f). Risk taking and women’s menstrual cycle: Near ovulation, women
avoid a doubtful man. Letters on Evolutionary Behavioral Science, 3, 1–3.
http://dx.doi.org/10.5178/lebs.2012.17

Article summary: Participants were heterosexual female undergraduates, ostensibly recruited to
take part in a study on computerized lexical decision making. They also provided saliva samples
that were tested for a hormone to determine whether they were ovulating. They were asked to
wait in a room that contained a male confederate with poor grooming and messy clothing, with
seven free seats. Women with a higher (versus lower) “fertility risk” chose to sit further from the
confederate.

The principal problems with this article are to be found in the means and (especially) SDs
of the reported distances between the participants and the confederate.
In our Figure 4, we have taken Guéguen’s (2012f) Figure 1 and annotated it with what we
believe to be conservative estimates of the minimum distances, in cm, between the chairs.
Guéguen did not report exactly how the distance between the participant and the confederate was
measured, so we have used the middle of the front edge of the chair in his diagram. The original
figure did not have the same aspect ratio of 1.5:1 that would correspond to Guéguen’s
description of the room’s dimensions as 4.5 × 3.0m, so we have compressed the figure in the
horizontal direction to achieve this aspect ratio. It can be seen that only four of the chairs are
situated closer to the confederate than the mean distance reported for participants in the High risk
condition; for the Moderate and Low risk conditions, these numbers are three and two,
respectively. When weighted by the numbers of participants in each condition, this implies that
at least half of the women in the study must have sat in the same half of the room as the
confederate.
A further problem with Guéguen’s (2012f) Table 1 is to be found in the standard
deviations of the distances between the participants and confederate. As can be seen from our
Figure 4, but also if one takes a moment to imagine the situation in the room, the distance
between the chairs must have been at least one meter, in order for them to be spaced out across
the 4.5 × 3.0m room. (Apart from anything else, the chairs themselves were probably at least
45cm wide.) For example, if we consider the High risk condition, the only way we can find to
account for the given mean of 236.5cm and SD of 29.4cm (which is about two-thirds of the
width of the chair, and about one-quarter of the mean distance between the chairs) is if 14 of the
15 women sat on the chair approximately 220cm from the confederate, and one sat on one of the
chairs on the left-hand wall. Let us assume for a moment that Guéguen’s main hypothesis is
correct, and that women at high risk of conception did indeed choose “safer” places to sit in the
room. Let us further assume that they considered all four chairs in the opposite half of the room
to the confederate to be equally “safe” (thus ignoring the fact that the three chairs against the
left-hand wall were all further from the confederate) so that they had a choice of four chairs.
Assuming that each of those chairs was equally likely to be chosen, the chance that any
particular chair would be chosen 14 times out of 15 is approximately 4 × 10^-8. Some further
numerical analysis of the standard deviations shows that, even in the “Moderate” and “Low” risk
conditions, almost all of the women must have chosen to sit in the chairs numbered (in our
Figure 4) 3, 7, or 8—that is, at a medium distance from the confederate. Whatever their study
condition, almost none of the women can have chosen to sit as far as possible from this
unpleasant individual.
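
The chair-selection probability quoted above follows from the binomial distribution. This is a minimal sketch under one reading of the text, namely that one specified chair is chosen exactly 14 times in 15 independent choices among four equally likely chairs; that reading is our assumption, adopted because it reproduces the quoted order of magnitude, and the function name is ours.

```python
from math import comb


def prob_exactly(k, n, p):
    """Binomial probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# One specified chair, chosen exactly 14 times out of 15, four equally likely chairs.
p_one_chair = prob_exactly(14, 15, 1 / 4)
print(f"{p_one_chair:.2e}")
```

If any of the four chairs were allowed to be the popular one, the probability would be four times larger, because the four events cannot occur together; it would still be far below one in a million.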
Taken together, these results suggest that, of 96 women who entered an otherwise empty
room and found an unpleasant man sitting there, more than half chose to sit in the same half of
the room as him, with many choosing to sit either directly or diagonally opposite him. Based on
our life experience, as men whose appearance is not quite as extreme as that described in the
article, we find this extremely difficult to believe. When we have found ourselves alone in
settings such as a doctor’s waiting room, we find that the second person to enter the room very
rarely sits anywhere near us, especially if she is a woman, despite our best efforts not to appear
unpleasant. Indeed, elementary social conventions (in France, as in most other Western
countries) would seem to dictate that the second person to enter a waiting room will almost
always choose to sit in the opposite half of the room to the first person.

Figure 4. Guéguen’s (2012f) Figure 1, aspect ratio corrected, with estimated distances between
chairs (red) and room dimensions (mauve) indicated in centimeters.

Even if the issues that we have just described were to be somehow resolved, however, the
very large effect sizes reported by Guéguen (2012f) also seem problematic. A partial eta-squared
value of .20 was reported for the main effect of fertility risk with distance; this is considerably
larger than Cohen’s (1988) suggestion that a value of .14 constitutes a large effect. As with some
other studies discussed previously, if this effect were real, we wonder why nobody would have
noticed it before outside of a laboratory context.

We asked Dr. Guéguen to provide us with the following:

• A comprehensive description of how the seating plan worked, taking into account our
  estimation of the distances within the room based on Figure 1 of the article.
• A copy of the dataset.
• Contact details of any confederates who can, first, confirm that the results stated in the
  article were those actually observed during the experiment; second, confirm the accuracy
  of the methodological description; and third, describe any problems that occurred during
  data collection.
• Any other evidence that the study took place as described, such as contemporaneous
  e-mail exchanges, planning documents, copies of the forms on which the confederates
  recorded the events that took place, invoices or other documentation from the laboratory
  that analyzed the participants’ saliva samples, a written field report from the
  confederates, etc.

Recurrent problems

Following our review of a selection of articles by Dr. Guéguen, in this section we attempt
to summarize the trends that we have observed within these articles, and our concerns for the
integrity of this body of research as a whole.

Lack of variability of responses


In several of the articles we have examined, Dr. Guéguen reported the results of Likert-
type data, apparently showing substantial differences between conditions. But in most of those
cases, the small sizes of the reported standard deviations mean that the participants’
responses, whatever the condition, must all have been clustered around one number
in the middle, or upper-middle, of the scale. This applies both to participants who were
undergraduates (e.g., Guéguen, 2012a) and members of the public (e.g., Guéguen, 2013b). In all
cases, out of substantial numbers of participants (several hundred in the case of Guéguen,
2013b), almost nobody chose answers at the extreme ends of the scale in any condition. Such
patterns are totally unlike anything we have ever observed with Likert-type data, where there is
almost always a substantial presence of answers at one end (or, occasionally, both ends) of the
scale. In contrast, Dr. Guéguen’s participants consistently seem to have very moderate views on
almost every topic.

Remarkable degree of experimental control


Dr. Guéguen’s field experiments are characterized by a distinct lack of randomness.
Every single participant solicited in a sample of several hundred takes time out of their day to
interact with the confederate and be debriefed. Of the hundreds of women whose phone number a
male confederate solicits, every single one agrees to give him her age. Multiple
confederates set out to importune precise numbers of male and female participants, and always
get exactly the right number of each sex, with no overruns or exclusions. Apparently, nobody
who was stopped by these junior confederates was ever in a hurry, or bad-tempered, or didn’t
believe that the confederate was in fact a psychology researcher (even when, for example, the
confederate had just told a woman he found her very attractive, then changed his story after she
refused to give her telephone number to him).

Similar good fortune seems to follow Dr. Guéguen’s laboratory experiments. For
example, when participants volunteer to be involved with a study, it seems that equal numbers of
men and women are always available. Presumably this means either that the sex ratio among
psychology undergraduates at the Université de Bretagne-Sud is very atypical, or that many
female participants are simply rejected, along with the extra statistical power that they would
bring; most psychological researchers, we suspect, would be very glad to have such a luxury.
One presumes that Dr. Guéguen, who has authored a book on statistics for psychologists, is
aware of the statistical techniques that are available to compensate for imbalances in sex among
participants.

Too much information... about the wrong things


In this commentary, we have pointed out—as we came across them in the various articles
under examination—a number of places where apparently trivial features of the method of an
experiment are reported in great detail, while some rather more obvious issues of the real-world
logistics that would be involved are ignored entirely. Again, this is a curious pattern. Most
authors, conscious of the word limits frequently imposed on them by journals, would, we
suggest, not feel the need to report that their confederates were instructed to read a book on the
beach “because we found⁴ that most of the women (about 85%) who were alone on the beach
read books or magazines when they were lying on their beach towel” (Guéguen, 2013b, p. 1519),
or to describe a vase of flowers as containing “a mix of 10 roses, 15 French marigolds, and 15
daisies” (Guéguen, 2011, p. 106), or to determine and report (incorrectly) the exact HSL values
of various colored T-shirts (Guéguen, 2012c, p. 77). The overall impression is that, while many
of these experiments would make for interesting class discussions in a course on research
methods on how to control for potential confounds in the measured phenomenon (e.g., the
extensive discussion of the measures taken to ensure that the only difference between conditions
was the height of the heels in Guéguen, 2015b), very little thought has been given to the practical
difficulties of sending confederates out into the messy real world and having them recruit
participants spontaneously.

⁴ Guéguen (2013b) gave no information about the field work that led to the observation of the prevalence
of book-reading among young women lying on the beaches of Brittany. We presume, however, that there
was no shortage of male confederates who volunteered to help the principal investigator in the onerous
task of collecting this information.

Ethical approval
In three articles, Guéguen (2012b, 2012d, 2013a) reported that his studies were approved
“by the ethical committee of the laboratory,” citing in each case the reference “CRPCC-LESTIC
EA 1285.” In other articles (e.g., Guéguen, 2009a, 2012c, 2012g), he mentioned specific
modifications to the protocol of the studies that had been made at the suggestion of this
committee. It appears that the code “CRPCC-LESTIC EA 1285” refers to the “Laboratoire
d'Ergonomie des Systèmes, Traitement de l'Information & Comportement” of the “Centre de
Recherches en Psychologie, Cognition et Communication,” with EA 1285 being this
department’s “équipe d’accueil” code within the French national university research system (for
more information, in French, see Patron, 2014). In other words, the reference for the ethics
committee’s approval is simply the administrative reference code for the entire laboratory of
which Guéguen is the director, rather than—as the casual reader might imagine—a reference
number for the ethical approval document. It also seems strange that the head of a laboratory
should be undertaking research without neutral ethical approval, for example from a group of
colleagues in another psychological research laboratory within the same institution. This seems
especially important given the nature of these studies, which involved, among other things,
importuning young women in the street, and the covert filming of young female volunteers in
order for the “sexiness” of their walk to be rated. It would be instructive to examine the minutes
of the meetings of this ethics committee with regard to several of the studies that we have
discussed in this commentary.

Publishing practices
We find it rather unusual for a professor to publish so many empirical studies as sole
author, with no authorship conferred to any graduate students, postdocs, or other faculty
members. For example, in this commentary we have reviewed six studies published in 2012
alone; in 2015, Dr. Guéguen appears to have published around 20 sole-authored empirical
articles. This seems to represent a substantial workload for a professor with teaching
commitments and a laboratory to manage.

We also note that at no point in any of the articles reviewed in the present commentary
did Dr. Guéguen offer thanks or acknowledgement, either individually or collectively, to the
many participants and confederates—presumably undergraduates for the most part, judging by
their reported ages—who apparently put in many hours of their free time to assist him in his
research. In some studies, several students must have spent an extraordinarily large number of
hours in the field collecting data in messy social situations.
The question of whether these helpers were paid leads us to a broader issue: Dr. Guéguen did not
report any source of funding in any of the articles reviewed in this commentary. While we are
not familiar with the practices of research funding in French universities, we presume that some
minimal form of budgetary approval would be necessary, if only to justify the expense claims
associated with entry to nightclubs, drinks for confederates in bars, and transport to and between
beaches. We presume that detailed records of these expenditures, and the names of the
confederates to whom these reimbursements were made, are available for inspection.

References⁵
Amazon (2015). GOGO tally counter. Retrieved from
http://www.amazon.com/GOGO-Tally-Counter-Manual-Mechanical/dp/B001KX1VW2
Brown, N. J. L., & Heathers, J. A. J. (2017). The GRIM test: A simple technique detects
numerous anomalies in the reporting of results in psychology. Social Psychological and
Personality Science, 8, 363–369. http://dx.doi.org/10.1177/1948550616673876
CCPRhuys (2015). Communauté de Communes de la Presqu'île de Rhuys: Historique. Retrieved
from http://www.ccprhuys.fr/fr/information/47/la-ccprhuys
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ:
Lawrence Erlbaum Associates.
Guéguen, N. (2007a). Courtship compliance: The effect of touch on women's behavior. Social
Influence, 2, 81–97. http://dx.doi.org/10.1080/15534510701316177
Guéguen, N. (2007b). Bust size and hitchhiking: A field study. Perceptual and Motor Skills, 105,
1294–1298. http://dx.doi.org/10.2466/pms.105.4.1294-1298
Guéguen, N. (2009a). Menstrual cycle phases and female receptivity to a courtship solicitation:
An evaluation in a nightclub. Evolution and Human Behavior, 30, 351–355.
http://dx.doi.org/10.1016/j.evolhumbehav.2009.03.004
Guéguen, N. (2011). “Say it with flowers”: The effect of flowers on mating attractiveness and
behavior. Social Influence, 6, 105–112. http://dx.doi.org/10.1080/15534510.2011.561556
Guéguen, N. (2012g). Hair color and courtship: Blond women received more courtship
solicitations and redhead men received more refusals. Psychological Studies, 57, 369–375.
http://dx.doi.org/10.1007/s12646-012-0158-6
Heathers, J. A. J., Anaya, J., van der Zee, T., & Brown, N. J. L. (2018, March 5). Recovering
data from summary statistics: Sample Parameter Reconstruction via Iterative TEchniques
(SPRITE). PeerJ Preprints, 6, 26968v1. http://dx.doi.org/10.7287/peerj.preprints.26968v1
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8),
e124. http://dx.doi.org/10.1371/journal.pmed.0020124

⁵ The reference for each article by Dr. Guéguen that is discussed in its own section appears at the
top of the relevant section.

Insée (2015a). Séries historiques des résultats du recensement : Unité urbaine de Lorient
(56601). Retrieved from
http://www.insee.fr/fr/themes/tableau_local.asp?ref_id=TER&millesime=2012&typgeo=UU2010&search=56601
Insée (2015b). Séries historiques des résultats du recensement : Unité urbaine de Vannes
(56501). Retrieved from
http://www.insee.fr/fr/themes/tableau_local.asp?ref_id=TER&millesime=2012&typgeo=UU2010&search=56501
Insée (2015c). Estimation de la population au 1er janvier par région, département (1975-2014),
sexe et âge (quinquennal, classes d'âge). Retrieved from
http://www.insee.fr/fr/themes/detail.asp?reg_id=99&ref_id=estim-pop
Ipsos (2010). Les Français et le réchauffement climatique, un mois après le sommet de
Copenhague. Available from
http://sciences.blogs.liberation.fr/files/rapport_les_francais_et_le_rechauffement_climatique.pdf
Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G. G., Moreland, K. L., Dies, R. R., ... Reed, G. M.
(2001). Psychological testing and psychological assessment: A review of evidence and
issues. American Psychologist, 56, 128–165. http://dx.doi.org/10.1037/0003-066X.56.2.128
OECD (2009). Environnement : Véhicules particuliers par habitant. Retrieved from
http://www.oecd-ilibrary.org/sites/reg_glance-2009-fr/05/06/g30-europe_fr.html?itemId=/content/chapter/reg_glance-2009-34-fr&_csp_=972bfedeaaf3cf27776af6048e3a8b91
Patron, J. (2014). Les statuts et organisation des laboratoires, structures et pôles de recherche.
Nantes, France: Université de Nantes. Retrieved from
https://www.univ-nantes.fr/75188879/0/fiche___pagelibre/&RH=1291658120656
Richard, F., Bond, C., & Stokes-Zoota, J. (2003). One hundred years of social psychology
quantitatively described. Review of General Psychology, 7, 331–363.
http://dx.doi.org/10.1037/1089-2680.7.4.331
Wagner, R. G. (n.d.). Solar radiation and temperature. Retrieved from
http://rowdy.msudenver.edu/~wagnerri/radtemp.htm

Wilner, S., Wood, K., & Simons, D. J. (2018). Complete recovery of values in Diophantine
systems (CORVIDS). Retrieved from
https://osf.io/rvgqk/download/?version=4&displayName=CORVIDS-2018-05-09T04%3A04%3A35.584Z.pdf
