Article in Food Quality and Preference · September 2014
Impact Factor: 2.78 · DOI: 10.1016/j.foodqual.2014.03.006

Food Quality and Preference 36 (2014) 87–95

Contents lists available at ScienceDirect

Food Quality and Preference


journal homepage: www.elsevier.com/locate/foodqual

Evaluation of a rating-based variant of check-all-that-apply questions: Rate-all-that-apply (RATA)

Gastón Ares a,*, Fernanda Bruzzone a, Leticia Vidal a, Rafael Silva Cadena a, Ana Giménez a,
Benedicte Pineau b, Denise C. Hunter b, Amy G. Paisley b, Sara R. Jaeger b

a Departamento de Ciencia y Tecnología de Alimentos, Facultad de Química, Universidad de la República, Gral. Flores 2124, C.P. 11800 Montevideo, Uruguay
b The New Zealand Institute for Plant & Food Research Ltd., 120 Mt Albert Road, Private Bag 92169, Auckland, New Zealand

Article info

Article history:
Received 14 January 2014
Received in revised form 11 March 2014
Accepted 17 March 2014
Available online 27 March 2014

Keywords:
CATA
Consumer research
Research methodology
Sensory characterization
Consumer profiling
Abstract

The current research explored the possibility of using attribute ratings as a variant (RATA: rate-all-that-apply)
to CATA questions (check-all-that-apply), in order to improve sample description and
discrimination and to engage participants in greater cognitive processing. The RATA question variant
was implemented by asking participants, for the terms they ticked as 'apply', to indicate intensity (using
a 3-pt scale with anchors 'low', 'medium' or 'high') or rate applicability (using a 5-pt scale anchored at
'slightly applicable' and 'very applicable'). A total of four studies with 328 consumers were conducted.
Studies 1–3 involved the consumption of products (milk desserts, bread and gummy lollies), whereas
Study 4 was performed with yogurt labels. A between-subjects design was used in all studies to compare
product characterizations from CATA and RATA questions. Across the four studies, compared to the
simple CATA questions, the RATA variant led to an increase in the total number of selected terms and
a small increase in the percentage of terms for which significant differences among samples were
identified. Although the stability of sample and term configurations from CATA and RATA questions,
calculated using a bootstrapping re-sampling approach, was similar, for two of the four studies RATA
questions provided more stable sample and term configurations. Results from the present work reveal
the potential of intensity-based CATA variants with consumers for sensory product characterization,
but also suggest that these may be study and sample specific.
© 2014 Elsevier Ltd. All rights reserved.

1. Introduction
1.1. Motivation for the research
Demand for consumer-based methods which deliver sensory
product characterizations is growing (Valentin, Chollet, Lelièvre,
& Abdi, 2012; Varela & Ares, 2012). Several approaches have been
developed, including check-all-that-apply (CATA) questions, in
which consumers are presented with a list of terms and asked to
select all those that apply to the focal sample (Adams, Williams,
Lancaster, & Foley, 2007; Meyners & Castura, 2014). Despite their
recent introduction to sensory and consumer science, CATA questions
have already been widely applied, and tested products include
snacks, fruits, chocolate, milk desserts, crackers, dips,
flavoured water, potato chips, beer, ice-cream, orange-flavoured
powdered drinks, whole grain breads, citrus-flavoured sodas, and
cosmetics (Adams et al., 2007; Ares & Jaeger, 2013; Ares, Varela,
Rado, & Giménez, 2011; Dooley, Lee, & Meullenet, 2010; Jaeger,
Chheang et al., 2013; Jaeger, Giacalone et al., 2013; Meyners,
Castura, & Carr, 2013; Parente, Manzoni, & Ares, 2011; Plaehn,
2012). Compared to projective mapping and sorting, also novel
methods for consumer-derived sensory characterization, an advantage
of CATA questions is the structured format, which enables
collection and analysis of data from large consumer samples easily
and quickly (Ares & Varela, 2014).

* Corresponding author. Tel.: +598 29248003; fax: +598 29241906.
E-mail address: gares@fq.edu.uy (G. Ares).
http://dx.doi.org/10.1016/j.foodqual.2014.03.006
0950-3293/© 2014 Elsevier Ltd. All rights reserved.

Past research has shown that sensory product characterizations
elicited from consumers using CATA questions are reliable and
comparable to those generated by trained assessors (Ares, Barreiro,
Deliza, Giménez, & Gámbaro, 2010; Bruzzone, Ares, & Giménez,
2012; Dooley et al., 2010; Jaeger, Chheang et al., 2013). Self-reported
measures from consumers confirm that CATA questions
are perceived as easy to complete and not tedious (Ares et al.,
2013), and as a result of methodological investigations involving
CATA questions, pros and cons of this question format are being
uncovered (Ares & Jaeger, 2013; Ares, Etchemendy et al., 2014;
Ares, Tárrega, Izquierdo, & Jaeger, 2014; Ares et al., 2013; Jaeger,


Chheang et al., 2013; Jaeger, Giacalone et al., 2013; Lee, Findlay, &
Meullenet, 2013).
The simplicity of CATA questions is a key advantage, but also a
potential limitation. The binary response format does not allow for
a direct measurement of the intensity of the evaluated sensory
attributes, which could hinder detailed descriptions and discrimination
between products that have similar profiles in terms of their
characteristic sensory attributes. Different formulations of milk
desserts during new product development are a case in point. All
the formulations can be described by their thickness, creaminess,
sweetness and vanilla flavor, but they primarily differ in the relative
intensities of those sensory characteristics. Would it be possible,
on the basis of data generated by consumers using CATA
questions, to accurately characterize and discriminate among such
different formulations? Alternatively, consider orange juice, which
may be smooth or contain fruit pulp. While it is easy to describe
these attributes and differentiate between such juices, for example
using a CATA term like 'no pulp', it is less straightforward to describe
and differentiate juices that contain more/less pulp. Another
example relating to orange juice is sweet-acid balance, which has
been reported as a key factor differentiating commercial brands
(Kim, Lee, Kwak, & Kang, 2013; Olsen, Menichelli, Meyer, & Næs,
2011). Extending beyond milk desserts and orange juice, to foods
and beverages in general, the current research addresses the
question of whether it is possible to elicit intensity-based sensory
characterizations with consumers using CATA-style questions.
Evidence pointing to the ability of consumers to reliably score
sensory attribute intensity exists (Ares, Bruzzone, & Giménez,
2011; Husson, Le Dien, & Pagès, 2001; Worch, Lê, & Punter, 2010)
and intensity scales are being used in consumer studies (Popper,
Rosenstock, Schraidt, & Kroll, 2004). Consumers are also being
asked to provide intensity-based measures in the hedonic domain,
commonly by means of Just-about-right (JAR) questions (Popper,
2014) and more recently in the Ideal Profile Method, where measures
of the perceived and ideal intensity of selected sensory attributes
are elicited (Worch, Crine, Gruel, & Lê, 2014). Hence,
exploration of CATA question variants which allow for intensity-based
responses has merit.
CATA questions that include multiple terms for a single attribute
which vary in intensity (e.g., 'chocolate: weak', 'chocolate:
strong', 'sweet: high', 'no strawberry flavour', 'very sweet',
'not very sweet') have been previously used (Ares, Tárrega,
Izquierdo, & Jaeger, 2014; Jaeger, Chheang et al., 2013) and have
been found sub-optimal, because such intensity-based terms are
used less reliably by consumers (Jaeger, Chheang et al., 2013). Further,
we have anecdotal evidence suggesting that some consumers
find it confusing when the low-intensity version of the attribute
appears before the high-intensity version (e.g., 'sweet: low'
appears before 'sweet: high' in the list of CATA terms). This term
ordering should occur frequently in light of the recommendation to
use experimental designs for CATA terms that are balanced for presentation
order (Ares & Jaeger, 2013; Ares, Etchemendy et al.,
2014). Further, because there is a tendency to give participants
few instructions on how to complete a CATA question (Ares
et al., 2013; Meyners & Castura, 2014), it is not clear what the response
should be in the instance where 'sweet: high' has been
ticked for a sample and the term 'sweet: low' appears further
down the list of CATA terms. Our intention as experimenters would
be for only one of these terms to be ticked, to indicate whether the
sample was not very sweet or very sweet. However, a participant
may reasonably assume that for a sample where 'sweet: high'
applies, 'sweet: low' would also apply.
An alternative approach to obtaining intensity-based responses
using a CATA-style question was reported by Reinbach, Giacalone,
Ribeiro, Bredie, and Frøst (2014). These authors asked participants
to answer 'yes' or 'no' for seven flavour attributes of beer and,
when selecting 'yes', to indicate the intensity of the focal attribute
using a 15-point intensity scale anchored with 'very weak' and
'very strong'. In line with other CATA question variants (Ennis &
Ennis, 2013; Smyth, Dillman, Christian, & Stern, 2006), the
rationale for this approach was, in part, driven by a desire to limit
satisficing response strategies by participants. This is a documented
limitation of CATA questions, whereby consumers reduce
the cognitive effort required to complete a task (Krosnick, 1999;
Rasinski, Mingay, & Bradburn, 1994; Sudman & Bradburn, 1992).
In this context, the current research aims at exploring the possibility
of using attribute ratings (RATA: rate-all-that-apply) as a variant
to CATA questions, in order to engage participants in greater
cognitive processing and improve sample description and discrimination.
This approach has been recommended by Ng, Chaya, and
Hort (2013) for increasing the discriminative ability of CATA
questions when measuring consumers' emotional response to food
products.
1.2. Aims and research strategy
The RATA question variant was implemented by asking participants,
for terms they selected as 'applies', to rate intensity (using a
3-pt scale with anchors 'low', 'medium' or 'high') or rate applicability
(using a 5-pt scale anchored at 'slightly applicable' and 'very
applicable'). We considered fewer rather than more scale points
on the scales to be preferred in accordance with a key advantage
of CATA questions, which is that they are easy for consumers to
complete.
Comparisons were performed on data from simple CATA
questions and RATA variants, and drawing on research strategies
previously employed in methodological research relating to CATA
questions (e.g., Ares & Jaeger, 2013; Ares et al., 2013) analyses
pertained to: (i) frequency of term selection, (ii) sample discrimination,
(iii) sample and term configurations, and (iv) stability of sample and
term configurations. With RATA questions, the generation of attribute
ratings provided the opportunity to create summed indices of
responses that take into account the intensity of attributes, thereby
possibly adding to the discriminative ability of the RATA questions
(hereafter RATA-scoring). Hence, when performing the analyses,
comparisons were made between CATA, RATA and RATA-scoring
(where RATA refers to analyses of checked attributes in the RATA
format, without taking attribute intensity into account).
A final component to the research was the use of self-report
questions concerning the task. It is possible that, by asking
consumers to focus on attribute intensity in addition to presence/absence,
the perceived ease of the task decreases and
its tediousness increases. To monitor such an eventuality, self-report
questions, previously used in methodological CATA research
(Ares et al., 2013; Jaeger, Giacalone et al., 2013), were used in two
of the four studies and compared across the two question formats
(i.e., CATA vs. RATA).
2. Materials and methods
A total of four studies were conducted, in which 328 consumers
took part. Studies 1–3 involved the consumption of products (milk
desserts, bread and gummy lollies), whereas Study 4 was performed
with yogurt labels. A between-subjects design was used
in all studies to compare product characterizations from simple
CATA and a rate-all-that-apply variant (RATA). Table 1 provides
an overview of the studies.¹

¹ In Studies 2 and 3 data were collected as part of sessions that featured multiple
tasks including several product categories and research methods. Only data relevant
to the aims of this research are included.



Table 1
Overview of the four studies comparing check-all-that-apply (CATA) questions and a rate-all-that-apply (RATA) variant for product characterization.

Study ID   Consumers     Consumers completing     Product         Number of   Number of       Type of scale used
           in the test   the task using CATA      category        samples     sensory terms   in the RATA approach
                         (RATA) questions
1          100           50 (50)                  Milk desserts   7           18              5-point applicability
2          134           68 (66)                  Sliced bread    5           15              3-point intensity
3          134           68 (66)                  Gummy lollies   5           15              3-point intensity
4          94            50 (44)                  Yogurt labels   5           18              5-point applicability

2.1. Participants
Four consumer studies were conducted, each with 94–134 participants (Table 1). Studies 1 and 4 were conducted in Montevideo
(Uruguay), whereas Studies 2 and 3 were conducted in Auckland
(New Zealand). Participants in Study 2 also took part in Study 3.
In Uruguay participants were recruited from the consumer
database of the Food Science and Technology Department of Universidad
de la República (Uruguay), based on their consumption
of the focal products. In New Zealand participants were registered
on a database maintained by a professional recruitment firm and
were screened in accordance with eligibility criteria for each of
the studies. Participants gave informed consent and were compensated for their participation.
Participants were aged between 18 and 60 years old and the
percentage of female participants ranged from 60% to 69%. The
consumer samples comprised varying household compositions, income levels, education levels, etc. but were not necessarily representative of the general populations in Montevideo and Auckland.
2.2. Samples
Four product categories were tested (Table 1). All samples in
Studies 1–3 were commercially available in Uruguay or New Zealand
and had been purchased from local supermarkets. Different
brands of milk desserts with low sugar content were used in Study
1. In Study 2 the sliced bread samples (national brands) were made
with different types of flour (white, wholemeal, rye, gluten-free),
and contained different amounts and types of seeds (e.g., poppy,
linseed, sunflower, pumpkin). Gummy lollies used in Study 3 were
branded and unbranded products, which varied in characteristics
such as sugar coating, liquid centre, and softness. In Studies 1–3,
serving sizes were always sufficient to allow 2–3 bites/sips per
sample. In Study 1 samples were presented at 10 °C, while samples
in Studies 2–3 were presented at room temperature. Odour-free
plastic containers were used as serving vessels.
Samples in Study 4 were yogurt labels, printed in high quality
and colour on 5 × 10 cm glossy paper. The labels, which were
designed by a graphic designer with previous experience in the
design of food labels for industry, had different combinations of
colours, main image and general design. The only text included
was 'Plain Yogurt' ('Yogur Natural' in Spanish), in different typographies and colours.
2.3. Experimental treatments, sensory terms and data collection
The procedure for data collection in Studies 1–4 was similar. Between-subjects
experiments were always used, comparing responses
from two experimental treatments: CATA questions and
a rate-all-that-apply (RATA) variant. Approximately half of the participants
were randomly assigned to each of the experimental
treatments (Table 1). One experimental treatment was CATA,
meaning that participants in this group were asked to check all
the terms that they considered appropriate to describe each sample.
The RATA questions were implemented slightly differently in
the four studies. In Studies 2 and 3 consumers were asked to check
the terms they considered appropriate for describing samples and
then to rate the intensity of the applicable terms using a 3-point
structured scale ('low', 'medium' and 'high'). In Studies 1 and 4
consumers were asked to rate the applicability of those terms
deemed applicable to describe samples, using a 5-point scale with
1 = 'slightly applicable' and 5 = 'very applicable' as end-point anchors.
Consumers were asked to leave the scale blank in the case
of non-applicable terms.
The sensory terms used in each study were based on pilot work
or previous research using the same product categories. The lists of
terms comprised 15–18 terms and covered multiple sensory
modalities (appearance, aroma, flavour/taste, texture, after taste,
mouth feel). Specifically, in Study 1 the following terms were used
to characterise milk dessert samples: aftertaste, caramel flavour,
creamy, firm, gummy, heterogeneous, homogeneous, liquid, lumpy, milky flavour,
off-flavour, rough, runny, smooth, sweet, tasteless, thick, and vanilla flavour.
In Study 2, for sliced bread
samples, the terms were: breaks apart easily, brown colour, chewy,
crunchy seeds, dense, doughy mouthfeel, dry, light, moist, off-flavour,
salty, seedy, soft, sticks in teeth, and sweet. The terms in
Study 3 (gummy lollies) were: artificial flavour, berry, chewy, cola,
dissolving, hard, intense, jelly-like, lemon/orange, liquid centre,
mouth puckering, soft, sour, sticks in teeth, and sweet. Finally, in
Study 4 (yogurt labels) the terms were: artificial, attractive, creamy,
fresh, healthy, light, liquid, milky flavour, natural, novel, nutritious,
smooth, sour, sweet, tasteless, thick, traditional and yummy.
Based on recommendations by Ares, Etchemendy et al. (2014) and
Ares, Tárrega, Izquierdo, and Jaeger (2014), the order in which the
terms were listed (both experimental treatments) was different
for each product and each participant, following a design balanced
for presentation order (Williams Latin Square).
All samples were labelled with 3-digit random codes. Products
were presented sequentially in accordance with designs that were
balanced for presentation order and carry-over effects (Williams
design). Samples in Studies 1–3 could be tasted more than once. In
Studies 2 and 3 a break of 1 min was forced between each sample.
Data collection took place in standard sensory booths, under white
lighting, controlled temperature (23 °C) and airflow conditions.
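
The Williams designs mentioned above can be illustrated with a short sketch. The paper does not say how its designs were generated, so the function name `williams_design` and the cyclic construction below are assumptions made only for illustration. For an even number of treatments the construction yields a Latin square in which every treatment immediately follows every other treatment exactly once; for an odd number, the mirrored square is appended so that each ordered pair occurs exactly twice.

```python
def williams_design(n):
    """Return presentation orders (lists of treatment indices 0..n-1).

    Illustrative construction, not taken from the paper: the first row
    alternates low and high indices (0, 1, n-1, 2, n-2, ...) and every
    further row adds 1 modulo n. For odd n the reversed rows are appended.
    """
    first = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    rows = [[(x + shift) % n for x in first] for shift in range(n)]
    if n % 2 == 1:
        # Odd n: add the mirror image to balance first-order carry-over.
        rows += [row[::-1] for row in rows]
    return rows


def carry_over_counts(rows):
    """Count how often treatment b is presented immediately after a."""
    counts = {}
    for row in rows:
        for a, b in zip(row, row[1:]):
            counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


# Example: orders for five samples (as in Studies 2-4) give ten sequences.
orders_for_five_samples = williams_design(5)
```

Participants would then be assigned to the rows in rotation, so that each presentation order is used about equally often.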
In Studies 2 and 3, participants answered two Likert questions
immediately following completion of the study: (i) 'it was easy to
answer the questions about these samples'; and (ii) 'it was tedious
to answer the questions about these samples'. The labelled 7-point
scale was anchored at 1 = 'disagree extremely' and 7 = 'agree
extremely'.
For classification purposes participants' age, gender, and
frequency of consumption of the focal products were recorded. In
all instances differences between the participant profiles of the
experimental treatment groups were non-significant (p > 0.08).
Hence, it is possible to infer that differences between experimental
treatments may be linked to differences in study protocol, as
opposed to differences in group characteristics.

2.4. Data analysis


The procedure for data analysis in Studies 1–4 was similar. For
each experimental treatment (i.e., CATA or RATA questions),
frequency of use of each sensory attribute was determined by
counting the number of consumers that used that term to describe
each sample. The RATA approach facilitated two approaches to
analysis: frequency of selection only or weighted frequency of
selection (RATA scoring). In this second approach the points of
the scale were assigned numbers in increasing order of intensity
(i.e., 1 = 'low', 2 = 'medium', 3 = 'high') or applicability (1 = 'slightly
applicable', 5 = 'very applicable'). For each sample and term, RATA
scores were calculated by summing up the scores provided by the
consumers who selected that term as applicable for describing that
sample.
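
As a minimal sketch of the two summaries just described, the snippet below builds a frequency table and a RATA scoring table from raw responses. The array layout (consumers × samples × terms, with 0 meaning "not selected") and the name `rata_tables` are assumptions made for illustration; the toy ratings are invented and are not data from the studies.

```python
import numpy as np


def rata_tables(ratings):
    """Summarise RATA responses.

    ratings: array of shape (consumers, samples, terms); 0 means the term
    was left blank, 1..k is the intensity or applicability score given.
    Returns (frequency table, summed-score table), each samples x terms.
    """
    ratings = np.asarray(ratings)
    frequency = (ratings > 0).sum(axis=0)  # consumers who selected the term
    scores = ratings.sum(axis=0)           # RATA scoring: sum of the scores
    return frequency, scores


# Invented example: 3 consumers, 2 samples, 2 terms, 3-point scale.
example = np.array([
    [[3, 0], [1, 2]],
    [[2, 0], [0, 3]],
    [[0, 1], [1, 0]],
])
frequency, scores = rata_tables(example)
```

Treating the RATA responses as plain CATA data corresponds to using only `frequency`, whereas RATA scoring uses `scores`.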
Fisher's exact test (Fisher, 1954) was used to determine significant
differences between experimental treatments in the total
number of terms used by consumers to describe the whole sample
set, and differences in the frequency of use of each term.
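
To make the comparison concrete, the sketch below computes a two-sided Fisher's exact test for a 2 × 2 table (treatment × term selected or not selected). The paper does not describe how its tables were laid out, so this layout, the function name `fisher_exact_two_sided` and the counts in the usage line are assumptions for illustration only. The p-value sums the hypergeometric probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows could be CATA and RATA; columns 'selected' and 'not selected'.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Invented counts: a term selected 30 times out of 100 evaluations with
# CATA and 45 times out of 100 evaluations with RATA.
p_value = fisher_exact_two_sided(30, 70, 45, 55)
```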
Cochran's Q test (Manoukian, 1986) was carried out to identify
significant differences among samples for each of the sensory
terms, for each experimental treatment. Friedman's test was used
separately on RATA scoring data to identify significant
differences among samples for each of the sensory terms (Siegel,
1956).
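
For readers unfamiliar with Cochran's Q, the following sketch computes the statistic for one term from a consumers × samples table of 0/1 selections. The function name `cochran_q` and the toy data are illustrative assumptions, not the authors' code.

```python
def cochran_q(selections):
    """Cochran's Q for one term.

    selections: list of rows, one per consumer; each row holds 0/1 for
    every sample (1 = term selected for that sample).
    Returns (Q, degrees of freedom). Q is NaN when every consumer gave
    the same answer for all samples, because the statistic is undefined.
    """
    k = len(selections[0])  # number of samples
    column_totals = [sum(row[j] for row in selections) for j in range(k)]
    row_totals = [sum(row) for row in selections]
    total = sum(row_totals)
    denominator = k * total - sum(r * r for r in row_totals)
    if denominator == 0:
        return float("nan"), k - 1
    numerator = (k - 1) * (k * sum(c * c for c in column_totals) - total ** 2)
    return numerator / denominator, k - 1


# Invented example: 4 consumers, 3 samples.
q, df = cochran_q([[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 0, 0]])
```

Under the null hypothesis Q approximately follows a chi-square distribution with k − 1 degrees of freedom, so a p-value could be obtained with, for example, `scipy.stats.chi2.sf(q, df)`. For RATA scores, `scipy.stats.friedmanchisquare` offers a Friedman test; both library calls are suggestions and not necessarily what the authors used.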
Correspondence Analysis (CA) was performed on the frequency
table from each experimental treatment. CA was performed considering
Hellinger distances, as recommended by Meyners et al.
(2013). For RATA data this analysis was carried out both on the frequency
table containing the number of consumers who used each
term for describing each sample and on the sum of scores given by
all consumers to each term for describing each sample (RATA scoring).
CA based on Hellinger distance was applied on RATA scoring
data to enable the comparison of sample and term configurations
with the CATA and RATA counts data. Rao (1995) has previously
applied this statistical technique to data matrices that do not correspond
to frequency tables, given that they contain positive
values.
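
The following is a sketch of one common formulation of CA based on Hellinger distances, in which square-rooted row profiles are centred on the square roots of the column masses and decomposed by SVD. The function name `hellinger_ca`, the centring choice and the scaling of the term coordinates are assumptions; the authors' R implementation may differ in these details. Rows are taken to be samples and columns to be terms, and every sample is assumed to have at least one selected term.

```python
import numpy as np


def hellinger_ca(table, n_dims=2):
    """Hellinger-distance CA of a samples x terms table of counts or scores.

    Returns (sample coordinates, term coordinates, proportion of variance
    explained), each restricted to the first n_dims dimensions.
    """
    counts = np.asarray(table, dtype=float)
    proportions = counts / counts.sum()
    row_mass = proportions.sum(axis=1)
    col_mass = proportions.sum(axis=0)
    profiles = proportions / row_mass[:, None]
    # Square-root transform, centre, and weight rows by sqrt(row mass).
    z = np.sqrt(row_mass)[:, None] * (np.sqrt(profiles) - np.sqrt(col_mass))
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    sample_coords = (u * s) / np.sqrt(row_mass)[:, None]
    term_coords = vt.T * s  # assumed scaling for display purposes only
    return sample_coords[:, :n_dims], term_coords[:, :n_dims], explained[:n_dims]


# Invented 3 samples x 4 terms frequency table.
toy_table = np.array([[20, 5, 1, 10], [3, 18, 7, 2], [8, 8, 15, 4]])
samples_2d, terms_2d, explained_2d = hellinger_ca(toy_table)
```

With all dimensions retained, Euclidean distances between sample coordinates equal the Hellinger distances between the sample profiles, which is the property that motivates this variant. The same function accepts RATA scoring tables, since it only requires non-negative entries.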
Similarity between the sample and term configurations in the
first two dimensions, obtained using data from CATA and RATA
questions, was evaluated using the RV coefficient (Robert &
Escoufier, 1976). The significance of the RV coefficient was tested
using a permutation test (Josse, Husson, & Pagès, 2007).
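
A brief sketch of the RV coefficient and of a simple permutation test is given below. The authors used FactoMineR in R; the Python functions `rv_coefficient` and `rv_permutation_pvalue` are illustrative stand-ins, and the random-permutation scheme is an assumption rather than a reproduction of the procedure of Josse et al.

```python
import numpy as np


def rv_coefficient(x, y):
    """RV coefficient between two configurations with matching rows."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean(axis=0)  # column-centre both configurations
    y = y - y.mean(axis=0)
    sx, sy = x @ x.T, y @ y.T
    return np.trace(sx @ sy) / np.sqrt(np.trace(sx @ sx) * np.trace(sy @ sy))


def rv_permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value: rows of y are shuffled to break the pairing."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = rv_coefficient(x, y)
    exceed = 0
    for _ in range(n_perm):
        shuffled = y[rng.permutation(y.shape[0])]
        if rv_coefficient(x, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

Because the coefficient depends only on the cross-product matrices, it is unchanged by rotation, reflection or uniform rescaling of either configuration, which makes it suitable for comparing CA maps whose axes may be flipped. With only five or seven samples the number of distinct permutations is small (5! = 120), so the attainable p-values are coarse.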
The stability of sample and term configurations from CATA and
RATA questions was evaluated using a bootstrapping re-sampling
approach (Ares, Tárrega, Izquierdo, & Jaeger, 2014). The bootstrapping
process consisted of extracting random subsets of different
size (m = 5, 10, 20, 30, ..., 50, ..., N) from the original data with N
consumers, using sampling with replacement. For each m, 1000
random subsets were obtained. For each subset the frequency table
corresponding to the data of the selected assessors was computed
and CA was performed. The agreement between sample configurations
in the first two dimensions of the CA and the reference configuration
(obtained with all the consumers) was evaluated by
computing the RV coefficient between their coordinates (Abdi,
2010). A similar approach was used for term configurations. Average
values (and standard deviations) for the 1000 random subsets
of size equal to the total number of consumers in each study (N)
were calculated and used as an index of stability. Also, the minimum
number of consumers needed to reach an average
RV = 0.95 was determined (Blancher, Clavier, Egoroff, Duineveld,
& Parcon, 2012), as the benchmark to assess the stability of sample
and term configurations from CA.
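
The re-sampling procedure can be sketched as follows for sample configurations. Everything here is an illustrative assumption rather than the authors' code: the data layout (consumers × samples × terms), the helper names, the compact Hellinger-type CA used to obtain coordinates, and the decision to skip re-sampled panels in which a sample received no term at all.

```python
import numpy as np


def sample_coords(table, n_dims=2):
    """Sample coordinates from a Hellinger-type CA of a samples x terms table."""
    p = table / table.sum()
    r, c = p.sum(axis=1), p.sum(axis=0)
    z = np.sqrt(r)[:, None] * (np.sqrt(p / r[:, None]) - np.sqrt(c))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return (u * s / np.sqrt(r)[:, None])[:, :n_dims]


def rv(x, y):
    """RV coefficient between two configurations with matching rows."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    sx, sy = x @ x.T, y @ y.T
    return np.trace(sx @ sy) / np.sqrt(np.trace(sx @ sx) * np.trace(sy @ sy))


def bootstrap_stability(data, sizes, n_boot=1000, seed=0):
    """Mean and SD of RV between re-sampled and reference configurations.

    data: array (consumers, samples, terms) of 0/1 selections or RATA scores.
    sizes: virtual panel sizes m to evaluate.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n_consumers = data.shape[0]
    reference = sample_coords(data.sum(axis=0))
    results = {}
    for m in sizes:
        values = []
        for _ in range(n_boot):
            chosen = rng.integers(0, n_consumers, size=m)  # with replacement
            table = data[chosen].sum(axis=0)
            if (table.sum(axis=1) == 0).any():
                continue  # assumption: skip panels with an undescribed sample
            values.append(rv(sample_coords(table), reference))
        if values:
            results[m] = (float(np.mean(values)), float(np.std(values)))
        else:
            results[m] = (float("nan"), float("nan"))
    return results
```

The minimum panel size for a stable configuration would then be read off as the smallest m whose mean RV reaches 0.95. Term configurations can be handled in the same way by comparing term coordinates instead of sample coordinates.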
Analysis of variance (ANOVA) was carried out for ease of test
and tediousness scores from Studies 2 and 3, considering experimental
treatment as fixed source of variation, while consumer
(within experimental treatment) was specified as a random effect.
All statistical analyses were performed using the R language (R Core
Team, 2013). FactoMineR was used to calculate RV coefficients (Lê,
Josse, & Husson, 2008).

3. Results
3.1. Frequency of use of sensory terms and attribute ratings
Results related to frequency of use of sensory terms are shown
in the upper part of Table 2. Across the four studies, consumers
used a significantly larger number of sensory terms (p < 0.0001)
for describing samples when answering the rate-all-that-apply
(RATA) variant than when using simple CATA questions. As shown
in Table 2a, consumers selected an average of 27–37% of the terms
to describe samples using CATA questions, whereas the average
number of selected terms ranged from 36% to 52% when the RATA
variant was used. The average increase in the number of terms ranged
from 9% for Study 3, to 90% for Study 4 (Table 2b). The average
increase in the frequency of use of terms when the RATA variant
was used ranged from 11% to 86% (Table 2d). Frequency of use
significantly increased for 7–94% of the terms (Table 2c). Overall,
across all measures, the RATA variant was associated with increased
use of sensory terms relative to the simple CATA format.
Regarding the distribution of intensity/applicability scores used
by participants who completed RATA questions, consumers in
Studies 2 and 3 tended to use the three points of the scale (i.e.,
'low', 'medium' and 'high') with a similar frequency to rate the
intensity of the terms deemed as applicable for describing samples.
The frequency of use of the three points of the intensity scale ranged
from 27% to 40%. In Studies 1 and 4 consumers tended to use
more frequently the top 3 points of the scale (i.e., more applicable)
(average frequency of mention 75%) than the 2 bottom points for
rating the applicability of the terms (i.e., less applicable). This
suggested that consumers selected terms they regarded as clearly
applicable to describe samples.
3.2. Differences among samples
Cochran's Q test was used to determine significant differences
among samples for each of the focal sensory terms, and results
are summarised in the middle part of Table 2. The percentage of
terms for which significant differences among samples were identified
ranged from 28% to 94% when CATA questions were used
and from 39% to 100% when the RATA variant was used (Table 2e).
In Study 2, conclusions regarding similarities and differences
among samples did not differ between question formats. However,
different conclusions were reached for 6–33% of the terms in Studies 1, 3 and 4 (Table 2f).
Analysing RATA data by accounting for the score assigned to a
focal attribute (RATA scoring) did not change the ability to discriminate
among samples. As shown in Table 2e, the percentage of
terms with significant differences did not differ when considering
frequency of use or scores for analyzing data from the RATA
question.
3.3. Sample and term congurations from Correspondence Analysis
The percentage of variance explained by the first two dimensions
of Correspondence Analysis was higher than 70% for all
methodologies in the four studies. No large differences between
methodologies were found, although the first two dimensions
tended to explain a larger percentage of the variance when Correspondence
Analysis was performed on data from RATA questions
compared to usual CATA (Table 2g).
As shown in the lower part of Table 2 (2h and 2j), sample
configurations were highly similar considering data from CATA
and RATA questions. The RV coefficients between sample configurations
in the first and second dimensions of the Correspondence
Analysis from CATA questions and the RATA variant were
significant for the four studies and reached values higher than 0.80.
RV coefficients did not vary largely when considering frequency of
use or scores in the RATA variant.

Table 2
Summary of results for the comparison of sensory characterizations with consumers obtained with CATA questions and a rate-all-that-apply (RATA) variant in Studies 1–4.

                      1-Milk desserts    2-Sliced bread     3-Gummy lollies    4-Yogurt labels
RATA scale            5-pt applicability 3-pt intensity     3-pt intensity     5-pt applicability

Term usage
(a) Average percentage of terms used to describe samples (#)
    CATA              31% a              32% a              37% a              27% a
    RATA              36% b              40% b              41% b              52% b
(b) Average increase in the number of terms used for describing samples when using the RATA variant
                      14%                21%                9%                 90%
(c) Percentage of terms for which frequency of use significantly increased when using the RATA variant (p < 0.05)
                      33%                53%                7%                 94%
(d) Average increase in the frequency of use of the terms when using the RATA variant
                      18%                26%                11%                86%

Sample differences
(e) Percentage of terms with significant differences among samples (p < 0.05)
    CATA              28%                93%                93%                94%
    RATA              39%                93%                100%               100%
    RATA scoring ($)  39%                93%                100%               100%
(f) Percentage of terms for which different conclusions were drawn using CATA and RATA
                      33%                0%                 7%                 6%

Sample configurations
(g) Percentage of variance explained by the first two dimensions
    CATA              77.3%              80.4%              74.9%              91.7%
    RATA              80.7%              86.4%              78.9%              93.1%
    RATA scoring ($)  85.1%              84.3%              77.5%              92.7%
(h) RV between sample configurations obtained from Correspondence Analysis of data from CATA and RATA questions
                      0.90**             0.97***            0.97***            0.82*
(i) RV between term configurations obtained from Correspondence Analysis of data from CATA and RATA questions
                      0.67***            0.93***            0.94***            0.82***
(j) RV between sample configurations obtained from Correspondence Analysis of data from CATA and RATA questions (scoring)
                      0.92**             0.99***            0.97***            0.81*
(k) RV between term configurations obtained from Correspondence Analysis of data from CATA and RATA questions (scoring)
                      0.72***            0.94***            0.94***            0.83***

Type of scale used in the RATA approach: 5-point applicability scale (Studies 1 and 4), 3-point intensity scale (Studies 2 and 3).
(#) Percentages with different letters are significantly different at p ≤ 0.05, according to Fisher's exact test.
($) Indicates that RATA data were analysed by creating a summed index of the scores provided by all participants for each of the terms of the question.
* Indicates that the RV coefficient is significant at p ≤ 0.05.
** Indicates that the RV coefficient is significant at p ≤ 0.01.
*** Indicates that the RV coefficient is significant at p ≤ 0.001.
The RV coefficients between term configurations in the first and
second dimensions of the Correspondence Analysis were lower
than those from sample configurations (Table 2i and k). However,
the RV coefficients were higher than 0.80 for Studies 2–4. For Study
1 the RV coefficient of term configurations from CATA and RATA
questions was lower (0.67 for RATA frequencies and 0.72 for RATA scoring)
but reached significance (p < 0.001). The lower RV coefficients found in Study 1
can be attributed to the fact that differences among samples were
smaller than in the rest of the studies. Further research is needed to
study the influence of degree of difference among samples on the
performance of CATA and RATA questions.
3.4. Stability of sample and term congurations
For both question formats the RV coefficient of sample and term
configurations increased with increasing number of consumers in
the virtual panel, as was expected. Using Studies 1 and 2 as exemplars (milk desserts and sliced bread, respectively), Fig. 1 shows
the evolution of the average RV coefficients between the configurations of virtual panels of different sizes and the reference configuration as a function of the number of consumers for the CATA and RATA
questions, considering for the latter both frequency of use and
scores. In Study 1 the stability of sample configurations was similar
for frequency data of CATA and RATA questions, while RATA scoring
data tended to provide higher RV coefficients (Fig. 1(a)). Regarding
term configurations, RATA scoring tended to give the highest RV
coefficients. RATA frequency data provided RV coefficients similar to those of CATA
for fewer than 20 consumers and similar to those of RATA scoring for 20 or more consumers (Fig. 1(b)). A similar trend was found in
Study 2 for term configurations (Fig. 1(d)). In this study RV coefficients of sample configurations from RATA questions tended to be higher than those from
CATA questions, regardless of considering frequency of use or scores
for RATA data (Fig. 1(c)). Also, standard deviations of RV coefficients
tended to be lower for RATA than for CATA questions.
As shown in Table 3, the average RV coefficients of sample and
term configurations, calculated for a sample size equal to the total
number of consumers in the test, were similar for CATA and RATA
questions in Studies 3 and 4. However, for Studies 1 and 2 the average RV coefficient tended to be higher for RATA than for CATA
questions. In Study 1, the highest RV coefficients were obtained
when RATA data were analyzed considering attribute scores. The
minimum number of consumers needed to reach stable sample
configurations (i.e., an average RV coefficient of 0.95 or above)
was similar for CATA and RATA questions in Studies 3 and 4,
whereas in Studies 1 and 2 it was lower for RATA than for CATA
questions. It is interesting to note that in Study 1 stable sample
configurations were only reached when RATA data were analyzed
considering attribute scores (Table 3). Stable term configurations
were not established in this study.

Fig. 1. RV coefficient of sample and term configurations with respect to the reference configuration as a function of the number of consumers considered in the re-sampled
virtual panel for CATA and RATA data, considering frequency of use of the terms and attribute scores, for Study 1 ((a) and (b), respectively) and Study 2 ((c) and (d),
respectively). Vertical bars correspond to standard deviations. The horizontal line represents the stability criterion, RV = 0.95.
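The re-sampling logic behind these stability results can be sketched as follows. This is only an illustrative sketch under stated assumptions: consumers are drawn with replacement, the reference configuration is taken from the full panel, and a plain Correspondence Analysis computed by singular value decomposition stands in for the software actually used (the present work relied on R). All function names, panel sizes and the simulated data are hypothetical.

```python
import numpy as np


def rv_coefficient(x, y):
    """RV coefficient between two configurations with the same rows."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    sx, sy = x @ x.T, y @ y.T
    return np.trace(sx @ sy) / np.sqrt(np.trace(sx @ sx) * np.trace(sy @ sy))


def ca_sample_coords(table, n_dims=2):
    """Sample (row) principal coordinates from a plain correspondence analysis."""
    table = np.asarray(table, dtype=float)
    table = table[:, table.sum(axis=0) > 0]  # drop terms nobody used
    p = table / table.sum()
    r = p.sum(axis=1)  # row masses (samples)
    c = p.sum(axis=0)  # column masses (terms)
    resid = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, s, _ = np.linalg.svd(resid, full_matrices=False)
    return (u * s)[:, :n_dims] / np.sqrt(r)[:, None]


def stability_curve(data, panel_sizes, n_sim=100, seed=0):
    """Mean and standard deviation of RV versus the full-panel reference.

    data: array of shape (n_consumers, n_samples, n_terms) holding 0/1
    selections (CATA, RATA frequency) or scores (RATA scoring).
    """
    rng = np.random.default_rng(seed)
    n_consumers = data.shape[0]
    reference = ca_sample_coords(data.sum(axis=0))
    curve = {}
    for m in panel_sizes:
        rvs = []
        for _ in range(n_sim):
            # Virtual panel: m consumers drawn with replacement.
            idx = rng.integers(0, n_consumers, size=m)
            virtual = data[idx].sum(axis=0)
            rvs.append(rv_coefficient(ca_sample_coords(virtual), reference))
        curve[m] = (float(np.mean(rvs)), float(np.std(rvs)))
    return curve


if __name__ == "__main__":
    # Simulated CATA-like data: 80 consumers, 4 samples, 6 terms (hypothetical).
    rng = np.random.default_rng(1)
    probs = np.array([[0.8, 0.7, 0.2, 0.1, 0.5, 0.3],
                      [0.2, 0.3, 0.8, 0.7, 0.5, 0.4],
                      [0.5, 0.1, 0.1, 0.8, 0.2, 0.7],
                      [0.1, 0.6, 0.6, 0.2, 0.8, 0.2]])
    data = rng.random((80, 4, 6)) < probs
    for size, (mean_rv, sd_rv) in stability_curve(data, [5, 10, 20, 40, 80]).items():
        print(size, round(mean_rv, 3), round(sd_rv, 3))
```

Under this sketch, the minimum number of consumers reported in Table 3 would correspond to the smallest panel size whose mean RV reaches 0.95. Term configurations could be treated in the same way by taking column rather than row coordinates.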
3.5. Ease and tediousness of the task
In Studies 2 and 3, the CATA and RATA question formats were
also compared in terms of perceived ease and tediousness. Significant differences in measures of ease of the task were not established
(p > 0.22). On average participants agreed strongly that both
tasks were easy (average = 6.1, standard deviation = 0.9), and these
values are comparable with previous studies (Ares et al., 2013;
Jaeger & Ares, 2014). However, in both studies CATA questions were
perceived as slightly less tedious than RATA questions (p < 0.03), but with
effect sizes of no practical significance. On average, consumers
disagreed or disagreed strongly with the statement that the tasks were
tedious (CATA: average = 2.3, standard deviation = 1.2; RATA: average = 2.4, standard deviation = 1.2). These values are also comparable
to previous reports (Ares et al., 2013; Jaeger & Ares, 2014).
4. Discussion
The present work explored the use of CATA question variants
wherein measures of attribute intensity are obtained. Across four
consumer studies it was found that, compared to simple CATA
questions, asking consumers to rate the intensity or applicability
of the terms after selecting them as appropriate for describing
samples (i.e., RATA variant) leads to an increase in the total number
of selected terms. This result may indicate that the RATA variant
required greater cognitive effort than simple CATA questions, discouraging the use of satisficing response strategies by consumers.
Rasinski et al. (1994) and Smyth et al. (2006) reported a similar result when asking respondents to answer "yes" or "no" to each of
the terms included in the CATA question, pointing to deeper
processing when completing the task. Jaeger et al. (2014) reported
a similar finding when comparing CATA and forced-choice Yes/No
question formats for sensory product characterization with
consumers. Across seven consumer studies the average increase
in the number of terms used for describing samples when using
forced-choice questions ranged from 29% to 61%.
In addition to differences in attribute citation frequency, the
percentage of terms in which significant differences among
samples were identified was higher for the RATA question variant
in three of the four studies, suggesting greater discriminative
capacity. However, despite these differences, RATA and CATA question formats led to the same conclusions about similarities and
differences among samples for the majority of the sensory terms.
The similarity between term configurations from RATA and
CATA questions was also high, suggesting that the way in which
consumers used some of the terms to describe samples did not differ greatly between the methodologies. Reinbach et al. (2014), in a
study with beers, also found high agreement between sample configurations and no improvement in discrimination among samples
in their comparison of simple CATA questions and CATA questions
augmented with attribute intensity ratings.

Table 3
Number of consumers needed to reach an average RV coefficient of sample and term configurations from Correspondence Analysis equal to 0.95, and average RV coefficient of
sample and term configurations for a sample size equal to the total number of consumers, obtained via a bootstrapping re-sampling approach for check-all-that-apply (CATA) questions and
a rate-all-that-apply (RATA) variant.

Parameter / Methodology      Study ID
                             1-Milk         2-Sliced       3-Gummy        4-Yogurt
                             desserts ()    bread (/)      lollies (/)    labels ()

Average RV coefficient of sample configurations across simulations for the total number of consumers
  CATA                       0.922          0.989          0.988          0.992
  RATA                       0.943          0.997          0.988          0.993
  RATA scoring ($)           0.969          0.995          0.988          0.993

Minimum number of consumers necessary to reach an RV coefficient of sample configurations of 0.95
  CATA                       N/A            20             20             12
  RATA                       N/A            8              21             11
  RATA scoring ($)           37             10             19             11

Average RV coefficient of term configurations across simulations for the total number of consumers
  CATA                       0.690          0.978          0.980          0.960
  RATA                       0.838          0.986          0.979          0.959
  RATA scoring ($)           0.851          0.987          0.980          0.959

Minimum number of consumers necessary to reach an RV coefficient of term configurations of 0.95
  CATA                       N/A            40             35             45
  RATA                       N/A            23             36             38
  RATA scoring ($)           N/A            25             34             38

Type of scale used in the RATA approach: 5-point applicability scale (), 3-point intensity scale (/). N/A indicates that an RV coefficient of 0.95 was not reached in the
bootstrapping re-sampling approach. ($) Indicates that RATA data were analysed by creating a summed index of the scores provided by all participants for each of the terms of
the question.

The stability of sample and term configurations from CATA and
RATA questions, calculated using a bootstrapping re-sampling
approach, was similar for two of the four studies. However, in
two instances RATA questions provided more stable sample and
term configurations. In Study 1, where sample differences were
relatively small, the highest RV coefficients were obtained with
RATA scores, suggesting that asking consumers to rate the terms
considered applicable for describing samples can help to stabilize
sensory spaces. The number of consumers required to reach stable
sample and term configurations was comparable to that
reported by Ares, Tárrega, Izquierdo, & Jaeger (2014). The finding that Study
1, where sample differences were small, was less stable is also
commensurate with the results of these authors.
Regarding self-reported measures of ease of task, no significant
differences between methodologies were found. However, the
RATA variant was perceived as more tedious than CATA questions,
although the difference was small. It remains to be seen if this result is robust. In a comparison of CATA and forced-choice Yes/No
questions, Jaeger et al. (2014) found no differences in task ease/
tediousness scores.
An interesting outcome of the current research is that considerable heterogeneity existed among consumers in the use of intensity/applicability scores when completing RATA questions. When
a term was considered applicable for describing samples, consumers tended to use all the points of the scale, suggesting lack of
consensus in scoring. Similar heterogeneity in intensity scores by
consumers has previously been reported by Ares, Bruzzone, &
Giménez (2011) and Ares, Varela, Rado, & Giménez (2011). However,
despite heterogeneity being present, it was found in the current
research that the RATA format was superior to CATA questions in
several instances, without being a more difficult and tedious task
for consumers to take part in.
The current research has revealed the potential of intensity-based CATA variants with consumers for sensory product characterization. Additional confirmation of this potential is required,
with other product categories, samples that are more subtly different, and more diverse consumer populations. Assuming that
the current results are valid, a question arises regarding the use
of RATA questions concurrently with hedonic evaluation of samples. Jaeger, Chheang et al. (2013), Jaeger, Giacalone et al.
(2013) and Jaeger and Ares (2014) recently showed that simple
CATA questions, when used concurrently with acceptability ratings, are unlikely to cause hedonic bias. However, it is possible
that asking consumers to focus on attribute intensity is more likely
to induce an analytical mindset, which Prescott, Lee, and Kim
(2011) suggest is linked to hedonic bias. In a similar vein, it
has been suggested that JAR questions (which also are intensity-based) are more likely than simple CATA questions to be associated with hedonic bias (Adams et al., 2007). Thus, while RATA formats may deliver more nuanced sensory characterizations,
improved sample discrimination and more stable sample and
term configurations, they may not be ideal when used concurrently with hedonic questions. Future research will be needed
to answer this question.
Considering that the RATA approach may be an improvement
over usual CATA questions for sensory product characterization,
further research should aim at comparing different approaches
for implementing the methodology and analyzing the obtained results. Research on these topics may contribute to the development
of guidelines for the implementation of RATA questions for sensory
characterization.
In this sense, the comparison of applicability and intensity ratings is one of the methodological aspects that should be addressed.
Considering that different cognitive processes may be involved in
the evaluation of attribute applicability and intensity (Lawless,
1999), both approaches may provide different sensory characterizations. For this reason, the best way of implementing RATA questions may depend on the specific product category being tested.
Attribute intensity measures may be more relevant for simple
products (e.g. milk desserts), while applicability measures may
be more appropriate for sensory characterizations of complex
products, such as wine.
Regarding data analysis, although the present work used CA
based on Hellinger distances on the summed scores, other
approaches are possible. One of the possibilities is the use of
Dravnieks' applicability scores (Dravnieks, 1982). This author proposed estimating the percentage of applicability of a descriptor as the geometric mean of two quantities: the percentage of assessors that used
the descriptor to describe a sample and the percentage of the maximum summed applicability score given to that sample. The choice
of data analysis method for RATA data may depend on what rating
scale is used (i.e. applicability or intensity). Dravnieks' applicability
scores may be better suited for applicability than for intensity
scores, but other approaches may be applied to intensity scores.
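As a worked illustration of the geometric-mean idea described above, the following sketch computes such a score for one descriptor and one sample. It reflects one reading of that description rather than a verified implementation of Dravnieks (1982): the function name is hypothetical, a rating of 0 is assumed to mean that the assessor did not select the term, and the maximum summed score is assumed to be the number of assessors multiplied by the top scale point.

```python
import math


def dravnieks_score(ratings, scale_max):
    """Percentage of applicability of one descriptor for one sample.

    ratings: one value per assessor; 0 means the term was not selected.
    scale_max: highest point of the rating scale (e.g. 5 or 3).
    """
    n = len(ratings)
    # Percentage of assessors who used the descriptor at all.
    pct_use = 100.0 * sum(1 for r in ratings if r > 0) / n
    # Summed score as a percentage of the maximum attainable sum.
    pct_score = 100.0 * sum(ratings) / (n * scale_max)
    # Geometric mean of the two percentages.
    return math.sqrt(pct_use * pct_score)


if __name__ == "__main__":
    # Five hypothetical assessors rating one term on a 5-point scale.
    print(round(dravnieks_score([0, 3, 5, 0, 2], scale_max=5), 1))  # 49.0
```

In this example 60% of assessors used the term and the summed score is 40% of its maximum, so the score is the square root of 60 × 40, roughly 49. A term selected by many assessors but rated low, and a term selected by few but rated high, can therefore obtain similar values.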
5. Conclusion
The line between trained sensory panels and consumer-based
sensory evaluation is blurring, and the ability of consumers to provide product characterisation is increasingly being recognised.
This research has contributed to this development by comparing
the use of CATA questions by consumers for product characterisations and a variant thereof that elicits attribute ratings (RATA).
While simple CATA questions and the RATA variants performed
similarly, there was some evidence of superior sample discrimination and configuration stability when RATA questions were used,
and especially when RATA data were analysed as weighted
attribute scores. Hence, the research points to the potential of
intensity-based variants of CATA questions, and future research
should explore these in greater detail.
Author contributions
GA, SRJ and FB conceived and planned the study and wrote the
paper. GA and LV analysed the data. All other authors contributed
to data collection.
Acknowledgements
Staff at Plant & Food Research are thanked for help in planning
and collection of data, in particular S.L. Chheang, D. Jin, M.K. Beresford, and K. Kam. Financial support for studies conducted in
Uruguay was received from CAPES-Brasil, Comisión Sectorial de
Investigación Científica (Universidad de la República, Uruguay).
For studies conducted in New Zealand, financial support was
received from The New Zealand Ministry for Business, Innovation
& Employment and Plant & Food Research.
References
Abdi, H. (2010). Congruence: Congruence coefficient, RV coefficient, and Mantel
coefficient. In N. J. Salkind, D. M. Dougherty, & B. Frey (Eds.), Encyclopedia of
research design (pp. 222–229). Thousand Oaks (CA): Sage.
Adams, J., Williams, A., Lancaster, B., & Foley, M. (2007). Advantages and uses of
check-all-that-apply response compared to traditional scaling of attributes for
salty snacks. In: 7th Pangborn Sensory Science Symposium. Minneapolis, USA, 12–16
August, 2007.
Ares, G., Barreiro, C., Deliza, R., Giménez, A., & Gámbaro, A. (2010). Application of a
check-all-that-apply question to the development of chocolate milk desserts.
Journal of Sensory Studies, 25, 67–86.
Ares, G., Bruzzone, F., & Giménez, A. (2011a). Is a consumer panel able to reliably
evaluate the texture of dairy desserts using unstructured intensity scales?
Evaluation of global and individual performance. Journal of Sensory Studies, 26,
363–370.
Ares, G., Varela, P., Rado, G., & Giménez, A. (2011b). Identifying ideal products using
three different consumer profiling methodologies. Comparison with external
preference mapping. Food Quality and Preference, 22, 581–591.
Ares, G., & Jaeger, S. R. (2013). Check-all-that-apply questions: Influence of attribute
order on sensory product characterization. Food Quality and Preference, 28,
141–153.
Ares, G., Jaeger, S. R., Bava, C. M., Chheang, S. L., Jin, D., Gimenez, A., et al. (2013).
CATA questions for sensory product characterization: Raising awareness of
biases. Food Quality and Preference, 30(2), 114–127.
Ares, G., & Varela, P. (2014). Comparison of novel methodologies for sensory
characterization. In P. Varela & G. Ares (Eds.), Novel techniques in sensory
characterization and consumer profiling (pp. 365–389). Boca Raton: CRC Press.
Ares, G., Etchemendy, E., Antúnez, L., Vidal, L., Giménez, A., & Jaeger, S. R.
(2014a). Visual attention by consumers to check-all-that-apply questions:
Insights to support methodological development. Food Quality and Preference,
32, 210–220.
Ares, G., Tárrega, A., Izquierdo, L., & Jaeger, S. R. (2014b). Investigation of the number
of consumers necessary to obtain stable sample and descriptor configurations
from check-all-that-apply (CATA) questions. Food Quality and Preference, 31,
135–141.
Blancher, G., Clavier, B., Egoroff, C., Duineveld, K., & Parcon, J. (2012). A method
to investigate the stability of a sorting map. Food Quality and Preference, 23,
36–43.
Bruzzone, F., Ares, G., & Giménez, A. (2012). Consumers' texture perception of milk
desserts II – Comparison with trained assessors' data. Journal of Texture Studies,
43, 214–226.
Dooley, L., Lee, Y. S., & Meullenet, J. F. (2010). The application of check-all-that-apply (CATA) consumer profiling to preference mapping of vanilla ice cream
and its comparison to classical external preference mapping. Food Quality and
Preference, 21, 394–401.
Dravnieks, A. (1982). Odor quality: Semantically generated multidimensional
profiles are stable. Science, 218, 799–801.
Ennis, D. M., & Ennis, J. M. (2013). Analysis and Thurstonian scaling of applicability
scores. Journal of Sensory Studies, 28, 188–193.
Fisher, R. A. (1954). Statistical methods for research workers. Edinburgh: Oliver and
Boyd.
Husson, F., Le Dien, S., & Pagès, J. (2001). Which value can be granted to sensory
profiles given by consumers? Methodology and results. Food Quality and
Preference, 12, 291–296.
Jaeger, S. R., Chheang, S. L., Jin, D., Bava, C. M., Gimenez, A., Vidal, L., et al. (2013a).
Check-all-that-apply (CATA) responses elicited by consumers: Within-assessor
reproducibility and stability of sensory product characterizations. Food Quality
and Preference, 30, 56–67.
Jaeger, S. R., Giacalone, D., Roigard, C. M., Pineau, B., Vidal, L., Gimenez, A., et al.
(2013b). Investigation of bias of hedonic scores when co-eliciting product
attribute information using CATA questions. Food Quality and Preference, 30,
242–249.
Jaeger, S. R., & Ares, G. (2014). Lack of evidence that concurrent sensory product
characterisation using CATA questions bias hedonic scores. Food Quality and
Preference, 35, 1–15.
Jaeger, S. R., Cadena, R. S., Torres-Moreno, M., Antúnez, L., Vidal, L., Giménez, A., et al.
(2014). Comparison of check-all-that-apply and forced-choice Yes/No question
formats for sensory characterization. Food Quality and Preference, 35, 32–40.
Josse, J., Husson, F., & Pagès, J. (2007). Testing the significance of the RV coefficient.
In: IASC 07, August 30th–September 1st, 2007. Aveiro, Portugal: International
Association for Statistical Computing.


Kim, M. K., Lee, Y. J., Kwak, H. S., & Kang, M. W. (2013). Identification of sensory
attributes that drive consumer liking of commercial orange juice products in
Korea. Journal of Food Science, 78, S1451–S1458.
Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567.
Lawless, H. T. (1999). Descriptive analysis of complex odors: Reality, model or
illusion? Food Quality and Preference, 10, 325–332.
Lê, S., Josse, J., & Husson, F. (2008). FactoMineR: An R package for multivariate
analysis. Journal of Statistical Software, 25(1), 1–18.
Lee, Y., Findlay, C., & Meullenet, J. F. (2013). Experimental consideration for the use
of check-all-that-apply questions to describe the sensory properties of orange
juices. International Journal of Food Sciences and Technology, 48, 215–219.
Manoukian, E. B. (1986). Mathematical nonparametric statistics. New York, NY:
Gordon & Breach.
Meyners, M., Castura, J. C., & Carr, B. T. (2013). Existing and new approaches for the
analysis of CATA data. Food Quality and Preference, 30, 309–319.
Meyners, M., & Castura, J. C. (2014). Check-all-that-apply questions. In P. Varela & G.
Ares (Eds.), Novel techniques in sensory characterization and consumer profiling
(pp. 271–305). Boca Raton: CRC Press.
Ng, M., Chaya, C., & Hort, J. (2013). Beyond liking: Comparing the measurement of
emotional response using EsSense Profile and consumer defined check-all-that-apply methodologies. Food Quality and Preference, 28, 193–205.
Olsen, N. V., Menichelli, E., Meyer, C., & Næs, T. (2011). Consumers liking of private
labels: An evaluation of intrinsic and extrinsic orange juice cues. Appetite, 56,
770–777.
Parente, M. E., Manzoni, A. V., & Ares, G. (2011). External preference mapping of
commercial antiaging creams based on consumers' responses to a check-all-that-apply question. Journal of Sensory Studies, 26, 158–166.
Plaehn, D. (2012). CATA penalty/reward. Food Quality and Preference, 24, 141–152.
Popper, R., Rosenstock, W., Schraidt, M., & Kroll, B. J. (2004). The effect of attribute
questions on overall liking ratings. Food Quality and Preference, 15, 853–858.
Popper, R. (2014). Use of just-about-right scales in consumer research. In P. Varela &
G. Ares (Eds.), Novel techniques in sensory characterization and consumer profiling
(pp. 137–155). Boca Raton: CRC Press.
Prescott, J., Lee, S. M., & Kim, K. (2011). Analytic approaches to evaluation modify
hedonic responses. Food Quality and Preference, 22, 391–393.
R Core Team (2013). R: A language and environment for statistical computing. Vienna,
Austria: R Foundation for Statistical Computing.
Rao, C. R. (1995). A review of canonical coordinates and an alternative to
correspondence analysis using Hellinger distance. Qüestiió, 19, 23–63.
Rasinski, K. A., Mingay, D., & Bradburn, N. M. (1994). Do respondents really "Mark All
That Apply" on self-administered questions? Public Opinion Quarterly, 58,
400–408.
Reinbach, H. C., Giacalone, D., Ribeiro, L. M., Bredie, W. L. B., & Frøst, M. B. (2014).
Comparison of three sensory profiling methods based on consumer perception:
CATA, CATA with intensity and Napping. Food Quality and Preference, 32,
160–166.
Robert, P., & Escoufier, Y. (1976). A unifying tool for linear multivariate statistical
methods: The RV coefficient. Applied Statistics, 25, 257–265.
Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York:
McGraw-Hill.
Smyth, J. D., Dillman, D. A., Christian, L. M., & Stern, M. J. (2006). Comparing check-all and forced-choice question formats in web surveys. Public Opinion Quarterly,
70, 66–77.
Sudman, S., & Bradburn, N. M. (1992). Asking questions. San Francisco, CA: Jossey-Bass.
Valentin, D., Chollet, S., Lelièvre, M., & Abdi, H. (2012). Quick and dirty but still
pretty good: A review of new descriptive methods in food science. International
Journal of Food Science and Technology, 47, 1563–1578.
Varela, P., & Ares, G. (2012). Sensory profiling, the blurred line between sensory and
consumer science. A review of novel methods for product characterization. Food
Research International, 48, 893–908.
Worch, T., Crine, A., Gruel, A., & Lê, S. (2014). Analysis and validation of the Ideal
Profile Method: Application to a skin cream study. Food Quality and Preference,
32, 132–144.
Worch, T., Lê, S., & Punter, P. (2010). How reliable are the consumers? Comparison
of sensory profiles from consumers and experts. Food Quality and Preference, 21,
309–318.
