Existing and new approaches for the analysis of CATA data

Article in Food Quality and Preference · December 2013


DOI: 10.1016/j.foodqual.2013.06.010

All content following this page was uploaded by Michael Meyners on 09 May 2018.

Existing and new approaches for the
analysis of CATA data

Michael Meyners1*, John C. Castura2, and B. Thomas Carr3


1 Procter & Gamble Service GmbH, 65824 Schwalbach am Taunus, Germany
2 Compusense Inc., Guelph, Ontario, Canada
3 Carr Consulting, Wilmette, IL 60091, USA
* Email: meyners.m@pg.com

Abstract
Check-All-That-Apply (CATA) questionnaires have seen widespread use recently. In
this paper, we briefly review some of the existing approaches to analyze data obtained
from such a study. Proposed extensions to these methods include a generalization of
Cochran’s Q to test for product differences across all attributes, and a more informative
penalty analysis. Multidimensional alignment (MDA) is suggested as a useful tool to
investigate the association between products and the attributes. Comparisons of real
products with an ideal are useful in identifying specific improvements for individual
products. Penalty and penalty-lift analyses are used to identify (positive and negative)
drivers of liking. The methods are illustrated by means of a CATA study on whole grain
breads.

Keywords: Check-All-That-Apply (CATA), Cochran’s Q, correspondence analysis,
multidimensional alignment, penalty analysis, penalty-lift analysis

Introduction
Check-All-That-Apply (CATA) questions have seen increased usage recently. They
are used to investigate the perceptions of consumers on a variety of attributes. A
presentation by Adams, Williams, Lancaster and Foley (2007) on this method sparked
considerable interest in using CATA questions to obtain a rapid profile from naïve
consumers. In this paper, we briefly review some existing ways to analyze CATA data,
both graphically and by means of statistical tests. The main focus is on refining these
approaches to provide further insight into the data, and to add a few complementary
analyses that have, to the best of our knowledge, not been proposed so far. All methods
can similarly be applied to data from applicability testing studies (cf. Ennis & Ennis,
2013).
Notation
Let us consider a CATA study in which each assessor j evaluates each product k exactly
once per attribute a. Let nJ denote the total number of assessors, nK the total number of
products, and nA the total number of attributes. For simplicity, we assume that each
assessor evaluates each product exactly once; generalizations of some of the methods
below for incomplete or replicated designs or for data with missing values are
straightforward but inconvenient with regard to notation.
We assume the data typically to be organized in rectangular form where each of the
nJ*nK rows contains the data for one observation, i.e. one assessor and one product.
The attributes under consideration are displayed in nA columns. For the attributes, let “1”
indicate that this attribute has been checked by the respective assessor for this product,
and “0” indicate that the attribute has not been chosen. Additional columns identify the
assessor and the product for this observation; further columns might contain data from
non-CATA questions, such as ratings on liking.
As the analyses can be broken down to analyses by attribute, we re-organize the data
for each attribute a in a matrix Xa = {xajk} ~ (nJ, nK), with assessors in rows and products
in columns. The matrix Xa will contain the binary data, with a 1 indicating that the
attribute was selected and a 0 indicating that the attribute was not selected. Henceforth,
we omit the index a for simplicity unless required to make the notation clear.
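The re-organization described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `attribute_matrices`, the dictionary keys `assessor` and `product`, and the toy attributes `sweet` and `sour` are assumptions made for the example and do not come from the study.

```python
def attribute_matrices(rows, attributes):
    """Build one assessors-by-products 0/1 matrix per attribute.

    rows: list of dicts, one per observation (one assessor, one product),
          with keys 'assessor', 'product' and one 0/1 entry per attribute.
    Returns (assessors, products, X) where X[a][j][k] is 1 if assessor j
    checked attribute a for product k, and 0 otherwise.
    """
    assessors = sorted({r["assessor"] for r in rows})
    products = sorted({r["product"] for r in rows})
    j_index = {j: i for i, j in enumerate(assessors)}
    k_index = {k: i for i, k in enumerate(products)}
    X = {a: [[0] * len(products) for _ in assessors] for a in attributes}
    for r in rows:
        for a in attributes:
            X[a][j_index[r["assessor"]]][k_index[r["product"]]] = int(r[a])
    return assessors, products, X


# Hypothetical toy data: 2 assessors, 2 products, 2 attributes.
rows = [
    {"assessor": "j1", "product": "A", "sweet": 1, "sour": 0},
    {"assessor": "j1", "product": "B", "sweet": 0, "sour": 1},
    {"assessor": "j2", "product": "A", "sweet": 1, "sour": 1},
    {"assessor": "j2", "product": "B", "sweet": 1, "sour": 0},
]
assessors, products, X = attribute_matrices(rows, ["sweet", "sour"])
```

Each `X[a]` then plays the role of the matrix Xa, with assessors in rows and products in columns.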

Contingency table
Typically, the first summary of CATA data is to determine the column sums of X, i.e.
counting by product how many assessors checked the given attribute. Merging the
different attributes yields the so-called contingency table. The values might be displayed
as absolute counts or percentages, the latter being particularly useful if the number of
evaluations differs between products for any reason. Bar charts are frequently used to
visualize the contingency table (cf. Castura & Meyners, 2013).
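As a small sketch of this first summary, the column sums can be computed per attribute and optionally converted to percentages. The function names and the toy data are hypothetical, and the percentage version assumes that every product was evaluated by all assessors.

```python
def contingency_table(X):
    """Column sums per attribute: how many assessors checked it for each product."""
    return {a: [sum(col) for col in zip(*Xa)] for a, Xa in X.items()}


def as_percentages(table, n_evaluations):
    """Express counts as percentages of the evaluations per product."""
    return {a: [100.0 * c / n_evaluations for c in counts]
            for a, counts in table.items()}


# Hypothetical data: 4 assessors (rows) by 3 products (columns) per attribute.
X = {
    "sweet": [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1]],
    "sour":  [[0, 1, 0], [0, 1, 1], [1, 1, 0], [0, 1, 0]],
}
table = contingency_table(X)
percent = as_percentages(table, n_evaluations=4)
```

If the number of evaluations differs between products, each column would need its own denominator rather than the single `n_evaluations` used here.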

Statistical test strategy for CATA


The analysis of CATA is typically considered exploratory and descriptive in nature,
rather than inferential. However, statistical testing for product differences and inference
to determine which attributes really differentiate between products and which attributes
show differences that are potentially due to chance only is useful to avoid over-
interpretation of the data.
If the data set includes evaluations of an ideal product, depending on the purpose of the
investigation this ideal product is treated like all other products in what follows, or it is
omitted from the analysis.
From now on, for convenience we assume that the study has been designed as a non-
replicated full cross-over (i.e. each panelist evaluates all products once) as the most
common test design for CATA studies. Other designs might be used, but might require
some modification of the approaches; some of the parametric tests might not even apply
anymore, as their assumptions are violated. If deviations are relatively small, they might
still provide relatively accurate results, while with larger deviations, the validity of the
parametric tests might be significantly compromised.

Cochran’s Q
Cochran (1950) proposes a test statistic Q to investigate differences between treatments
for cross-over studies with binary outcomes (like yes/no or checked/unchecked as used
for CATA). The test applies to one attribute at a time.
Cochran’s Q statistic for a single attribute is, in our notation, given by

$$ Q = \frac{n_K (n_K - 1) \sum_{k=1}^{n_K} (T_k - \bar{T})^2}{n_K \sum_{j=1}^{n_J} R_j - \sum_{j=1}^{n_J} R_j^2} $$

with Tk denoting the number of checks for product k with corresponding grand mean T̄,
and Rj the number of products for which assessor j checked the attribute under
investigation. Under the null hypothesis of no product differences, Q is asymptotically
χ²-distributed with (nK–1) degrees of freedom. Tate and Brown (1970) suggest that the χ²
approximation is acceptable if the corrected number of assessors times the number of
products is at least 24. For the correction in the number of assessors, those with no
variability across products are excluded, i.e. those that checked the respective attribute
for all or for none of the products are omitted from counting. In this manuscript, we refer
to the corrected number of assessors as the effective sample size.
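The following sketch computes the Q statistic and the effective sample size for one attribute. It assumes the usual form of Cochran's Q, i.e. nK(nK–1) times the sum of squared deviations of the Tk from their mean, divided by nK times the sum of the Rj minus the sum of the squared Rj. The function names and the toy matrix are illustrative only, and returning 0 when no assessor discriminates is a convention chosen for this example.

```python
def cochran_q(X):
    """Cochran's Q for one attribute; X[j][k] is 0/1 (assessor j, product k)."""
    n_k = len(X[0])
    T = [sum(col) for col in zip(*X)]   # checks per product
    R = [sum(row) for row in X]         # checks per assessor
    t_bar = sum(T) / n_k
    denom = n_k * sum(R) - sum(r * r for r in R)
    if denom == 0:
        return 0.0  # assumption: Q is undefined here, treated as 0
    return n_k * (n_k - 1) * sum((t - t_bar) ** 2 for t in T) / denom


def effective_sample_size(X):
    """Number of assessors who checked the attribute for some but not all products."""
    n_k = len(X[0])
    return sum(1 for row in X if 0 < sum(row) < n_k)


# Hypothetical data: 6 assessors by 3 products.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 0, 1]]
q = cochran_q(X)                  # 4.5 for this toy matrix
n_eff = effective_sample_size(X)  # 4, so n_eff * n_K = 12 is below 24
```

An approximate p value could then be read from the χ²-distribution with nK–1 degrees of freedom, for instance with `scipy.stats.chi2.sf(q, n_k - 1)` if SciPy is available. In this toy case the Tate and Brown criterion is not met, so the approximation would be doubtful.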
Though 24 seems a relatively easy threshold to reach in a CATA study, in small studies
with few products or for comparisons of 2 or 3 products on attributes that are most often
elicited on all or no products, the threshold might still be limiting. Castura and Meyners
(2013) report effective sample sizes for pairwise comparisons for some attributes as low
as 15 from a total of 116 assessors.
An alternative proposal of Cochran (1950) is to use the ordinary F-test on binary data,
thereby treating binary data as if it were continuous. Although the F-test might give very
similar results in certain situations compared to Cochran’s Q test, the latter is more
appropriate for binary data (Cochran, 1950; Tate & Brown, 1970); therefore we will not
consider the F-test any further.
If only 2 products are compared, McNemar’s test (McNemar, 1947) and Cochran’s Q
test are equivalent. However, we recommend using the well-known sign test (Arbuthnott,
1710), which provides an exact version of these approaches and is simple to conduct.
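For two products, the exact sign test can be sketched as follows. Only assessors who checked the attribute for exactly one of the two products carry information; under the null hypothesis, the number of them favouring the first product is binomial with parameter ½. The function name and the toy vectors are assumptions for illustration.

```python
from math import comb


def sign_test(x1, x2):
    """Exact two-sided sign test for paired 0/1 data on two products.

    Returns (m, p) with m the number of discordant assessors.
    """
    b = sum(1 for u, v in zip(x1, x2) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x1, x2) if u == 0 and v == 1)
    m = b + c
    if m == 0:
        return m, 1.0  # nobody discriminates between the products
    k = min(b, c)
    tail = sum(comb(m, i) for i in range(k + 1)) / 2 ** m
    return m, min(1.0, 2 * tail)


# Hypothetical checks of one attribute by 10 assessors for products 1 and 2.
x1 = [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
x2 = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]
m, p = sign_test(x1, x2)  # m = 7 discordant assessors, p = 0.125
```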
A typical analysis by attribute could hence consist of Cochran’s Q followed by the sign
test for each pair of products (similar to an ANOVA F-test with subsequent pairwise
comparisons using Fisher’s Least Significant Difference), although this approach does
not protect against multiplicity issues.
Overall test
Instead of applying Cochran’s Q or related tests by attribute, we might be interested to
test for overall differences between products across all attributes. Such an omnibus test
will help to protect against inflated experiment-wise error rates. The well-known
Pearson’s χ²-test for contingency tables might be considered a reasonable approach,
were it not for the assumption of independence of observations, which is clearly violated
in typical CATA studies, because assessors often evaluate multiple (if not all) products,
and they assess all attributes. As Pearson’s χ²-test relies on the assumption that all
observations are independent (i.e. each participant would only assess one attribute for
one of the products), it does not provide a valid test for typical CATA data.
To the best of our knowledge, no valid global test for CATA data has been proposed. In
order to derive an approximate solution, note that Cochran’s Q test is asymptotically
χ²-distributed with (nK–1) degrees of freedom, and that the sum of two independent
χ²-distributed random variables with n and m degrees of freedom, respectively, is
χ²-distributed with n+m degrees of freedom. To derive an asymptotic test, assume for a
moment independence in evaluations of the different attributes, i.e. evaluation of any
attribute by an assessor is independent of his/her evaluations of all other attributes. If
true, the sum of all Q statistics across attributes is asymptotically χ²-distributed with
nA(nK–1) degrees of freedom. The sum of the Q statistics, $Q^* = \sum_{a=1}^{n_A} Q_a$, might then be
compared against the respective χ²-distribution to determine an approximate p value.
This approach naturally generalizes Cochran’s Q test to multiple attributes.
However, the assumption of independence rarely holds. In a hypothetical evaluation, a
consumer who considers sour to apply might be less likely to use sweet to characterize
that same product. If such dependencies exist amongst attributes, then the Q statistics
are also not independent. The particular attributes and products in the study will
determine how strongly the assumption of independence is violated. We employ the
notion of randomization tests to derive a valid overall test which takes the dependency
structure into account. The same approach can be used to assess the significance of
product differences separately for each attribute.
Conceptually, the proposed method is very similar to the one proposed by Meyners and
Pineau (2010) for Temporal Dominance of Sensations (TDS) studies, and by Wakeling,
Raats and MacFie (1992) to test consensus in Generalized Procrustes Analysis (GPA).
The underlying idea is that, under the null hypothesis of no product differences, the
recorded data does not depend on the actual product tested, but would have been
identical if any of the other products had been evaluated. Therefore, randomly shuffling
the allocation of products to evaluations (while obeying the study design) should not
systematically change the test statistic used. If it nevertheless does change the test
statistic systematically, this provides evidence that the null hypothesis is not true. In
practice, usually 1000 or 10,000 random reallocations (including the original one) are
used. A proof of the validity of the concept is found in the textbook by Edgington and
Onghena (2007; see also Meyners & Pineau, 2010 for a more elaborate description of
the approach) and will be omitted here. It is worth noting that the concept of
randomization tests provides an exact test; but as we usually perform only a random
subset of all possible randomizations, the test is considered to be quasi-exact. Arbitrary
precision could (in theory) be obtained by increasing the number of re-randomizations.
As above, the test statistic used is the sum of Cochran’s Q statistics for all attributes.
Rather than comparing the test statistic Q* against the χ²-distribution with nA(nK–1)
degrees of freedom as discussed above, we determine the so-called null-distribution by
means of appropriate re-randomizations of the data, typically using the observed data
plus 999 re-randomizations. Other test statistics might be used alternatively without
compromising the validity of the test; the choice of the test statistic merely defines
against which alternatives the test will be most sensitive. Meyners and Hartwig (2009)
used the very same approach employing Pearson’s χ²-statistic. We suggest using the
sum of Q statistics here as it is linked nicely with the Cochran’s Q test by attribute. Of
course, as for any global test, the hurdle for significance increases with an increasing
number of attributes that do not discriminate between products (i.e. attributes just
representing noise). Therefore, attributes other than those of main interest could be
dropped from the analysis, if not from the ballot prior to running the study.
Note that the re-randomizations are conducted such that rows are permuted within
consumers. Thereby, any assessor effects or potential dependencies between attributes
are maintained in the data, including potential differences in the average number of
attributes checked as well as systematic differences in the selection of attributes.
Though these effects are not modeled, they are accounted for in the evaluation of
product differences, yielding valid p values under a model where assessor effects and
attribute dependencies are included.
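The randomization scheme can be sketched as below, for the complete, non-replicated design assumed throughout. This is an illustrative sketch and not the authors' R code: the function names, the data layout (one tuple of 0/1 responses per assessor and product) and the toy data are assumptions, and assessors who never discriminate are taken to contribute Q = 0.

```python
import random


def cochran_q(X):
    """Cochran's Q for one attribute; X[j][k] is 0/1 (assessor j, product k)."""
    n_k = len(X[0])
    T = [sum(col) for col in zip(*X)]
    R = [sum(row) for row in X]
    t_bar = sum(T) / n_k
    denom = n_k * sum(R) - sum(r * r for r in R)
    if denom == 0:
        return 0.0  # assumption: no discriminating assessor gives Q = 0
    return n_k * (n_k - 1) * sum((t - t_bar) ** 2 for t in T) / denom


def sum_q(data):
    """Q* = sum of Q over attributes; data[j][k] is a tuple of 0/1 per attribute."""
    n_a = len(data[0][0])
    return sum(
        cochran_q([[cell[a] for cell in row] for row in data])
        for a in range(n_a)
    )


def randomization_test(data, n_perm=999, seed=1):
    """Quasi-exact p value for Q* from the observed data plus n_perm reshuffles."""
    rng = random.Random(seed)
    observed = sum_q(data)
    hits = 1  # the observed allocation is part of the reference set
    for _ in range(n_perm):
        # Shuffle the products within each assessor. Each cell keeps its whole
        # attribute vector, so dependencies between attributes are preserved.
        shuffled = [rng.sample(row, len(row)) for row in data]
        if sum_q(shuffled) >= observed - 1e-12:
            hits += 1
    return observed, hits / (n_perm + 1)


# Hypothetical data: 8 assessors, 3 products, attributes (sweet, sour).
data = [[(1, 0), (0, 1), (0, 0)] for _ in range(8)]
q_star, p_value = randomization_test(data)
```

Passing data with a single attribute per cell gives the corresponding randomization test for one attribute at a time.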
A typical next step would be to apply the same test for subsets of products, investigating
the nature of any product differences that might exist. Most often, these tests would be
applied to pairs of products. Correction for multiplicity may be applied, but as CATA is
most often used for exploratory rather than inferential data analysis, it might be
reasonable to refrain from any correction; the test proposed here rather serves to avoid
serious over-interpretation of data that is possibly very noisy. An alternative could be to
conduct univariate analyses (one attribute at a time) across all products and then try to
identify the products that are discriminated by these attributes. Either way we are
interested in obtaining information at the level of individual attributes and pairs of
products. The advantage of testing pairs of products first is that we avoid exhaustive
investigation about the attributes on which two products might differ before we know that
they differ at all.
Note that the same approach might be used to compare products on a single attribute at
a time. The aforementioned approximations by the χ²-distribution only hold for
sufficiently large effective sample sizes. If effective sample sizes are too small for some
attributes, a randomization test can be used for each single attribute.
An implicit assumption for the application of randomization tests is that the order of
presentation has been randomized for each assessor independently from all the others,
i.e. no constraints have been imposed, such as balancing the sample order. However,
experience shows that results are robust to order balancing unless there are very strong
deviations from the randomization (e.g. the same order for each subject). Analytical
approaches (such as Cochran’s Q test) rely on exactly the same assumption, which
would be violated all the same. It is possible to modify the randomization tests to
account for design features, and also for other designs like incomplete block designs, all
of which will put some constraints on the randomization. However, given the infinite
number of constraints that might apply in theory, it is impossible to automate this in any
software; manual project-specific modifications are required instead. The R code that is
available upon request applies to the most common situation, but can be used as a
starting point for generalizations.
Lancaster and Foley (2007) suggest a different approach using bootstrapping and the
Cochran-Armitage linear trend test (Cochran, 1954; Armitage, 1955; cf. also Agresti,
2002). As we usually do not have any prior information or beliefs about possible
direction of trends, this approach only seems applicable to contrast pairs of products,
where a significant trend indicates product differences. The authors use the MULTTEST
procedure in SAS, which has the benefit of directly providing multiplicity correction for
the pairwise comparisons, but at the cost of not performing an adequate overall test for
CATA data. Agresti and Liu (1999) as well as Bilder and Loughlin (2002, 2004) propose
approaches to model and statistically test in the context of categorical variables with
multiple choices like in CATA. However, their approaches refer to a slightly different
scenario more typical for surveys in which two (or more) variables are assessed by a
single assessor, but not the scenario in which multiple products are evaluated by all
assessors. This has implications on the dependency structure in the data, such that
these approaches might not generalize easily to the most common CATA situation.
Similar to the setup proposed by Meyners and Pineau (2010) for TDS data, the different
tests provide an overview of the differences between products that can be assumed to
be real and those differences that might be due to chance only. A graphical display as
proposed by Meyners and Pineau (2010) might therefore be used to visualize these
results.
For further (graphical) analyses, it is worth considering omission of those attributes that
did not show any discrimination between the products. Some of the multivariate methods
are sensitive to whether such attributes are included, so it might be useful to avoid any
influence of non-significant attributes.
In larger and well-planned studies, it might seem of little use to actually run an overall
test. The products are typically chosen such that we already know that they differ, at
least to some extent. However, it is a good confirmation that the set of attributes was
reasonably chosen, and for smaller studies, an overall test is certainly helpful to avoid
interpreting noise in the data, so we recommend an overall test as a reaffirmation that
the data is indeed interpretable in some detail, and use the pairwise comparisons in
order to confine interpretation to attributes that really discriminate between products.
Graphical analysis
A contingency table is often displayed in a bar chart of the percentages or absolute
numbers of assessors checking an attribute by product. As CATA studies typically
involve a large number of attributes and several products, careful choices are required
regarding which attributes to display together, and whether the bars are grouped by
products or rather by attribute.
Correspondence Analysis (CA) is widely used to visualize a contingency table, and
might be considered as a generalization of Principal Component Analysis (PCA) for
ordinary data. The method projects the data into orthogonal components such as to
maximize the sequential representation of the variation in the data. Typically, only the
plot of the first two components is displayed; sometimes, due to too little variation
explained, additional dimensions are plotted as well. Details about CA are beyond the
scope of the paper, but are available from Greenacre (2007) as well as Abdi and
Williams (2010).
For classical CA based on χ²-distance, Legendre and Gallagher (2001) describe in a
different context that (translated to CATA data) attributes with low incidence rates can
have a major and undue impact on the results. To avoid having to omit these attributes
from the analysis, they propose to use the so-called Hellinger distance (Hellinger, 1909).
Popper, Abdi, Williams and Kroll (2011) made similar observations with regard to rarely
selected CATA terms, supporting the idea of using Hellinger distances. The R package
ExPosition (Beaton, Chin Fatt, & Abdi, 2012) provides a convenient interface to
determine a CA based on either χ² or Hellinger distances. As far as we are concerned, a
drawback of the current version of this package (and shared by other packages) is that
the aspect ratio of the default plots does not represent the relative variation explained;
distances between products in the plot therefore do not necessarily respect the actual
distances of the products.
An alternative approach to derive a perceptual map is given by Multi Factor Analysis
(MFA; Escofier & Pagès, 2008). MFA allows giving the same weight to (groups of)
variables, such that the perceptual map is not dominated by only a few attributes. This
balanced weighting is useful if, for example, smell, taste and texture attributes should
have similar impact on the results.
Of course, other methods exist to derive a perceptual map, e.g., Partial Least Squares
or covariance-based principal component analysis. These methods formally require
scale data, but are easily performed using CATA data with most software packages.
Results from these analyses are interpreted as if scale data were used, and they
often resemble results from CA at least qualitatively. Applications using these methods
are rarely reported for CATA data and will not be addressed further, given that CA is
widely available.

Multidimensional alignment (MDA)


Any perceptual map displays only two (rarely three) dimensions, thereby representing
the relationship between attributes and products only incompletely. Depending on the
proportion of variance explained in the figure, the true relationships might differ more or
less from the visual impression from the plot. Consequently, an attribute might be related
substantially less or substantially more with a product than derived from the display.
Mathematically, the full information in the data is given in no more than (min(nA, nK) – 1)
dimensions. As typically nA > nK, this amounts to at most (nK – 1) dimensions, i.e. one fewer
than the number of products. Occasionally, due to perfect linear dependencies in the data, this
number might be even smaller. Therefore, attributes and products in a perceptual map
are vectors in a (nK – 1)-dimensional space. In order to reveal this information, Carr,
Dzuroska, Taylor, Lanza, and Pansini (2009) propose to determine the cosine between
pairs of vectors through

$$ \cos(\angle(x, y)) = \frac{\langle x, y \rangle}{\sqrt{\langle x, x \rangle \, \langle y, y \rangle}}, $$

where <x,y> denotes the scalar (inner) product for vectors x and y. (Note that in a multidimensional perceptual map, both
products and attributes are represented mathematically by vectors, even though in many
cases only the attributes are depicted by arrows.) The cosine value can fall between -1
and +1. The angle between the vectors (or its cosine) in the full-dimensional space gives
the complete information about the relationship between products and attributes. Carr et
al. (2009) refer to this approach as Multidimensional Alignment (MDA). They propose to
display the cosines of the angles of a product with all attributes in a bar chart. For
interpretation, one needs to bear in mind that unlike correlation coefficients, absolute
cosines below 0.707 (=cos(45°) = -cos(135°)) indicate hardly any relationship at all. This
threshold is much larger than the threshold typically applied for interpreting correlation
coefficients, where smaller values are usually considered to indicate some relationship;
the threshold is high for the cosine due to its non-linearity. Consequently, a bar chart of
the cosines might be slightly misleading if not carefully interpreted. Instead, Castura and
Meyners (2013) propose displaying the angles directly on a reversed scale, where 180°
(π radians) indicates perfect negative relation, 0° (0 radians) perfect positive relation,
and 90° (π/2 radians) no relation at all. Alternatively, we propose here to display the
attributes in a semicircle for each product, thereby displaying all angles between the
attributes and the respective product in the multidimensional scale in just two
dimensions. The semicircle plot may become illegible if too many attributes are
displayed simultaneously; in this case a full circle plot provides increased legibility. It
should be noted that angles and their displays relate to the products only; we cannot
interpret the relationship of two or more angles of attributes with one product in order to
compare the attributes with each other. Two attributes might be reasonably well
correlated with a product, yet be orthogonal to each other in the multi-dimensional
space. However, it would be possible to apply the same approach to study the
relationship between one attribute and all others. The output would look the same, but
the attribute under consideration would not be compared with itself and is therefore not
shown.
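The cosine and the corresponding angle between a product vector and an attribute vector can be sketched as follows. The coordinates are made up for illustration; in practice they would be the full-dimensional coordinates from the perceptual map (for example all CA dimensions), and the function names are hypothetical.

```python
from math import acos, degrees, sqrt


def cosine(x, y):
    """Cosine of the angle between vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def angle_deg(x, y):
    """Angle in degrees: 0 = perfect positive, 90 = none, 180 = perfect negative."""
    c = max(-1.0, min(1.0, cosine(x, y)))  # guard against rounding error
    return degrees(acos(c))


# Hypothetical full-dimensional coordinates (three dimensions).
product = (1.0, 0.0, 0.0)
attributes = {
    "sweet": (2.0, 0.0, 0.0),
    "dense": (1.0, 1.0, 0.0),
    "sour": (0.0, 3.0, 0.0),
    "bitter": (-1.0, 0.0, 0.0),
}
angles = {name: angle_deg(product, vec) for name, vec in attributes.items()}
```

In this toy case `dense` sits at 45°, i.e. exactly at the cosine of 0.707 discussed above, which shows how the angle scale spreads out values that look close together on the cosine scale.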

φ-coefficient: relationship between attributes


An alternative way to study the relationship between attributes is given by the
φ-coefficient, a measure of correlation between two binary variables defined as

$$ \varphi = \frac{n_{11} n_{00} - n_{10} n_{01}}{\sqrt{n_{1\bullet} \, n_{0\bullet} \, n_{\bullet 0} \, n_{\bullet 1}}}. $$
Thereby, n11 (n00) is the frequency that the attributes are both (not) selected, n10 (n01) the
frequency that the first (second) attribute is selected but not the other, and n1• (n0•) is the
marginal frequency that the first attribute was (not) selected irrespective of the second,
and vice versa for n•1 and n•0. Note that φ is related to Pearson’s χ²-statistic for 2x2
contingency tables through

$$ \varphi^2 = \frac{\chi^2}{n}, $$

where n denotes the total number of observations in the table.

The φ-coefficient applied across assessors allows determination of which attributes are
typically checked together, and which are used rather independently to characterize the
products. Multidimensional scaling (MDS) can be applied to the matrix containing the φ-
coefficient for all pairs of attributes (or 1-φ in case a distance matrix is required as input
to MDS). The results of MDS can then be visualized again in a two-dimensional map.
Additional insight might be gained from the same analysis on subsets of the data, e.g.
observations above and below the mean liking only, or based on demographic variables.
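As a sketch, the φ-coefficient for a pair of attributes can be computed from the four cell counts of their 2x2 table, and the relation to Pearson's χ² can be checked numerically. The function names and the two toy vectors (one entry per observation) are assumptions for illustration.

```python
from math import sqrt


def two_by_two(u, v):
    """Cell counts (n11, n10, n01, n00) for two 0/1 vectors."""
    n11 = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(u, v) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(u, v) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(u, v) if a == 0 and b == 0)
    return n11, n10, n01, n00


def phi(u, v):
    """Phi coefficient; NaN if one attribute is constant."""
    n11, n10, n01, n00 = two_by_two(u, v)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n10 + n00) * (n11 + n01))
    if denom == 0:
        return float("nan")
    return (n11 * n00 - n10 * n01) / denom


def pearson_chi2(u, v):
    """Pearson's chi-square for the 2x2 table, from observed and expected counts."""
    n11, n10, n01, n00 = two_by_two(u, v)
    n = n11 + n10 + n01 + n00
    obs = [[n11, n10], [n01, n00]]
    rows = [n11 + n10, n01 + n00]
    cols = [n11 + n01, n10 + n00]
    return sum(
        (obs[r][c] - rows[r] * cols[c] / n) ** 2 / (rows[r] * cols[c] / n)
        for r in (0, 1) for c in (0, 1)
    )


# Hypothetical checks of two attributes over 8 observations.
u = [1, 1, 1, 0, 0, 0, 1, 0]
v = [1, 1, 0, 0, 0, 1, 1, 0]
phi_uv = phi(u, v)            # 0.5
chi2_uv = pearson_chi2(u, v)  # 2.0, so phi**2 equals chi2 / n = 2 / 8
```

Applying `phi` to all pairs of attribute columns gives the matrix that would be passed to MDS, with 1-φ as a distance where required.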
Many coefficients of association could be used in place of the φ-coefficient. One
possibility is the Jaccard index (Jaccard, 1901), which might perform particularly well
with attributes that are rarely used to endorse the products. Lapointe and Legendre
(1994) use this index in a cluster analysis of Scotch whiskies, arguing that two products
sharing a characteristic was more relevant for similarity than two products both lacking a
characteristic. Investigating emotions might be another application where most attributes
are rarely used. In turn, there might be situations where most participants would endorse
certain attributes for all products (like, e.g., brown in the example below, which is used in
72% of all observations), and where the absence of a characteristic for two products is
more important with regard to their similarity than its presence. Jaccard index might
even partially mask the similarity between the attributes in such a case.
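For comparison, a minimal sketch of the Jaccard index for two binary vectors is given below. It ignores joint absences entirely, which is the property discussed above. The function name and the toy vectors are illustrative assumptions.

```python
def jaccard(u, v):
    """Jaccard index: joint presences divided by cases with at least one presence."""
    both = sum(1 for a, b in zip(u, v) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(u, v) if a == 1 or b == 1)
    if either == 0:
        return float("nan")  # neither attribute was ever used
    return both / either


# Hypothetical rarely used attributes: the many joint absences play no role.
rare_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
rare_b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
j_rare = jaccard(rare_a, rare_b)  # 1 joint presence out of 2 cases = 0.5

# Hypothetical frequently used attributes.
freq_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]
freq_b = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
j_freq = jaccard(freq_a, freq_b)  # 8 joint presences out of 10 cases = 0.8
```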

CATA with additional variables


Penalty-Lift-Analysis
If liking of the products under investigation has been rated along with the CATA
question(s), a penalty-lift analysis might be performed (Williams, Carr & Popper, 2011).
To this end, liking is averaged across all observations (i.e. assessors and products) in
which the attribute under consideration was used to characterize the product, and
across those observations for which it was not. Determining the difference between
these two mean values provides an estimate of the average change in liking due to this
attribute applying compared to not applying as indicated in the CATA questions. For
off-notes or other negatively associated attributes, liking might decrease due to an
attribute applying, resulting in a negative “penalty”. The penalty-lift is typically displayed
in a horizontal bar chart. Interpretation of the respective results might be misleading in
case the attributes used in CATA are highly correlated in the product space under
consideration. In that case, an attribute might be identified as an important driver of
liking just due to its correlation with a real driver. Therefore, results of this approach
should be carefully interpreted with sufficient understanding of the products evaluated
and the product space they span. This caution applies to other analyses as well, but is of
particular relevance here because the analysis does not give a hint on correlation at the
same time. In contrast, consider perceptual maps, in which the proximity of attributes
indicates high positive correlation (or negative correlation if on opposite sides), which is
usually taken into account implicitly in the interpretation.
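The penalty-lift computation described above amounts to a difference of two means per attribute, taken across all observations. A minimal sketch follows; the function name, the liking scores and the attribute names are made up for illustration.

```python
def penalty_lift(liking, checked):
    """Mean liking where the attribute was checked minus mean liking where it was not.

    liking and checked hold one entry per observation (assessor and product).
    Returns None if the attribute was checked always or never.
    """
    with_attr = [l for l, c in zip(liking, checked) if c == 1]
    without = [l for l, c in zip(liking, checked) if c == 0]
    if not with_attr or not without:
        return None
    return sum(with_attr) / len(with_attr) - sum(without) / len(without)


# Hypothetical data: 8 observations with liking scores and two CATA attributes.
liking = [7, 8, 6, 3, 4, 5, 9, 2]
sweet = [1, 1, 1, 0, 0, 0, 1, 0]
stale = [0, 0, 0, 1, 1, 0, 0, 1]
lift_sweet = penalty_lift(liking, sweet)  # 7.5 - 3.5 = 4.0
lift_stale = penalty_lift(liking, stale)  # 3.0 - 7.0 = -4.0
```

Sorting the attributes by this value gives the order in which they would appear in the horizontal bar chart.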
Rather than aggregating across products, the analysis could be performed by product.
However, that approach would primarily return the mean liking of those people that
checked an attribute compared to that of assessors who did not do so for this particular
product. If due primarily to differences between certain consumer groups, for example, it
will certainly not provide an adequate assessment about which attributes drive liking or
acceptance across the product category. Furthermore, the number of (binary)
observations might be too small to obtain robust results if the analysis is restricted to a
single product.

Comparison of products with ideal


CATA question(s) can also be asked for an imagined ideal product. In this case,
Cowden, Moore, and Vanleur (2009) suggest comparing the real products with the ideal
based on the proportion of elicitations per product, and using a confidence interval for
the number of elicitations of the real product. This approach ignores the uncertainty
inherent in the number of elicitations for the ideal product, which is also a random
number empirically observed from the data. We propose modifying this approach by
determining the difference in the proportion of elicitations between each product and the
ideal. Any assessor selecting an attribute for both the real and the ideal product (or for
neither real nor ideal product) does not contribute any information on potential
differences between the products, whereas any assessor using an attribute for the real
but not for the ideal (or vice versa) does contribute. If there was no systematic difference
between the product and the ideal in this attribute (as posited under the null hypothesis),
the number of elicitations for the real product from those assessors that discriminate
between the products would be binomially distributed with chance parameter ½ and a
sample size equal to the effective sample size for this comparison. Based on these
parameters, a (90% or 95%) confidence interval is determined easily. Effective sample
sizes vary between attributes, thus confidence intervals have differing widths. The
differences in proportion of elicitations can then be plotted along with a corresponding
confidence interval, readily visualizing the differences between any product and the
ideal.
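One possible reading of this comparison is sketched below. The interval is computed under the null hypothesis, from a binomial distribution with parameter ½ and the effective sample size, and is expressed on the scale of the difference in proportions. A difference outside the interval would then indicate a deviation from the ideal. The exact construction of the interval is an assumption made for this example, as are the function names and the toy data.

```python
from math import comb


def null_interval_half(m, level=0.95):
    """Central range [lo, hi] of Binomial(m, 1/2) with coverage of at least level."""
    alpha = 1.0 - level
    pmf = [comb(m, i) / 2 ** m for i in range(m + 1)]
    lo, cum = 0, 0.0
    while lo < m and cum + pmf[lo] <= alpha / 2:
        cum += pmf[lo]
        lo += 1
    return lo, m - lo  # symmetric because the chance parameter is 1/2


def compare_with_ideal(real, ideal, level=0.95):
    """Difference in elicitation proportions (real minus ideal) and its null range."""
    n = len(real)
    only_real = sum(1 for r, i in zip(real, ideal) if r == 1 and i == 0)
    only_ideal = sum(1 for r, i in zip(real, ideal) if r == 0 and i == 1)
    m = only_real + only_ideal  # effective sample size
    diff = (only_real - only_ideal) / n
    lo, hi = null_interval_half(m, level)
    # If B assessors out of m checked the real product only, diff = (2B - m) / n.
    return diff, ((2 * lo - m) / n, (2 * hi - m) / n), m


# Hypothetical data: 20 assessors, one attribute, real product versus ideal.
real = [1] * 9 + [0] * 1 + [1] * 5 + [0] * 5
ideal = [0] * 9 + [1] * 1 + [1] * 5 + [0] * 5
diff, (low, high), m = compare_with_ideal(real, ideal)
```

Here 10 of the 20 assessors discriminate, the difference is 0.4, and the null range is -0.3 to 0.3, so the attribute would be flagged as elicited more often for the real product than for the ideal.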

Penalty Analysis
A CATA study might include both scores on liking and the evaluation of an ideal product.
In this case, a penalty analysis can be used (Ares, Dauber, Fernández, Giménez, &
Varela, 2014). This analysis differs from the well-known penalty analysis for just-about-
right (JAR) questions in that the analysis is based on the differences between real and
ideal products (rather than deviation from the JAR value), and the impact on associated
liking scores.
For the proposed approach, we determine whether an attribute is used either only for the
ideal or only for the real product, which we define as incongruence. In contrast, if an
attribute is used for both or neither of the products, it is defined as congruence. We can
then determine the difference in mean liking for consumers with congruent and
incongruent elicitations, respectively. This value is a reasonable estimate of the average
impact on liking that the attribute has when used in a discriminating manner, compared
to its use in a non-discriminating manner. We might apply the same idea across
products, thereby allowing an overall evaluation. A difficulty in interpreting the visual
display of this analysis lies in the fact that the approach only gives the absolute
difference in liking, but does not indicate whether the attribute in question actually
increases or decreases liking. Interpretation of a corresponding plot must always relate
to the respective contingency table, and might require some interpretation of attributes
as potentially being positive or negative drivers of liking. Also, it is rather arbitrary
whether the changes are represented with a positive or a negative sign.
The difficulty just mentioned arises because both types of incongruence are treated the
same way in this approach. But whether an attribute is checked for an ideal but not the
real product, or vice versa, often matters. Therefore, we extend the proposal by Ares et
al. (2014) to distinguish between the two cases. Substantially increased liking for
observations with the attribute checked for both products (denoted by (1,1)) over those
observations for which it is checked for the ideal but not the real product (denoted by
(1,0)) indicates a “must-have” attribute. If the difference in liking is small, the attribute
might be less relevant, even if consumers check it frequently for their ideal product. A
decreased liking for (1,1) compared to (1,0) should rarely be observed as it would
indicate that presence of the attribute has a negative impact on liking, which contradicts
the fact that it has been selected for the ideal product.
The other relevant comparison along the same lines is between assessors that have not
checked the attribute for either of the products (denoted (0,0)) and those that have not
checked it for the ideal but have done so for the real product (0,1). A decrease in liking
from (0,0) to (0,1) indicates that the attribute should be avoided for a good product; if
liking is approximately equal, we conclude that presence of the attribute does no harm. Liking
might also increase substantially from (0,0) to (0,1), which would indicate that, although the
attribute is not necessary (or not considered necessary) for the ideal product, it nonetheless
has a positive impact on liking.
Therefore, determining and visualizing the different averages allows a more in-depth
investigation of potential liking drivers or inhibitors, and makes it possible to differentiate “must-have”
characteristics from “nice-to-have” or “to-be-avoided” attributes. As before, the data can
be averaged across consumers for a more general interpretation. To visualize the
results, we plot the observed differences in liking against the percentage of consumers
for which incongruence occurred. The latter percentage indicates how important the
attribute under investigation is to discriminate between products across consumers.
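
The four response patterns and the resulting liking differences can be computed as in the following Python sketch. It is an illustration only: the function name, the returned keys and the toy data are hypothetical, and the pairs follow the (ideal, real) notation used above.

```python
from statistics import mean

def penalty_patterns(ideal, real, liking):
    """Mean liking by (ideal, real) pattern for one attribute, plus differences."""
    groups = {(1, 1): [], (1, 0): [], (0, 0): [], (0, 1): []}
    for i, r, y in zip(ideal, real, liking):
        groups[(i, r)].append(y)
    avg = {k: (mean(v) if v else None) for k, v in groups.items()}

    def diff(a, b):
        # Difference in mean liking, or None if a pattern was never observed.
        if avg[a] is None or avg[b] is None:
            return None
        return avg[a] - avg[b]

    congruent = groups[(1, 1)] + groups[(0, 0)]
    incongruent = groups[(1, 0)] + groups[(0, 1)]
    n = len(liking)
    return {
        # (1,1) versus (1,0): a large positive value suggests a "must-have".
        "must_have": diff((1, 1), (1, 0)),
        # (0,1) versus (0,0): positive suggests "nice-to-have", negative "to-be-avoided".
        "nice_or_avoid": diff((0, 1), (0, 0)),
        # Pooled comparison that treats both types of incongruence alike.
        "pooled": (mean(congruent) - mean(incongruent)
                   if congruent and incongruent else None),
        "pct_incongruent": 100.0 * len(incongruent) / n,
    }

# Hypothetical toy data: 8 consumers, liking of the real product on a 9-point scale.
ideal = [1, 1, 1, 1, 0, 0, 0, 0]
real = [1, 1, 0, 0, 0, 0, 1, 1]
liking = [8, 7, 5, 4, 6, 6, 7, 5]
summary = penalty_patterns(ideal, real, liking)
print(summary)
```

In this toy example the pooled difference (1.5 points) hides the fact that the whole effect stems from the (1,1) versus (1,0) comparison (3 points), while checking the attribute for the real product only makes no difference to liking.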
Example
To illustrate the approaches proposed here, we consider a CATA study on six whole
grain breads. A total of 161 consumers participated in this study after pre-selection based on
product usage criteria in a screener questionnaire. There were 86 women and 75 men,
all above the age of 18. All analyses were performed using R 2.15.1 (R Development Core
Team, 2012).
Samples were presented following a 6x6 Latin square in which order and first-position
carryover effects were balanced. Overall liking of samples was evaluated on a 9-point
scale before consumers indicated the CATA terms that applied to the sample. The 31 CATA
terms were allocated to consumers in an order defined by a Williams design with "Other"
in the 32nd position. The same order of attributes was used for all evaluations by the
same consumer. After evaluation of the 6 real samples, consumers evaluated liking and
the same CATA questionnaire for their hypothesized ideal product. The contingency
table including mean values on Overall liking for this data set is shown in Table 1.

(Table 1 about here)

The average liking of the ideal product (8.3) is clearly below 9, so it does not seem
safe to assume that the liking of an ideal product takes the highest possible score; this
assumption, which is often made for penalty analysis, is therefore not justified here.
Only 64 of 161 consumers (40%) rated the liking of their ideal product to be 9, while 83
consumers rated the liking of their ideal product to be 8. A few (17) consumers even
rated liking of their ideal product to be 7 or less. Before going into further graphical
analyses, a statistical test for differences between the real products has been performed
overall and by attribute. The ideal has been omitted in this analysis as primary interest
was in comparing the real products. However, the same analysis could have been run
by treating the ideal as any other product. From the results in Table 2, we see a good
agreement between the randomized p values and those using the asymptotic
approximation to Cochran’s Q, both if applied across attributes or one attribute at a time.
The effective sample size varies from 23 for warm to 156 for seeds. These effective
sample sizes are plausible; as shown in Table 1, warm was rarely selected at all, so
most consumers never selected it, whereas only a single consumer endorsed
seeds for product 2, but most did so for products 5, 6 and 1. The last column of Table 2
includes the respective asymptotic p values if the ideal product is included in the
analysis. As most attributes already showed statistically significant differences between the
products, including another product (the ideal) changes little; however,
two changes are noteworthy: if the ideal is included, warm and brown turn out to be
highly significant, although they were not before. Referring to Table 1, it becomes
obvious that none of the products is regularly endorsed for being warm, while the
majority of consumers would like the ideal to be warm. In contrast, all breads are
endorsed by about three quarters of the consumers for being brown; for the ideal, it is
selected by only about half of the consumers. The breads in this study might hence have
been slightly too brown compared to consumers' ideal product.
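
For a single attribute, Cochran's Q, the effective sample size and a randomization p value can be sketched as follows. This is a hedged illustration rather than the code used for Table 2: the function names and toy data are hypothetical, the effective sample size is taken here to be the number of consumers whose responses differ across products, and the within-consumer permutation scheme is one plausible way to obtain a randomization p value.

```python
import random

def cochran_q(x):
    """x: one row per consumer, one 0/1 entry per product, for a single attribute."""
    k = len(x[0])
    col_totals = [sum(row[j] for row in x) for j in range(k)]
    row_totals = [sum(row) for row in x]
    total = sum(row_totals)
    # Consumers checking the attribute for all or for no products carry no information.
    n_eff = sum(1 for r in row_totals if 0 < r < k)
    denom = k * total - sum(r * r for r in row_totals)
    if denom == 0:
        return 0.0, n_eff
    q = (k - 1) * (k * sum(c * c for c in col_totals) - total ** 2) / denom
    return q, n_eff

def randomization_p(x, n_perm=999, seed=1):
    """Permute the product labels within each consumer and recompute Q."""
    rng = random.Random(seed)
    q_obs, _ = cochran_q(x)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in x]
        q_perm, _ = cochran_q(shuffled)
        if q_perm >= q_obs - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 6 consumers, 3 products.
x = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]]
q, n_eff = cochran_q(x)
p_rand = randomization_p(x)
print(q, n_eff, p_rand)
```

Under the asymptotic approximation, Q would be referred to a chi-square distribution with one degree of freedom fewer than the number of products; the overall test in Table 2 sums the Q statistics across attributes, as indicated in its footnotes.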
(Table 2 about here)

A noteworthy number of additional comments were given under “other”; these comments
lend themselves to a different analysis, hence they will be omitted from the quantitative
analysis that follows. Also, the ideal product is omitted from the graphical analysis, as
primary interest is in the comparison between the real products and the product space
they span.

Correspondence Analysis has been used to visualize the data, using the R packages
ExPosition (Beaton, Chin Fatt, & Abdi, 2012) and prettyGraphs (Beaton, 2012)
with a few modifications to improve the graphical display. In this case, Hellinger’s
distance has been used. The resulting plot is shown in Figure 1. The analysis indicates
that the product space is low dimensional (two or at most three dimensions; the variance
explained by components 3 and 4 is about 6% and 2%, respectively). The main
discrimination is apparently between products endorsed on seeds, grainy, crunchy
(products 1, 3, 5) and the more traditional, springy and soft product 2. The second
dimension differentiates mainly on sweetness (including molasses). The size of the
circles in Figure 1 indicates how much the respective product or attribute contributes to
the total variance in the data.
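
The idea of a map based on Hellinger's distance can be illustrated with a short Python sketch using NumPy. It applies the Hellinger transformation (square roots of row profiles) followed by an ordinary principal component analysis, in the spirit of Legendre and Gallagher (2001). This is an assumption made for illustration: it is not the ExPosition implementation and may differ from it in weighting and scaling. The counts are a six-attribute subset of Table 1, so the output is not comparable with Figure 1.

```python
import numpy as np

def hellinger_pca(counts):
    """Rows: products, columns: attributes. Returns scores, loadings, explained variance."""
    counts = np.asarray(counts, dtype=float)
    profiles = counts / counts.sum(axis=1, keepdims=True)  # row profiles
    h = np.sqrt(profiles)                                  # Hellinger transformation
    centered = h - h.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                                         # product coordinates
    loadings = vt.T                                        # attribute directions
    explained = s ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

# Subset of Table 1 (products 1 to 6; Soft, Sweet, Grainy, Seeds, Springy, Molasses).
counts = [
    [76, 102, 111, 125, 42, 62],
    [95, 18, 4, 1, 61, 6],
    [75, 30, 80, 82, 48, 24],
    [72, 97, 35, 24, 43, 40],
    [81, 53, 127, 139, 46, 21],
    [78, 81, 128, 133, 47, 24],
]
scores, loadings, explained = hellinger_pca(counts)
print(np.round(explained, 3))
```

With all components retained, Euclidean distances between product scores equal the Euclidean distances between the square-rooted profiles, which is the Hellinger distance up to a constant factor that depends on the convention used.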

(Figure 1 about here)

Based on the CA using Hellinger’s distance, MDA was used to determine the
association between products and attributes in all dimensions, i.e. including those not
displayed in Figure 1. As the number of attributes (31) is quite large to be displayed
simultaneously in a semi-circle plot, Figure 2 shows the results in full circles to increase
legibility. Attributes displayed nearly vertically indicate high (if above horizontal line) or
low (if below horizontal line) correlation with the respective product. Attributes displayed
almost horizontally are hardly associated with this particular product. The strength of the
association is further emphasized by the darkness of the font color. It is important to
note that it does not matter whether an attribute is displayed to the left or to the right of
the vertical line; they have been distributed across sides to maximize legibility. Further
note that proximity of attributes among each other does not indicate a relationship; such
relationships are not displayed in this layout. Figure 2 shows that product 2 is highly
associated with, e.g., traditional, soft and springy, which is consistent with the
interpretation of Figure 1. Besides a strong negative association with attributes like
grainy, seeds and sesame, as also indicated by the two-dimensional display of the CA
results in Figure 1, we also find that this product is highly negatively associated with
nutritious, a relationship which is less apparent from the CA. Products 3 and, to a slightly
lesser extent, 1 are not very strongly associated with any of the attributes. An alternative
to displaying the attributes in a circle to indicate associations would be to use an
ordinary barplot. An example of such a display for results of an MDA can be found in
Castura and Meyners (2013) and is omitted here for brevity.
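
As a rough illustration of the alignment idea, the following Python sketch computes the cosine of the angle between each product vector and each attribute vector in a common space. It assumes that MDA is based on such cosines and that product and attribute coordinates are available from a preceding analysis; the function name and the two-dimensional coordinates are hypothetical and are not taken from the bread data.

```python
import numpy as np

def mda_cosines(product_coords, attribute_coords):
    """Cosines of the angles between product and attribute vectors (rows)."""
    p = np.asarray(product_coords, dtype=float)
    a = np.asarray(attribute_coords, dtype=float)
    p_unit = p / np.linalg.norm(p, axis=1, keepdims=True)
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    return p_unit @ a_unit.T   # shape: products x attributes

# Hypothetical coordinates in a shared two-dimensional space.
products = [[1.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
attributes = [[2.0, 0.0], [0.0, -1.0], [1.0, 1.0]]
alignment = mda_cosines(products, attributes)
print(np.round(alignment, 3))
```

Values near 1 indicate a strong positive association, values near -1 a strong negative association, and values near 0 hardly any association, which corresponds to the near-vertical and near-horizontal positions described above.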

(Figure 2 about here)

To examine the association between the different attributes, we have determined the φ-
coefficient between each pair of attributes (not shown). Based on these analyses,
classical (metric) Multidimensional Scaling (MDS) based on the analysis of Mardia
(1978; as implemented in the R function cmdscale) was used on the distances
between attributes determined by 1–φ to create a two-dimensional map of these
attributes displaying their associations. It should be noted that the variation explained in
the first two dimensions is low (16% and 11%, respectively) such that further
components might have to be investigated (a scree plot – not shown – would indicate up
to 6 components). Overall, the associations found in Figure 3 showing the first
components of this MDS seem very reasonable.
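
The two steps described above (φ-coefficients, then classical MDS on 1–φ) can be sketched in Python with NumPy. This is a minimal illustration on hypothetical toy data; it uses the fact that the φ-coefficient of two binary variables equals their Pearson correlation, and it reports the share of each retained eigenvalue among the positive eigenvalues, which is an assumption and may differ from the measure of explained variation used in the paper.

```python
import numpy as np

def phi_matrix(x):
    """Phi coefficients between the binary columns (attributes) of x."""
    return np.corrcoef(np.asarray(x, dtype=float), rowvar=False)

def classical_mds(d, n_dim=2):
    """Classical (metric) MDS by double centering of the squared distances."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1]            # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    kept = np.clip(vals[:n_dim], 0.0, None)
    coords = vecs[:, :n_dim] * np.sqrt(kept)
    explained = kept / np.sum(np.clip(vals, 0.0, None))
    return coords, explained

# Hypothetical toy data: 6 observations, 3 attributes (none of them constant).
x = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [1, 0, 0], [0, 0, 1]]
phi = phi_matrix(x)
coords, explained = classical_mds(1.0 - phi, n_dim=2)

# Sanity check on three points on a line at positions 0, 3 and 4.
line_d = [[0, 3, 4], [3, 0, 1], [4, 1, 0]]
line_coords, _ = classical_mds(line_d, n_dim=1)
print(np.round(phi, 3), np.round(coords, 3), sep="\n")
```

Note that an attribute that is checked always or never has no variance, so its φ-coefficients are undefined; such columns would have to be removed before this step.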
In the next step, we looked at the associations based on observations above and below
the mean liking separately. The idea behind this is that some attributes might
be co-elicited frequently for well-liked products, but co-elicited infrequently for less-liked
products (or vice versa). For example, firm and dense might be liked if they go together,
but not if only one of them is present; therefore for well-liked products, typically both or
neither are endorsed, but rarely only one of these attributes. For our data, the
corresponding MDS gives very similar results, which are omitted for brevity. However,
for other data sets, we have observed some differences in the respective MDS,
indicating that some attributes are endorsed simultaneously for products with low liking
but are less associated on products with high liking, or vice versa.

(Figure 3 about here)

In order to examine the impact of different attributes on liking, a penalty-lift analysis was
performed. Figure 4 shows the results, indicating that most attributes have a rather
positive connotation in this context. Not surprisingly, tasteful, satisfying, exciting, and
tempting are found to be the most important drivers of liking; they increase liking by
up to almost 2 points on the 9-point scale when present compared to being absent. In
turn, sweet is not as strong an acceptance driver as in many other categories. The
results also show that “other” is the only real inhibitor of liking, lowering it by about 1.3 points on
average. Examining the open comments by means of a word cloud, created using the R
package wordcloud (Fellows, 2012) and displayed in Figure 5, indicates that most of
the open comments had rather negative connotations. The most frequently used terms
include dry, bland, bitter, plain, boring, and tasteless; these might be added to subsequent
studies. In contrast, at least one of, e.g., firm and dense might be omitted, as they seem to be
highly associated (Figure 3), have a similar impact (Figure 4), and do not
significantly discriminate between products (firm) or only borderline so (dense, cf. Table
2).
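
The penalty-lift value of an attribute is simply the mean liking of observations for which it was checked minus the mean liking of those for which it was not. A minimal Python sketch follows; the function name and the toy data are hypothetical and do not reproduce Figure 4.

```python
from statistics import mean

def penalty_lift(checked, liking):
    """Mean liking when the attribute is checked minus mean liking when it is not."""
    with_attr = [y for c, y in zip(checked, liking) if c == 1]
    without_attr = [y for c, y in zip(checked, liking) if c == 0]
    if not with_attr or not without_attr:
        return None   # undefined if the attribute was always or never checked
    return mean(with_attr) - mean(without_attr)

# Hypothetical toy data: 6 observations (consumer by product) on a 9-point scale.
liking = [8, 7, 6, 4, 3, 7]
cata = {
    "tasteful": [1, 1, 1, 0, 0, 1],
    "sweet": [1, 0, 1, 0, 1, 0],
    "other": [0, 0, 0, 1, 1, 0],
}
lift = {name: penalty_lift(col, liking) for name, col in cata.items()}
for name, value in sorted(lift.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {value:+.2f}")
```

Sorting the attributes by this value gives the kind of ordered display used for a penalty-lift chart.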

(Figure 4 about here)

(Figure 5 about here)

To compare individual attributes with their hypothetical ideals, we determined the
difference between the proportion of elicitations for the real and the ideal product. Figure
6 visualizes the results for bread 5. The dashed line indicates the 95% confidence
interval for each individual proportion; note that the width of the confidence interval
depends on the effective sample size (i.e. the number of consumers using the respective
attribute in one product, but not the other). It is apparent that consumers associated the
attributes homemade, satisfying, and fresh much more with their ideal product than with
the real product. The consumers found the bread less associated with warm than with
their ideal bread, which held for all products, as indicated earlier. On the other hand,
bread 5 was probably too coarse and perhaps too brown compared to an ideal.

(Figure 6 about here)

Finally, a penalty analysis has been performed. As indicated above, we did not assume
the liking scores for the ideal product to be equal to 9, as that did not seem to be a viable
assumption. Instead, the liking as expressed for a hypothetical ideal product was used.
The results of both the approach by Ares et al. (2014) and the newly proposed one are
overlaid in Figure 7. As proposed by those authors but in contrast to Castura and
Meyners (2013), we have chosen to plot the absolute change with a positive sign here
as most attributes are judged to rather have a positive connotation. Thereby, results
from the approach of Ares et al. (2014) and ours are more easily compared. An attribute
tasteless, if used instead of tasteful, would likely have been found in a similar position as
the latter if we assume that a product that is endorsed for tasteless is not so for tasteful
and vice versa, illustrating the limitation of the interpretation. In contrast, our approach
clearly differentiates positive from negative drivers of liking without further reference to
the original contingency table as given in Table 1.
Similar to previous analyses, the same drivers of liking are typically identified. However,
it is worth noting that the approach of Ares et al. (2014) does not indicate a strong
impact of either warm or homemade, for example, while these attributes are apparently
important “must-haves” for a good product. Similarly, crunchy (with a negative sign for
the analysis by Ares et al., 2014) is apparently a “must-have” for some and a “nice-to-
have” property of a good product for some other consumers. Breads that are too brown
or too dense are to be avoided, as these attributes are at the lower end of the plot. A similar graph can be
derived for each individual product in order to identify specific attributes to be changed
for any particular bread; it is omitted here for brevity.

(Figure 7 about here)

Conclusions
CATA is a valuable method in the toolbox of sensory scientists, and many tools exist for
analyzing the resulting data. These tools have proven useful on many data sets. We
have extended this toolbox by adding some complementary ways to look at the data, and
by refining some of the existing proposals to better match the requirements of the
researcher. Furthermore, the approaches described herein can be applied to all
consumers in a study, or to individual consumer segments or consumer clusters. These
methods can equally be applied to data from an applicability testing study, as well
as to other scenarios in which non-CATA binary variables are collected.

Acknowledgements
The authors are grateful to Compusense Sensory & Consumer Services – and in
particular Karen Phipps and Sheila Fortune – for running the consumer test on whole
grain breads and making available the data set used to illustrate the methods described
in this manuscript. Two anonymous reviewers provided helpful comments on an earlier
version of the manuscript.

References

Abdi, H., & Williams, L. J. (2010). Correspondence Analysis. In: N. J. Salkind, D. M.
Dougherty, & B. Frey (eds.): Encyclopedia of Research Design. Thousand Oaks,
CA: Sage. pp. 267-278.

Adams, J., Williams, A., Lancaster, B., & Foley, M. (2007). Advantages and uses of
check-all-that-apply response compared to traditional scaling of attributes. 7th
Rose-Marie Pangborn Sensory Science Symposium. Minneapolis, MN, USA.

Agresti, A. (2002). Categorical data analysis (2nd ed.). Hoboken, NJ: John Wiley and
Sons.

Agresti, A., & Liu, I. (1999). Modeling a categorical variable allowing arbitrarily many
category choices. Biometrics, 55, 936–943.

Arbuthnott, J. (1710). An Argument for Divine Providence, Taken from the Constant
Regularity Observ’d in the Births of Both Sexes. Philosophical Transactions of the
Royal Society of London, 27, 186–190.

Ares, G., Dauber, C., Fernández, E., Giménez, A., & Varela, P. (2014). Penalty analysis
based on CATA questions to identify drivers of liking and directions for product
reformulation. Food Quality and Preference 32A, 65-76.

Armitage, P. (1955). Tests for linear trends in proportions and frequencies. Biometrics,
11, 375–386.

Beaton, D. (2012). prettyGraphs: publication-quality graphics. R package version 1.0.
http://CRAN.R-project.org/package=prettyGraphs

Beaton, D., Chin Fatt, C. R., & Abdi, H. (2012). ExPosition: Exploratory analysis with the
singular value decomposition. R package version 1.1.
http://CRAN.R-project.org/package=ExPosition

Bilder, C. R., & Loughin, T. M. (2002). Testing for conditional multiple marginal
independence. Biometrics, 58, 200–208.

Bilder, C. R., & Loughin, T. M. (2004). Testing for marginal independence between two
categorical variables with multiple responses. Biometrics, 60, 241–248.

Carr, B. T., Dzuroska, J., Taylor, R. O., Lanza, K., & Pansini, C. (2009).
Multidimensional Alignment (MDA): A simple numerical tool for assessing the
degree of association between products and attributes on perceptual maps. 8th
Rose-Marie Pangborn Sensory Science Symposium. Florence, Italy.

Castura, J. C., & Meyners, M. (2013). Check-all-that-apply questions. In: P. Varela and
G. Ares (eds.): Novel Techniques in Sensory Characterization and Consumer
Profiling. Boca Raton, FL: CRC Press.

Cochran, W. G. (1950). The comparison of percentages in matched samples.
Biometrika, 37, 256–266.

Cochran, W. G. (1954). Some methods for strengthening the common χ2 tests.
Biometrics, 10, 417–451.

Cowden, J., Moore, K., & Vanleur, K. (2009). Application of check-all-that-apply
response to identify and optimize attributes important to consumer’s Ideal product.
8th Rose-Marie Pangborn Sensory Science Symposium. Florence, Italy.

Edgington, E., & Onghena, P. (2007). Randomization tests (4th ed.). Boca Raton, FL:
Chapman and Hall/CRC.

Ennis, D.M., & Ennis, J.M. (2013). Analysis and Thurstonian scaling of applicability
scores. Journal of Sensory Studies, in press. DOI: 10.1111/joss.12034

Escofier, B., & Pagès, J. (2008). Analyses factorielles simples et multiples; objectifs,
méthodes et interprétation (4th ed.). Paris, France: Dunod.

Fellows, I. (2012). wordcloud: Word Clouds. R package version 2.2.
http://CRAN.R-project.org/package=wordcloud

Greenacre, M. (2007). Correspondence Analysis in Practice. (2nd ed.). Boca Raton, FL:
Chapman and Hall/CRC.

Hellinger, E. (1909). Neue Begründung der Theorie quadratischer Formen von
unendlichvielen Veränderlichen. Journal für die reine und angewandte Mathematik,
136, 210–271.

Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des
Alpes et des Jura. Bulletin de la Société Vaudoise des Sciences Naturelles, 37,
547–579.

Lancaster, B., & Foley, M. (2007). Determining statistical significance for choose-all-that-
apply question responses. 7th Rose-Marie Pangborn Sensory Science Symposium.
Minneapolis, MN, USA.

Lapointe, F.-J., & Legendre, P. (1994). A classification of pure malt Scotch whiskies.
Applied Statistics, 43, 237–257.

Legendre, P., & Gallagher, E. (2001). Ecologically meaningful transformations for
ordination of species data. Oecologia, 129, 271–280.

Mardia, K. V. (1978). Some properties of classical multidimensional scaling.
Communications in Statistics – Theory and Methods, A7, 1233–1241.

McNemar, Q. (1947). Note on the sampling error of the difference between correlated
proportions or percentages. Psychometrika, 12, 153–157.

Meyners, M., & Hartwig, P. (2009). Consumer associations with a toddlers’ product color
evaluated by a choose-all-that-apply questionnaire. 8th Rose-Marie Pangborn
Sensory Science Symposium. Florence, Italy.

Meyners, M., & Pineau, N. (2010). Statistical inference for temporal dominance of
sensations data using randomization tests. Food Quality and Preference, 21(7),
805–814.

Popper, R., Abdi, H., Williams, A., & Kroll, B. J. (2011). Multi-Block Hellinger Analysis for
Creating Perceptual Maps from Check-All-That-Apply Questions. 9th Rose-Marie
Pangborn Sensory Science Symposium, Toronto, ON, Canada.

R Development Core Team (2012). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0,
URL http://www.R-project.org/.

Tate, M. W., & Brown, S. M. (1970). Note on the Cochran Q Test. Journal of the
American Statistical Association, 65, 155–160.

Wakeling, I. N., Raats, M. M., & MacFie, H. J. H. (1992). A new significance test for
consensus in generalized procrustes analysis. Journal of Sensory Studies, 7, 91–
96.

Williams, A., Carr, B.T., & Popper, R. (2011). Exploring Analysis Options for
Check-All-That-Apply (CATA) Questions. 9th Rose-Marie Pangborn Sensory Science
Symposium, Toronto, ON, Canada.

Table 1: Contingency table for the CATA evaluation and average liking scores for 6
breads and a hypothetical ideal whole grain bread.

Attribute     Prod 1  Prod 2  Prod 3  Prod 4  Prod 5  Prod 6   ideal   total
Fresh             89      70      71      71      91     108     142     642
Warm               6       7       4      11       9       8      90     135
Crusty            24       4      32      27      32      27      47     193
Soft              76      95      75      72      81      78      93     570
Sweet            102      18      30      97      53      81      85     466
Chewy             73      71      74      78      78      75      69     518
Tempting          48      17      16      21      35      44      75     256
Nutty             89       3      50      28      91      92      88     441
Moist             81      63      44      60      84      83     120     535
Grainy           111       4      80      35     127     128     102     587
Dense             58      40      40      56      43      51      41     329
Healthy          118      53      85      78     122     118     147     721
Texture           87      29      68      46     100      97      90     517
Comfort           45      33      18      38      36      46      86     302
Wheat            104      91     108      96     111      99      91     700
Exciting          15       2       3       8      19      24      59     130
Crunchy           41       1      32       7      63      75      36     255
Homemade          40      25      21      23      37      36     107     289
Brown            122     115     120     112     122     110      88     789
Seeds            125       1      82      24     139     133     109     613
Fullness          76      41      36      52      54      77      87     423
Coarse            34       7      39      26      67      44      18     235
Satisfying        89      39      48      57      73      93     126     525
Nutritious       113      42      76      70     109     115     134     659
Molasses          62       6      24      40      21      24      39     216
Tasteful         114      49      54      68      97     108     144     634
Rustic            61       2      36      18      71      67      72     327
Traditional       40      66      35      50      37      38      57     323
Springy           42      61      48      43      46      47      48     335
Sesame            31       0      22       7      39      42      38     179
Firm              37      33      28      33      22      33      35     221
Other             16      56      40      40      32      23      16     223
total           2169    1144    1539    1492    2141    2224    2579
Mean Liking      7.3     5.8     6.1     6.2     6.9     7.2     8.3     6.8

Table 2: Uncorrected p values from statistical testing for overall product differences and
effective sample sizes. Significant p values at level 5% are set in bold.

attribute     p value            p value          effective     p value incl. ideal
              (randomizations)   (Cochran’s Q)    sample size   (Cochran’s Q)
Fresh         0.001              < 0.001          132           < 0.001
Warm          0.315              0.304            23            < 0.001
Crusty        0.001              < 0.001          66            < 0.001
Soft          0.055              0.062            132           0.018
Sweet         0.001              < 0.001          134           < 0.001
Chewy         0.928              0.926            118           0.886
Tempting      0.001              < 0.001          86            < 0.001
Nutty         0.001              < 0.001          128           < 0.001
Moist         0.001              < 0.001          131           < 0.001
Grainy        0.001              < 0.001          152           < 0.001
Dense         0.039              0.037            118           0.033
Healthy       0.001              < 0.001          116           < 0.001
Texture       0.001              < 0.001          127           < 0.001
Comfort       0.001              < 0.001          86            < 0.001
Wheat         0.023              0.023            93            0.009
Exciting      0.001              < 0.001          45            < 0.001
Crunchy       0.001              < 0.001          110           < 0.001
Homemade      0.002              0.001            74            < 0.001
Brown         0.285              0.242            89            < 0.001
Seeds         0.001              < 0.001          156           < 0.001
Fullness      0.001              < 0.001          128           < 0.001
Coarse        0.001              < 0.001          98            < 0.001
Satisfying    0.001              < 0.001          137           < 0.001
Nutritious    0.001              < 0.001          126           < 0.001
Molasses      0.001              < 0.001          84            < 0.001
Tasteful      0.001              < 0.001          146           < 0.001
Rustic        0.001              < 0.001          107           < 0.001
Traditional   0.001              < 0.001          104           < 0.001
Springy       0.099              0.094            105           0.154
Sesame        0.001              < 0.001          71            < 0.001
Firm          0.277              0.256            96            0.308
Other         0.001              < 0.001          91            < 0.001
overall       0.001              < 0.001 (1)                    < 0.001 (2)

(1) based on the approximate sum of Q statistics of 2863 on 160 degrees of freedom
(2) based on the approximate sum of Q statistics of 4431 on 192 degrees of freedom

(Figure 1 graphic: map of products 1 to 6 and the CATA attributes; axes: Component 1 (72.2%) and Component 2 (16.8%))

Figure 1: First two dimensions from the CA based on Hellinger’s distance. A symmetric
display is used here; only distances between products and distances between attributes
can be interpreted directly, but not those between attributes and products.

(Figure 2 graphic: six circular panels, one each for product 1 to product 6, with the attribute labels arranged around each circle)

Figure 2: Associations between attributes and products based on MDA.


(Figure 3 graphic: two-dimensional map of the CATA attributes)

Figure 3: Associations between attributes based on classical MDS on the φ-coefficient.
The variation explained is about 16% and 11% for the first and second dimension,
respectively.

(Figure 4 graphic: horizontal bar chart with axis ticks from -1.5 to 2.0; attributes listed from top to bottom: Tasteful, Satisfying, Exciting, Tempting, Comfort, Healthy, Nutritious, Fresh, Moist, Warm, Homemade, Fullness, Nutty, Sweet, Seeds, Crunchy, Grainy, Rustic, Texture, Soft, Sesame, Molasses, Traditional, Springy, Chewy, Wheat, Coarse, Crusty, Firm, Dense, Brown, Other)

Figure 4: Results of the penalty-lift analysis. The values indicate the change in liking of
observations for which the respective attribute was checked, compared to observations
for which that attribute was not checked.

(Figure 5 graphic: word cloud of the open comments)

Figure 5: Word cloud from the open comments. Only words used at least twice across
all panelists and products are shown.

(Figure 6 graphic: difference in elicitation proportions per attribute, with axis ticks from -0.5 to 0.3)

Figure 6: Comparison of elicitation rates of bread 5 and the ideal product.

(Figure 7 graphic: scatter plot of attribute labels; horizontal axis: percentage of consumers with mismatch (0 to 50); vertical axis: change in liking (-0.5 to 2.0); legend: pooled; must: real=0, ideal=1; nice: real=1, ideal=0)

Figure 7: Results of the penalty analysis based on general incongruence (bold), on
incongruence in which the attribute is missing in the real but not the ideal product
(“must-haves”, dark grey italics) and incongruence in which the attribute is not important
for the ideal but endorsed for the real product (“nice-to-haves”, light grey plain font).
