Product and Package Testing

Howard R. Moskowitz, Ph.D.
Moskowitz Jacobs Inc., 1025 Westchester Avenue, White Plains, New York 10604

The author wishes to acknowledge the assistance of Margaret Mirabile in the preparation of this article for publication.

Part 1 – Product Testing Overview
Objectives of Product Testing

In the world of market research, product testing occupies a venerable place. Manufacturers need to know how their products perform. In the main, product testing methods have been developed, advanced, and used by researchers involved in fast moving consumer goods (e.g., beverages, shaving creams, etc.). The objectives of the research have varied from measuring basic acceptance (is the product good or bad; accept vs. reject), to providing consumer feedback for ongoing formulation, to concept-product fit, and even to estimating market share or expected volume. The many uses of product testing mean that there is no one discipline or body of knowledge which encompasses the world of evaluation that we call 'product testing'. The stage at which the test is commissioned, the knowledge base of the researcher doing the test, and the use of the data all dictate different types of tests.

The Different "Players" in the Product Testing World

Market researchers are only one group of professionals involved in the evaluation of products. When looking at the professionals involved and the technical literature cited, the reader might come across disciplines as different as product development, sensory analysis (R&D oriented), quality control (production oriented), consulting to management (process oriented, for the development process), market research, advertising, statistics, and many more. For health and beauty-aid products, one might also cross knives with perfumers. In durables, one might also meet up with engineers (Gacula 1993; Meilgaard, Civille and Carr 1987; Moskowitz 1983, 1984, 1985, 1994; Stone and Sidel 1985). Each of the players in the product-testing world tests for a specific reason. As we will see below, the different players in turn bring to bear their own predilections and biases (about what the data should be, what respondents should do, and even language).
What Type of Data Does the Researcher Present? The Difference Between Early Stage Guidance and Late Stage Confirmation

Most researchers in the market research industry are familiar with late-stage product testing. In late-stage testing, most of the developmental work has finished. R&D has probably finished creating its prototypes, tested these among consumers (more about that below), and arrived at one or two potentially acceptable prototypes. Criteria for selecting these prototypes for a confirmatory test may include success in earlier stage testing. The products may be promising because they are not distinguishable from the current gold standard (e.g., in the case of cost reduction, where the goal is to maintain product quality with cheaper ingredients), or because it is hoped that the product fits a marketing-driven concept. Late stage product testing usually does not call for action beyond a pass/fail decision (although a fail may call for additional work). Rather, the late stage product test provides the necessary information to confirm that R&D has developed an appropriate product. In common parlance, late stage (confirmatory) product testing acts as a 'disaster check' (although it is unclear how a potentially disastrous product, with low acceptance, could have even reached this stage of testing).

In contrast to late stage confirmatory tests are early stage development studies. Variously called 'research guidance tests', 'R&D hall tests', and the like, these early stage tests serve the purpose of guiding R&D (research and development). Typically these tests are commissioned in order to screen winning products from among a wide variety of contenders (for future work), to identify the characteristics of a product that drive acceptance (for brand learning), to segment consumers on the basis of sensory preference (for targeted development/marketing), etc. Early stage tests may be run by R&D, but often are run by market researchers as well. Early stage tests encompass a far wider range of questions and applications than do late stage confirmatory tests. Typically, these early stage tests provide a plethora of information that could not otherwise be obtained.
Types of Problems to Be Answered

The key to correct (or better, 'appropriate') product tests is to identify the problem, and then to match the right test method to that problem. As simple as this seems, this dogma is violated as often as it is fulfilled. For instance, it is not unusual for researchers to use paired comparison tests (head-to-head preference tests) in order to guide product developers, even though the data do not really help the developers who receive the report. Nor is it particularly unusual for researchers to rely upon in-house expert panels who select products based upon 'liking' or 'preference', in order to go forward with these products for further development – despite the fact that the in-house expert panelists are not the relevant consumers of the product, and their preferences may be irrelevant to the population at large.

Product testing can address many different types of problems, depending upon the agenda and needs of researchers. Many of those who commission product tests simply want some type of single number – e.g., significance (is my product better than a competitor?) or a scoreboard rating (how high do I score?). Others using product testing want to learn about the product (e.g., what drives liking, are there segments in the population with different preferences, along what particular attributes is the product strong or weak?). Typically, these different questions come from different audiences that commission the research.

Part 2 – Product Testing Specifics

Affective Tests – How Much Do I Like This Product?

Traditionally, most product tests have been commissioned to answer questions about liking. These tests may comprise paired comparisons (in which one product is paired against a standard, such as the market leader). In other cases, the researcher will use a scale of liking (e.g., the 9-point hedonic scale, varying from dislike extremely to like extremely; Peryam and Pilgrim 1957). Still other researchers use appropriateness as a measure of acceptance, rather than degree of liking (Schutz 1989). Paired comparisons are often used when the marketing objective is to beat, or at least to equal, a specific product. [Marketers think in terms of performance in comparison to a competitor or to a gold standard]. Scales of liking are used when the researcher wants to determine the degree of acceptance, or even whether or not the product is liked at all (viz., classification). Table 1 presents various scales for liking that have been used by researchers (Meiselman 1978).

Table 1
Verbal descriptors for hedonic scales

Scale Points: Descriptors
2: Dislike, unfamiliar
3: Acceptable, dislike, (not tried)
3: Like a lot, dislike, do not know
3: Well liked, indifferent, disliked (seldom if ever used)
5: Like very, like moderately, neutral, dislike moderately, dislike very
5: Very good, good, moderate, tolerate, dislike (never tried)
5: Very good, good, moderate, dislike, tolerate
9: Like extremely, like very much, like moderately, like slightly, neither like nor dislike, dislike slightly, dislike moderately, dislike very much, dislike extremely

FACT Scale (Schutz 1964): Eat every opportunity, eat very often, frequently eat, eat now and then, eat if available, don't like – eat on occasion, hardly ever eat, eat if no other choice, eat if forced
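The arithmetic behind a liking scale is simple. As a minimal sketch (the ratings, function name, and the "top-2-box" summary below are invented for illustration, not taken from the article), ratings on the 9-point hedonic scale might be summarized as follows:

```python
# Hypothetical example: summarizing 9-point hedonic ratings
# (1 = "dislike extremely" ... 9 = "like extremely").
# Besides the mean, researchers often report a "top-2-box"
# percentage, i.e., the share of ratings of 8 or 9.
from statistics import mean, stdev

def summarize_hedonic(ratings):
    """Return mean liking, standard deviation, and top-2-box percent."""
    top2 = sum(1 for r in ratings if r >= 8)
    return {
        "mean": mean(ratings),
        "stdev": stdev(ratings),
        "top2box_pct": 100.0 * top2 / len(ratings),
    }

# Ratings from 10 hypothetical respondents for one product
ratings = [7, 8, 6, 9, 7, 8, 5, 7, 8, 9]
summary = summarize_hedonic(ratings)
print(summary["mean"])         # 7.4
print(summary["top2box_pct"])  # 50.0
```

The mean treats the scale as interval-level data, an assumption discussed later in this article.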


Liking can be further refined in these tests by having the respondent assess the different components of the product as they appear to the senses (e.g., liking of appearance, fragrance/aroma, taste/flavor, texture/feel, etc.). Quite often, these attribute liking ratings correlate highly with overall liking, suggesting that respondents may have a hard time differentiating overall liking from attribute liking. [Respondents do differentiate the different senses, however, in terms of scaling the amount of an attribute, rather than liking of an attribute, as we will see below]. It is worth noting that quite often researchers use these affective tests along with action standards. An action standard dictates that the product will move forward in the development or marketing process (e.g., to further development or even to market introduction) if the product meets specific objectives (e.g., achieves a certain minimum acceptance level).

Paired Preference Tests

In consumer research, the role of the paired preference test appears to be sacrosanct, whether deserved or not. Paired preference simply means putting one product up against another, and instructing the respondent to indicate which one the respondent prefers. The 'preference' measure does not show how much one product is liked (or how much more one product is liked versus another). Rather, the paired preference test is simply a head-to-head match, with 'winner take all' (at least on a person-by-person basis). The results of paired preference tests are reported in percents, rather than in degree of liking (as would be reported in a scaling exercise). Paired preference tests can extend to other attributes, because the researcher can also ask the respondent to indicate which product has more of a specific characteristic. The characteristic need not even be a sensory one (such as depth of color, thickness, graininess of texture, etc.). Rather, the characteristic could even be an image one (e.g., one product is 'more masculine'). Paired preference tests, popular as they are, provide relatively little information. The conventional (yet unproven) wisdom is that consumers make choices by implicitly comparing products to each other. Thus, the typical paired test pits a new product against either the gold standard that it is to replace, or against the market leader (and only the market leader) against which it is thought to compete.
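Because a paired preference result is just a percent split, a natural follow-up question (not elaborated in the article itself) is whether the split differs reliably from 50/50. The sketch below, with invented numbers, runs an exact two-sided binomial test using only the standard library:

```python
# Sketch: exact two-sided binomial test of a paired preference split
# against the 50/50 null hypothesis. Data are invented for illustration.
from math import comb

def binomial_two_sided_p(successes, n, p0=0.5):
    """Sum the probabilities, under H0, of all outcomes that are
    no more likely than the observed outcome (exact two-sided test)."""
    probs = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    observed = probs[successes]
    return sum(p for p in probs if p <= observed + 1e-12)

# Suppose 65 of 100 respondents prefer product A over product B
p_strong = binomial_two_sided_p(65, 100)  # well below 0.05: a real win
# ...whereas a 52/48 split is entirely consistent with no preference
p_weak = binomial_two_sided_p(52, 100)
```

A large percent for one product can thus be 'significant' or not depending on the base size, which is one reason the base size discussion later in this article matters for preference tests too.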
The positives of paired preference testing are the simplicity of the test in field execution, and the ease of understanding a paired comparison result. The negatives are that the demand for head-to-head comparison may focus attention onto small, irrelevant differences (especially when the respondent is searching for a hook on which to hang the comparison), and that the data cannot be used for much beyond the comparison itself. Paired comparison data are not particularly useful for product development because they give no real guidance.

It is worthwhile digressing for a moment here to understand a little of the intellectual history of paired testing, because of the widespread use of the method, and its limitations (which do not appear to affect the use or misuse of the procedure). Paired testing got its start more than a century ago. The German psychologist, physiologist, and philosopher Gustav Theodor Fechner (Boring 1929) was interested in the measurement of sensory perception. However, according to Fechner, the human being is not able to act as a simple measuring instrument in the way that we understand these instruments to operate. [Today's researchers, especially those in experimental psychology and psychophysics, would vehemently disagree with Fechner, but keep in mind that we're dealing with the start of subjective measurement, not with its well developed world today]. According to Fechner, one way to measure sensory perception was to ask people to discriminate between samples. From their behavior Fechner was able to determine the magnitude of difference needed between two samples to generate a noticeable difference. The psychometrician L.L. Thurstone (1927) carried this analysis one step further by developing paired comparison methods, in which respondents judged which sample was stronger, heavier, liked more, etc. From the paired comparison data (the ancestor of our paired preference tests), Thurstone developed a subjective scale of magnitude. Thurstone's scaling methods were soon adopted by researchers to erect scales of sensory magnitude and liking, from which developed the long-standing acceptance of paired methods in applied product testing. Thurstone, publishing in the academic literature, was thus able to influence subsequent generations of applied market researchers, many of whom do not know the intellectual origins of this test.

Sensory Questions – How Much of a Characteristic Does My Product Possess?

Respondents can act as judges of the amount of a characteristic. Sometimes these characteristics or attributes can be quite simple – e.g., the sweetness of a beverage. Other times the attributes may be more complex, such as 'tomato flavor', which calls into play a host of sensory attributes. Still other times, the attributes may be simple, but require explanation (e.g., the ginger burn of a ginger candy). There is an ongoing debate in the scientific literature (especially for fragrance, but also for food) about the degree to which a respondent can validly rate sensory characteristics. On one side of this dispute are those who believe that the only attribute that a consumer can validly evaluate is liking. These individuals believe that it is improper to have consumers rate sensory attributes. On the other side of the dispute are those (including this author) who believe that a well instructed respondent can validly rate sensory attributes, and indeed such a respondent (not an expert, mind you) can switch focus from liking to sensory to sensory directional (see below), for many attributes. The dispute is simple enough to solve through experimentation. In cases where experts and consumers rate the same products, it has been shown that their ratings correlate with each other (Moskowitz 1995), and with known physical variations of the product (Moskowitz and Krieger 1998). The scientific literature suggests that consumers can validly rate sensory attributes, and that their attribute ratings line up with known physical changes in the stimulus. Literally thousands of scientific articles in all manner of disciplines related to sensory perception suggest that the unpracticed respondent can assign numbers whose magnitude matches the physical magnitude of the stimulus (see Stevens 1975). One could ask for no clearer validation of a respondent's abilities.

Sensory Directionals or "Just Right" Information – What To Do

Quite often in developmental research a key question is 'what's wrong with this product (if anything), and what does the consumer feel that we should do to correct this problem'. When a respondent is asked to evaluate problems, the typical question is known as a sensory directional. The question may be phrased somewhat as follows: "Please describe
this product: 1 = far too dry … 5 = perfect on dryness/wetness … 9 = far too wet". Note that the respondent is assumed to know both the 'ideal' level of dryness/wetness, and the degree to which the product deviates from that ideal. Respondents often find this type of question fun to answer, because the question allows them to become experts by telling R&D product developers what to do. Typically, product developers use 'rules of thumb' by which they translate these just-right scales into direction. Sensory directionals are surprisingly 'on target' for visual attributes (viz., the respondent does know the optimal or ideal level of darkness or stripes), usually on target for texture, and sometimes on target, other times off target, for taste/flavor. The directions are generally off target for certain types of emotion-laden attributes, such as 'real chocolate flavor', perhaps because these attributes are hedonic attributes (liking) disguised as sensory attributes. No chocolate ever has enough 'real chocolate flavor'. Even if the developer were to use a lot of chocolate flavoring, the product would still taste too bitter. Thus, when a respondent says that a beverage lacks 'real fruit flavor', the product developer knows that often the respondent means that the product is not sweet enough – or, that changing the amount of sugar will change the fruit flavor.

Where Do Sensory and Other Product Test Attributes Come From?

Since much of product testing involves finding a way to have consumers communicate with product developers, it is important that the questionnaire cover the key sensory attributes of the product. A glance at different questionnaires will reveal a remarkable range of attributes, from the very general (overall appearance, aroma, taste, flavor, texture) down to the very specific (e.g., amount of pepper flavor, even amount of black pepper flavor). Some questionnaires are filled with specifics; other questionnaires appear to probe only the surface, without much depth in the quality of information being generated. Table 2 shows an example of an attribute list for beer (Clapperton, Dagliesh and Meilgaard 1975).

Table 2
A comprehensive attribute list for the sensory characteristics of beer

First Tier Term: Second Tier Terms
1 Spicy
2 Alcoholic: warming, vinous
3 Solvent-like: plastic-like, can liner, acetone
4 Estery: isoamyl acetate, ethyl hexanoate, ethyl acetate
5 Fruity: citrus, apple, banana, black currant, melon, pear, raspberry, strawberry
6 Floral: phenylethanol, geraniol, perfumy, vanilla
7 Acetaldehyde
8 Nutty: walnut, coconut, beany, almond
9 Resinous: woody
10 Hoppy
11 Grassy
12 Straw-like
13 Grainy: corn grits, mealy
14 Malty
15 Worty
16 Caramel: molasses, licorice
17 Burnt: smoky, roast barley, bread-crust
18 Medicinal: carbolic, chlorophenol, iodoform, tarry, bakelite
19 Diacetyl: buttery
20 Fatty acid: soapy/fatty, caprylic, cheesy, isovaleric, butyric
21 Oily: vegetable oil, mineral oil
22 Rancid: rancid oil
23 Fishy: amine, shellfish
24 Sulfitic
25 Sulfidic: H2S, mercaptan, garlic, lightstruck, autolyzed, burnt rubber
26 Cooked vegetable: parsnip/celery, dimethyl sulfide, cooked cabbage, cooked sweet-corn, cooked onion, tomato ketchup
27 Yeasty: meaty
28 Ribes: black currant leaves, catty
29 Papery
30 Leathery
31 Moldy: earthy, musty
32 Sweet: honey, jammy, over-sweet, syrupy, primings
33 Salty
34 Acidic: acetic, sour
35 Bitter
36 Metallic
37 Astringent
38 Powdery
39 Carbonation: flat, gassy
40 Body: watery, characterless, satiating

The novice researcher asked to create a product questionnaire often feels at a loss. Unfortunately, there is no fixed list of attributes. Generally, the list chosen is a personal one, dictated by the specific product, the experience and insight of the researcher, and the momentary needs for information. In many corporations there may be a so-called "laundry list" of attributes that have been used over the years by various researchers involved in the product category. All too often, the laundry list is a legacy list – comprising so many attributes as to be daunting and formidable. Many of these attribute lists cover such a wide range of characteristics that only a few attributes are relevant. Yet many practitioners in product testing are loathe to dispense with any attributes, lest these attributes somehow later be discovered to be 'critical' to decision making.

Besides legacy lists, in recent years there have sprung up other, more general lists of attributes. These lists, appearing in a variety of publications (e.g., Civille and Lyon 1995), provide a start for the researcher. Besides legacy lists and published lists, many practitioners use focus groups to elicit consumer-relevant attributes. The focus groups are set up with consumers who first discuss the product, then sample the product, and then discuss the attributes in the group session. For many products that are used at home (e.g., personal care, food prepared at the stove), the respondents may be given products ahead of the focus group and asked to use the products, all the while noting attributes relevant to the product. The number of attributes emerging from this exercise can be enormous, requiring that at the end of the exercise the researcher pare down this large number to a workable few that can be used in the questionnaire. In the end, the researcher, confronted with the task, simply uses judgment, typically extracting one or two promising attributes from the focus group and going back to the legacy lists.

Scales – What's The Right Metric To Measure Subjective Responses?

Over the years researchers have fought with each other, often passionately, about the appropriate scales to use. We saw above that there are a variety of scales to measure liking. We are talking here, however, of a more profound difference in scaling – the nature of the scale, and the allowable transformations. Many researchers in business use rank order or ordinal scaling (viz., rank order these products in degree of liking). They have to be careful not to interpret differences in ranks as reflecting the magnitude of differences in subjective magnitude. One cannot even talk about the difference between ranks 1 and 2 being equal to the difference between ranks 2 and 3; ordinal scales just show an order of merit.

The category scale is by far the most widely used because it is simple and can be anchored at both ends. [Sometimes the researcher anchors every point in the scale as well]. The standard nine point hedonic scale (Peryam and Pilgrim 1957) is an example of a category scale. The scale sounds easy enough to use, and in actuality it is easy (at least respondents say that they have no problems). Category scales are assumed to be interval scales – the differences between adjacent category points are assumed to be equal up and down the scale. Interval scales (like Fahrenheit or centigrade) allow the researcher to do many statistical analyses, such as calculating the mean and the standard deviation, performing t-tests, regression, etc. However, the interval scale has no fixed zero, so one cannot calculate ratios of scale values. On a nine-point scale of liking we cannot say that a liking of 8 is twice as much liking as a liking of 4; there is no fixed zero point. Other scales, such as ratio scales (with a fixed zero), do allow the researcher to make these ratio claims. Many academic researchers use ratio scales to study the magnitude of perception, and from time to time ratio scales have been used in the commercial realm of product testing. In the end, most researchers end up with the scale that they find easiest to use.

Product to Concept Fit

Beyond simply measuring acceptance or sensory attributes, researchers want to discover whether or not a product and a concept fit together. When a consumer buys a product there are often some expectations about the appearance, the taste, and the texture of the product, along with non-sensory, aspirational expectations (e.g., that the product is sophisticated). The concept sets the expectations for the product, and products can be tested within the framework set up by these expectations in the product-concept test. The actual test execution is quite simple. In one variation the respondent reads the concept, so that an opportunity to form expectations is given. The respondent then evaluates one or several products. For each product the respondent rates the degree to which the product delivers what the concept promises. The scale is also simple – either the product delivers too little, just right, or too much of what the concept promises. One of the key issues to keep emerging in the evaluation of product/concept fit is whether or not respondents really have an idea of what a product should taste like, smell like, perform like, etc. If the concept promises a specific flavor (e.g., an Italian flavor), or a specific performance (e.g., a 366 MHz processor in a computer), then respondents can easily
ascertain whether or not the product lives up to the concept. For many aspirational concepts, however, it is difficult, if not impossible, to truly show that a product and concept agree with each other. The respondent may try to articulate the reasons for the concept/product fit, but the reasons will be different for each person.

Base Size – How Many Respondents Are Enough?

The issue of base size continues to appear in product testing, perhaps because base size more than any other factor influences the cost of a project. The statistician would aver that the reason for a large base size is that it reduces the uncertainty of the mean: one feels more comfortable about obtaining the same mean, were the study to be repeated, if one uses a large enough base size in the first place. The standard error of the mean, a measure of the expected variation of the mean, drops down with the square root of the number of respondents. The greater the number of respondents in a study (viz., the larger the base size), the more comfortable the researcher should feel about the results – presuming, of course, that the respondents participating are the correct respondents, and assuming the sampling rules are maintained for choosing the respondents. [Many novice researchers become so fearful about the uncertainty that they refuse to do studies unless there is an enormous base size, preferring rather to do qualitative research instead]. Matters can get out of hand, of course, if this requirement for uncertainty reduction in itself demands a base size so high that the test is unaffordable.

Another way to look at base size considers the stability of the mean rating with an increasing number of respondents. The first rating is, of course, the most influential rating in determining the average, for it is the only data point. Each additional rating affects the average less and less (viz., by 1/n, where n = number of ratings). The author has shown that the mean stabilizes at about 50 ratings, whether the attribute deals with liking or with sensory judgments, and that this number, 50, appears to hold even when the data comprise ratings from a homogeneous population of individuals with the same preference pattern (Moskowitz 1997). Therefore, it appears that the researcher will not be in particular danger if the base size exceeds 50. A base size of 100 might be better (or at least appear so psychologically), but the mean will not change much from 50 ratings to 100 ratings. This has led to recommendations, or at least rules of thumb, dictating 100+ respondents for conventional product tests, but far more (e.g., 300+) for claims substantiation tests (Smithies and Buchanan 1990).

How Many Products Can A Respondent Validly Taste Or Smell?

A recurring bone of contention in product tests is the number of samples that a respondent can evaluate without becoming fatigued. Some of the conventional (but not necessarily correct) 'wisdom' avers that the typical respondent can evaluate only two or three samples before fatigue sets in. Scientific research in the senses of taste and smell (where fatigue is most likely to occur) suggests that the human senses are far more robust, that we can evaluate many more samples than is commonly thought (even for strong stimuli, such as those developed for perfumes), and that perhaps the reason for fatigue is 'boredom' with the task, rather than some sensory loss. If, in fact, the consumer respondent knows that the task will last an hour, and has prepared for that eventuality, and the samples are spaced out with enough time to rinse the mouth, then in fact no fatigue sets in, as long as the respondent is motivated. (See Table 3).

Table 3
Maximum number of products that should be tested (Max) and minimum waiting time between samples (Wait, minutes) in a four hour, pre-recruit, central location test

Product: Max, Wait
Pickles: 14, 7
Juice: 12, 10
Yogurt: 12, 10
Coffee: 12, 10
Bread: 12, 10
Cheese: 10, 4
Carbonated soft drink (diet): 10, 15
French fries: 10, 10
Cereal (cold): 10, 10
Milk based beverage: 10, 10
Soup: 10, 10
Sausages: 8, 10
Hamburgers: 8, 15
Candy (chocolate): 8, 15
Croissants: 8, 15
Salsa: 8, 10
Cereal (hot): 8, 10
Ice cream: 8, 10
Mousse: 6, 10
Lasagna: 6, 15
Carbonated soft drink (regular): 6, 15

Comments noted for individual products include: fat leads to satiety; the amount ingested has to be watched; the combination of fat and sugar yields satiation; longer waits are necessary for hot salsas; garlic and pepper aftertastes build up; a fatty residue can remain on the tongue; aftertastes can linger; some products can become filling; for some products little adaptation or residual exists; and, for regular carbonated soft drinks, watch out for sugar overload.

Perhaps some of the concern about fatigue can be traced to what is observed to occur in a corporate environment, when team members in product work (marketers, R&D, sensory analysts) meet to evaluate many samples and to decide on next steps. Despite R&D's best attempts to present the samples in a coherent order, the team members end up tasting the samples rapidly, trying to keep all the information in their heads. The tasting and evaluation is so haphazard that after a few samples most participants remark that they cannot remember the first sample that they tasted. They would be better off simply rating each sample and then tallying the results at the end of the tasting.
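The base-size arithmetic discussed above (the standard error of the mean shrinking with the square root of n, and the running mean settling down by roughly 50 ratings) can be illustrated with simulated data. Everything below is invented for illustration; the distribution and seed are arbitrary:

```python
# Sketch: simulated 9-point hedonic ratings, showing that the standard
# error of the mean (s / sqrt(n)) shrinks as the base size grows, and
# that the running mean changes little beyond about 50 ratings.
import random
from statistics import mean, stdev

random.seed(7)
# Simulate ratings from a hypothetical population with mean liking near 6.5
ratings = [min(9, max(1, round(random.gauss(6.5, 1.5)))) for _ in range(200)]

def sem(sample):
    """Standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / len(sample) ** 0.5

print(round(sem(ratings[:25]), 3))    # SEM with a base of 25
print(round(sem(ratings[:100]), 3))   # roughly half as large with a base of 100
print(round(mean(ratings[:50]), 2))   # mean at 50 ratings...
print(round(mean(ratings[:100]), 2))  # ...changes little by 100 ratings
```

The simulation does not prove the "stability at 50" rule, of course; it merely shows the 1/sqrt(n) behavior that motivates it.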

What Is The Best Venue In Which To Test Products?

Product evaluation can range from the simple testing of one or two snack chips to the complex preparation and evaluation of many frozen entrees. Traditionally, market researchers have been used to fairly simple types of tests. Product tests are done in three different types of venues.

The simplest product tests are done at home. The respondent takes home one (or more) products, uses the products, and records the ratings. The interviewer may call the respondent on the phone and record the ratings; the respondent may return with the completed ratings (and either be finished with the study or receive a new set of products to evaluate); or the respondent may mail in the ratings, or even punch in the ratings on the Internet.

For more elaborate tests, where the preparation is simple and the respondent task is correspondingly simple (e.g., which of these two carbonated beverages do you prefer; rate each carbonated beverage on these ten attributes), the mall intercept method is frequently used. In a mall intercept, the interviewer sets up a test station somewhere in the mall (e.g., at a store front, in an office right off the mall, or even at an electronic kiosk). The respondent is intercepted by an interviewer, invited to participate for a short interview lasting 5-30 minutes, and then the study begins. The respondent evaluates the products on attributes. Quite often shoppers refuse to participate. This is called a 'refusal', and refusal rates are increasing, as shoppers become tired of the interruption.

Sometimes the researcher wants to have more control over the samples. Control is necessary when the samples must be tested absolutely "blind" (viz., no hint as to brand – hard to do at home, unless the product is repackaged), or when the samples must be tested under tight control, or prepared appropriately. In that case the interview can be set up as a hall/pre-recruit test rather than as a mall intercept. The researcher may hire a hall, or a test facility, and pay the respondent to participate for an extended period. In this extended test the respondent provides a great deal of information. Of course the respondent may have to be well compensated to participate, but the need to test several samples with the same respondent more than compensates for the extra cost.

Table 4 compares central location and home use tests.

Table 4
Central Location vs Home Use Tests: Advantages and Disadvantages

Aspect: Central Location / Home Use Test
Control over ingestion and evaluation: High / Low
Test site vs typical consumption: Unnatural / Natural
Amount of product to be evaluated: Limited / Unlimited

obtain data from a single individual on far more products than traditionally has been the case. if the study is run on the evening or on a weekend 2 . Most importantly.g. In contrast. enabling the evaluations to proceed at a leisurely pace. With such extensive data. Very little of the interview time is spent briefing the respondent. 3-4 at most). The session may last several hours. [Despite what purists say. done in a pre-recruit central location or at home. and motivation of respondent Must be phone recruited. the researcher has been afforded the opportunity to do far more complex analyses than were ever considered possible. Easy 20-50 Few Yes No. it is fairly straightforward to evaluate these many products. One can almost envision the evaluation session to be a ‘factory’. One need only ensure adequate motivation and time between samples]. and then finishing a session. The approach is set up so that the respondent is pre-recruited and paid to participate (increasing or at least maintaining motivation). with each respondent evaluating many products. the choreographed test session is set up to acquire data from respondents in an optimal fashion. Table 5 Choreography of an Extended Product Evaluation Session Step 1 Activity Study designed to test multiple products within an extended one or two day session Respondents recruited to participate for an extended session Rationale Maximizes amount of data. Sometimes the number of products may be as high as 10-15. Typically not a problem. delivering the product. with most of the time being spent in acquiring the data. Hard 50-100 Choreographing A Product Test Over the past 25 years market researchers specifically. making the respondent feel comfortable.. A typical example of choreographed evaluation appears in Table 5.Number Of Different Products Tested Measure Satiation and Wearout Mix Many Concepts and Many Products Number of panelists/product for stability Many No Yes. along with sensory analysts. 
the researchers have recognized the value of obtaining data on a large set of products. These activities in a short intercept evaluation may take up 5-10 minutes in a 20-minute interview. have begun to test many different products in a single evaluation session. This product testing. Rather than limiting the test to a few products (e. however. or 25%-50% of the time. in a 120-240 minute interview these 5-10 minutes are negligible.

The ability to compare many products on a scorecard basis creates a deeper understanding of the product category. image attributes. or even fit to different concepts. order of products strictly controlled Data provides information on respondent attitudes. geo-demographics. and the chief interviewer orients the respondent as to the purpose of the study Interviewer guides respondent through first product Usually 25 people show up for the test. Respondents unsure at the start. the scores of many products on the same attribute. Respondents may rate the product on sensory. Where appropriate an interviewer ‘checks the data’ Respondent waits a specified time and proceeds with second product Respondent finishes the evaluations. Respondent maintains ongoing motivation and interest throughout the entire evaluation 4 5 Respondent completes first product. but this checking maintains interest and motivation Waiting. . The products are randomized across respondents Neither the interviewer nor the respondent knows the ‘correct ratings’. with all of the products rated on the same characteristics. but have no problem going through the evaluation.3 Respondent shows up. These profiles or report cards are analyzed in a variety of ways. Fundamentally they require the respondent to rate each product on different characteristics. The orientation is important because it secures cooperation This is slow. or. purchase patterns. The scales are similar. Table 6 shows an example of such data. proceeds to an extensive classification Respondent paid and dismissed 6 7 8 Part 3 – Analyses of Product Test Data Basic Data: Product x Attribute Profiles In the past three decades. liking. with an interviewer guiding the group. and is far easier to comprehend than a series of paired comparisons among all of the products. These report cards or matrices enable the researcher to glance quickly through the data set to determine the scores of a particular product on different attributes. 
researchers have begun to recognize that there is much to be learned by obtaining profiles of multiple products on multiple attributes. The resulting report cards or matrices enable the researcher to glance quickly through the data set to determine the scores of a particular product on different attributes, or, more importantly, the scores of many products on the same attribute. Such a matrix is far easier to comprehend than a series of paired comparisons among all of the products.

Table 6
Example of a product x attribute profile. All scales are rated 0-100.
Sensory scales: 0 = none at all ... 100 = extreme
Liking scales: 0 = hate ... 100 = love

Product                         A    B    C

Appearance
  Like Appearance              58   26   36
  Brown                        35   26   70
  Flecks                       29   44   36
  Tomato Pieces/Amount         53   57   34
  Tomato Pieces/Size           63   57    5
  Vegetable Pieces/Size        34   59   23

Aroma/Flavor
  Like Aroma                   24   25   19
  Aroma                        41   24   70
  Like Flavor                  60   20   54
  Flavor Strength              40    9   24
  Tomato Flavor                25   18   61
  Meat Flavor                  47   36   57
  Mushroom Flavor              45   14   59
  Onion Flavor                 61   20   22
  Green Pepper Flavor          61   32   18
  Vegetable Flavor             55   24   11
  Herb Flavor                  13   28   54
  Black Pepper Flavor          16   44   43
  Garlic Flavor                25   36   30
  Cheese Flavor                11   23   42
  Salt Taste                   18   57   35
  Sweet Taste                  52   18   15

Aftertaste
  Sour Taste                   19   76   59
  Oily Flavor                  30   49   62

Texture/Mouthfeel
  Like Texture                 11   56   73
  Crisp Vegetable Texture      25   36   64
  Oily Mouthfeel               37   36   40
  Thickness                    48   30   38
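As a simple illustration of how such a scorecard is used, one can correlate each attribute with overall liking across the products. The sketch below uses invented ratings for five hypothetical products, not the Table 6 data; the limits of this purely linear screen are taken up in the next section.

```python
# A first pass over a product x attribute matrix such as Table 6 simply
# correlates each attribute with overall liking across products. The five
# products and their ratings below are invented for illustration.
import numpy as np

liking = np.array([58.0, 26.0, 36.0, 61.0, 44.0])     # overall liking
attributes = {
    "tomato_flavor": np.array([25.0, 18.0, 61.0, 40.0, 30.0]),
    "salt_taste":    np.array([18.0, 57.0, 35.0, 22.0, 48.0]),
}

results = {name: round(float(np.corrcoef(values, liking)[0, 1]), 2)
           for name, values in attributes.items()}
print(results)   # {'tomato_flavor': 0.04, 'salt_taste': -0.88}
```

In this invented example, salt taste shows a strong negative linear correlation with liking, while tomato flavor shows almost none – even though, as the next section explains, a curvilinear relation may still be present.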
Drivers of Liking and The Inverted U Shaped Rule for Products

At a higher level of analysis, these matrices enable the researcher to understand what characteristics 'drive acceptance' (or drive other attributes). In days gone by, the researcher might simply correlate all of the attributes with overall liking to determine which particular attribute drives liking. The correlation statistic, assuming as it does a linear relation between two variables, often failed to show the importance of sensory attributes as drivers of liking. Rather, attributes such as good quality and good taste correlated with overall liking – which is somewhat of a tautology.

One approach to understanding drivers of liking plots overall liking on the ordinate versus attribute liking on the abscissa. The equation that describes this relation is: Overall Liking = A + B (Attribute Liking). The slope, B, indicates the relative importance of the attribute, and reveals the degree to which liking co-varies with a single attribute. High values of slope B mean that the attribute is important – unit increases in the liking of the attribute correspond to high increases in overall liking. Conversely, low values of slope B mean that the attribute is unimportant (Moskowitz and Krieger 1995).

More recently, it has been the fashion to plot overall liking versus sensory attributes. The researcher creates a scattergram and fits a curve to the data. Sometimes the curve will be flat, showing that although respondents differentiate the products on a sensory attribute, the sensory differences do not drive liking (at least on a univariate basis). Other times the curve will increase, and perhaps flatten out slightly as if it were reaching an asymptote. In other situations the curve will show a downwards sloping relation, suggesting that as the sensory attribute increases, liking actually decreases. Quite often, however, the curve will show an inverted U shape: liking first increases with sensory attribute level, then peaks, and then drops down. Fitting a quadratic function to the data shows where liking tends to peak, as in Figure 1 (Moskowitz 1981).

Figure 1. How flavor intensity 'drives' overall liking.

We know from basic research (Flavor – Hedonic Levels) that the chemical senses (taste, smell) provoke more hedonic reactions (like vs. dislike) than do the visual, auditory, and tactile senses. Thus, for consumer goods (whether food or health and beauty aids), there is usually quite a pronounced relation between flavor/aroma sensory levels and liking, but less of a relation between texture or appearance and liking. We should not be surprised, then, to see these strong sensory-liking curves for the flavor attributes, but not for many of the appearance or texture attributes. The specific form of the inverted U shaped curve will depend upon the specific product being tested, the sensory range achieved, and the preferences of the respondents.
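The quadratic fit just described can be sketched in a few lines. The sensory and liking values below are illustrative, not drawn from any study.

```python
# A minimal sketch of the quadratic 'inverted U' fit: liking rises, peaks,
# and falls as sensory intensity grows. The sensory and liking values are
# illustrative, not taken from any study.
import numpy as np

sensory = np.array([20.0, 35.0, 50.0, 65.0, 80.0])   # flavor intensity
liking  = np.array([42.0, 55.0, 63.0, 60.0, 48.0])   # overall liking

# Fit Liking = a*Sensory^2 + b*Sensory + c
a, b, c = np.polyfit(sensory, liking, deg=2)

# For an inverted U the quadratic coefficient is negative, and liking
# peaks at the 'bliss point' -b / (2a).
peak = -b / (2 * a)
print(round(peak, 1))   # → 52.9
```

The sign of the quadratic coefficient distinguishes the inverted U from the flat, asymptotic and downward-sloping cases described above.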
Sensory Preference Segmentation

Marketers are the first to recognize that consumers differ from each other, and it should come as no surprise that product test results are often reported by total panel and by key subgroups. Traditionally, these key subgroups comprise individuals falling into different geo-demographic classifications, or into different brand usage patterns (e.g., users of brand X vs. brand Y, heavy vs. light users of brand Z, users versus non-users). Some of these segmentations also may be based upon responses to a battery of questions, which divide people into groups based upon pre-set classifications (Mitchell 1983). For the most part these segmentations – whether by brand usage, by geo-demographics, or by other, overarching psychological schemata – do not show particularly striking differences in the consumer responses to products. That is, in a product test these different subgroups of individuals appear to show quite similar reactions to the products themselves.

Psychologists interested in the basic science of perception have introduced newer concepts of segmentation. The fundamental idea behind sensory-based segmentation begins with the previously suggested relation – viz., as a sensory attribute increases, liking first increases, peaks (at some optimal sensory level), and then drops down with further increases in the sensory attribute (Moskowitz, Jacobs and Lazar 1985).

[A short digression is appropriate here. Segmentation based upon the pattern of raw liking ratings, rather than upon the relation between liking and a sensory attribute, could introduce artifacts. It is important to keep in mind that sensory-based segmentation rests upon the sensory level at which a person's liking rating peaks, and not upon the magnitude of liking. Otherwise, respondents showing similar sensory-liking patterns may fall into different segments simply because one respondent assigns high numbers to the products and the other respondent assigns low numbers to the same products.]

When applied to product test results (more than 6 products must be evaluated by each respondent), this segmentation,
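The segmentation idea can be illustrated with a minimal simulation. Everything below – the two latent segments, the respondents, the products and the ratings – is invented; the split at the midpoint of the estimated peaks stands in for a proper k-means clustering of the liking patterns.

```python
# A minimal simulation of sensory-preference segmentation: respondents are
# grouped by the sensory level at which their liking peaks, not by the
# magnitude of their liking ratings. Respondents, products and ratings are
# all simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
levels = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])  # 7 products

# Two latent segments: liking peaks near 35 ('weak flavor' lovers) and
# near 70 ('strong flavor' lovers), 20 respondents each.
true_peaks = np.concatenate([rng.normal(35, 3, 20), rng.normal(70, 3, 20)])
ratings = np.array([
    80 - 0.02 * (levels - p) ** 2 + rng.normal(0, 1, levels.size)
    for p in true_peaks
])

# Fit each respondent's inverted-U and extract the estimated peak.
peaks = np.array([-b / (2 * a)
                  for a, b, _ in (np.polyfit(levels, y, 2) for y in ratings)])

# Split at the midpoint of the extreme peaks: a one-dimensional stand-in
# for k-means clustering on the liking patterns.
threshold = (peaks.min() + peaks.max()) / 2
segment = (peaks > threshold).astype(int)
print(np.bincount(segment))   # respondents per recovered segment
```

Because the clustering operates on the location of each person's peak, two respondents who use very different parts of the rating scale can still land in the same segment, exactly as the digression above requires.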
based upon the sensory-liking patterns, reveals that there are often two, three or more groups in the population with distinct sensory preference patterns. Each segment shows a different sensory level at which liking peaks. There may be individuals who like strong flavored products, and individuals who like weak flavored, lighter colored products. Sensory preference segments transcend geo-demographics, brand usage, and values; consumers in different segments may use the same brand and hold the same values. Note that each segment is represented in each country, albeit in different proportions (see Figure 2, which shows data for coffee).

Figure 2. Sensory segmentation – results from a five-country study of coffee. The left panel shows how overall liking for respondents from each of five countries (1-5) is driven by perceived bitterness. The right panel shows the same liking-bitterness relation, this time using data from the sensory segments obtained from the five countries.

This organizing principle for product development, emerging from product test results (along with emerging from basic psychophysics), is forming the foundation of new approaches for consumer research and applied product development.

Product Mapping and Creating A Product Model From the Map

At an even higher level of analysis, the matrix of product x attribute ratings is used to map products into a geometrical space (see Figure 3). The geometrical space may comprise one, two, three or many more dimensions (although it is difficult to visualize a space of more than three dimensions). Product mapping is done typically with disconnected products – usually those current in the product category, without regard to the connectedness of their underlying formulations – but occasionally with prototypes (so-called category appraisal, aptly named because the approach considers the full range of products currently in a category; Munoz, Chambers and Hummer 1996). Brand managers and product developers use these brand maps as heuristics to understand how the consumer perceives the products as a set of interchangeable (or non-interchangeable) items.

The locations of the products in the map are usually based upon the sensory properties of the products, with products located close together being sensorially similar to each other. The map is typically based upon a principal components analysis of the sensory attributes, with the outcome being at least two dimensions (or factors). [The map coordinates, being factor scores, are parsimonious and statistically independent, so that no violence is done to the data.]

Figure 3. Example of mapping a product category (ointment). Products are circles, with the size proportional to liking.

When the coordinates of the brand map are factor scores (from factor analysis), the map serves double duty. First, the map locates the products in a space, as a heuristic to visualize the category. Second, the coordinates of the map (viz., the dimensions) act as independent variables in a regression model. As a consequence, the researcher builds a set of equations – one per attribute, with the dependent variable being a rating such as liking, an image rating, or a sensory rating, and the independent variables being the factor scores. [The models are not just linear equations, but also quadratic equations.] At the end of the day the researcher is able to identify a location in the map and estimate the full sensory profile and liking rating corresponding to that location, and thereby to discover drivers of liking and to identify unfilled holes in the product category. Such an approach, using mapping and modeling (with factor scores as independent variables, and all rating attributes as dependent variables), typically would occur early in the research process, when the objective is to understand the category. This type of analysis can only occur if the researcher tests multiple products (at least six, but preferably 10 or more). Holes in the
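The mapping-and-modeling sequence can be sketched as follows. The sketch uses SVD-based principal components in place of a full factor analysis, and all the data are simulated; the 'hole' coordinates are an arbitrary illustration.

```python
# Sketch of product mapping: reduce a product x sensory-attribute matrix to
# two principal-component (factor) scores, then regress liking on the
# scores with linear and quadratic terms, as the text describes. All data
# are simulated for illustration.
import numpy as np

rng = np.random.default_rng(1)
n_products, n_attrs = 12, 8
X = rng.normal(50, 15, size=(n_products, n_attrs))      # sensory profiles

# Principal components via SVD of the column-centered matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T        # the 2-D 'map' coordinates of the products

# Simulated liking, peaking at the center of the map.
liking = 60 - 0.02 * scores[:, 0] ** 2 - 0.01 * scores[:, 1] ** 2

# One equation of the 'product model': liking as a quadratic in the scores.
D = np.column_stack([np.ones(n_products), scores, scores ** 2])
coef, *_ = np.linalg.lstsq(D, liking, rcond=None)

# The model can now predict liking at any map location, e.g. a 'hole'
# at coordinates (10, -5):
hole = np.array([1.0, 10.0, -5.0, 100.0, 25.0])
print(float(hole @ coef))
```

The same set of equations, estimated once per rating attribute, lets the researcher read off the full predicted profile at any unoccupied location in the map.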
category are locations without any products.

Product Optimization

Several decades ago, statisticians developed systematic arrays of variables with the property that the variables were independent of each other. These are called experimental designs (Box, Hunter and Hunter 1978; Khuri and Cornell 1987; Plackett and Burman 1946). Product developers use these designs to create arrays of products which comprise known variations (e.g., in ingredients or in process conditions). By varying the independent variables (viz., features under the developer's control), and by obtaining reactions (e.g., liking ratings, sensory ratings, image attributes, and even cost of goods and yield), the developer can link the variables under direct operational control to the consumer's responses. Furthermore, these experimental designs lend themselves to statistical modeling by regression. It is a short step from the experimental design + consumer reaction to modeling, and another short step from modeling (or equation building) to optimizing – viz., identifying the combination of the variables under design that produces the highest liking, or the highest liking constrained to a given cost. The optimization is subject to explicit constraints (viz., the formulation must stay within the range tested) and to implicit constraints (viz., one or more attributes of the product, such as a sensory attribute or even cost of goods, must remain within specified limits).

Originally, optimization was done using the relation between liking and sensory attributes (Schutz 1983). This is called R-R or response-response analysis, because the optimization model uses the relation between two dependent or response variables (viz., liking is a rated attribute, as are sensory attributes). More recently, attention has focused on the variables under the developer's control, rather than on the sensory-liking relation. This is called S-R or stimulus-response analysis, because the optimization model uses the relation between variables under the developer's control and consumer-rated attributes.

Table 7 shows two experimental design systems commonly used in product development. The left system (11 variables, A-K, in 12 runs) is a Plackett-Burman screening design; each variable appears at two options (1 = new, 0 = old; more generally on vs. off, or low vs. high), and the design is used to identify the key drivers of a response among many candidate variables. The right system (3 variables, A-C, in 15 runs) is a central composite design, in which each variable appears at three levels (1 = low, 2 = medium, 3 = high). A response surface design of this kind (the related Box Behnken design is another example) enables the researcher to create a quadratic model relating formulation to liking, cost and other attributes; it is used to identify optimal levels of a process or an ingredient, allowing the researcher to optimize liking or any other variable.

Table 7
Example of two experimental design systems commonly used in product development.

Plackett-Burman screening design (11 variables, 12 runs):

PROD   A  B  C  D  E  F  G  H  I  J  K
  1    1  1  0  1  1  1  0  0  0  1  0
  2    1  0  1  1  1  0  0  0  1  0  1
  3    0  1  1  1  0  0  0  1  0  1  1
  4    1  1  1  0  0  0  1  0  1  1  0
  5    1  1  0  0  0  1  0  1  1  0  1
  6    1  0  0  0  1  0  1  1  0  1  1
  7    0  0  0  1  0  1  1  0  1  1  1
  8    0  0  1  0  1  1  0  1  1  1  0
  9    0  1  0  1  1  0  1  1  1  0  0
 10    1  0  1  1  0  1  1  1  0  0  0
 11    0  1  1  0  1  1  1  0  0  0  1
 12    0  0  0  0  0  0  0  0  0  0  0

Central composite design (3 variables, 15 runs):

PROD   A  B  C
  1    3  3  3
  2    3  3  1
  3    3  1  3
  4    3  1  1
  5    1  3  3
  6    1  3  1
  7    1  1  3
  8    1  1  1
  9    3  2  2
 10    1  2  2
 11    2  3  2
 12    2  1  2
 13    2  2  3
 14    2  2  1
 15    2  2  2

In the pizza optimization reported in Table 8, six ingredients were varied systematically according to a central composite design of this type, and the results led to a product model.
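The 12-run Plackett-Burman design has a convenient cyclic structure: every column is a rotation of a single published seed row, with a final run of all 0s. A sketch of the construction:

```python
# Generate a 12-run Plackett-Burman screening design for 11 two-level
# variables from the standard PB12 seed row: each column is a cyclic
# rotation of the seed, plus a final run of all 0s (1 = new, 0 = old).
import numpy as np

seed = np.array([1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0])   # PB12 generator row

# Column k of the design is the seed rotated k places; run 12 is all 0s.
columns = [np.roll(seed, -k) for k in range(11)]
design = np.vstack([np.column_stack(columns), np.zeros((1, 11), dtype=int)])

print(design.shape)           # → (12, 11)
print(design[:, 0].tolist())  # first variable's settings across the 12 runs
```

Each column is balanced (six runs at each option), which is what makes the main effects of the 11 variables independently estimable from only 12 products.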
Table 8 shows results for the optimization of a commercial pizza which varied on six formula variables. The product model interrelates the formulations (under the developer's control) and the ratings (acquired by the researcher). Each column of Table 8 is one optimization run: the model identifies the formulation – coded levels of the six ingredients (Pepperoni, Mushroom, Sausage, Beef, Cheese A, Cheese B) – that maximizes total liking subject to the stated constraints, and predicts the cost and ratings for that formulation.

Table 8
Example of an optimization run (for pizza).

                       Run 1    Run 2       Run 3       Run 4
Optimize               Total    Total       Total       Total
Constraint #1          None     Cost<200    Cost<150    Cheese Flavor>63
Constraint #2          None     None        None        Cost<175

Cost                   286      199         150         174

Ratings
  Liking - Total        76       72          69          71
  Beef - Amount         65       11           9           5
  Cheese - Flavor       76       73          68          77
  Crust - Bready        61       62          58          63
  Crust - Hardness      63       69          69          67
  Cheese - Amount       54       58          58          57
  Crust - Amount        45       45          45          46
  Mushroom - Amount     21       20          20          14

Data from optimization studies let the researcher quickly identify the most promising formulations. Consumer research has taken 25 years to recognize the value of experimental design and product optimization. The product development process used to be far more circuitous and tortuous, with the developer creating a prototype, the researcher testing it and providing feedback (e.g., liking ratings, sensory ratings, directional ratings, etc.), and then everyone returning to the drawing boards to give it one more try. With experimental design, the developer creates a design and the corresponding set of products, the researcher evaluates these products and builds the product model, and the model identifies either the optimum product (meeting consumer needs), or shows that
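A constrained optimization run of this kind can be sketched with a coarse grid search over a fitted model. The quadratic liking model, the cost function and the constraint below are invented for illustration – they are not the pizza model of Table 8.

```python
# Sketch of constrained S-R optimization over a fitted product model:
# maximize predicted liking subject to a cost ceiling, via a coarse grid
# search over two coded ingredient levels. Model coefficients, costs and
# the constraint are invented, not taken from the article's pizza study.
import itertools

def liking(x1, x2):               # fitted quadratic liking model
    return 50 + 20*x1 + 15*x2 - 6*x1**2 - 5*x2**2 + 2*x1*x2

def cost(x1, x2):                 # cost of goods for the formulation
    return 40 + 30*x1 + 25*x2

best = None
grid = [i / 20 for i in range(0, 41)]        # coded levels 0.00 .. 2.00
for x1, x2 in itertools.product(grid, grid):
    if cost(x1, x2) >= 100:                  # implicit constraint: cost < 100
        continue                             # (explicit constraint: stay in range)
    score = liking(x1, x2)
    if best is None or score > best[0]:
        best = (score, x1, x2)

print(best)   # (highest constrained liking, ingredient levels)
```

Because the unconstrained optimum here is too expensive, the search lands on the cost boundary – the typical pattern in runs such as Table 8, where tightening the cost ceiling progressively lowers the attainable liking.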
the optimum lies further out. The process repeats, but usually the procedure generates an optimum fairly quickly (viz., in one or two iterations). Optimization procedures, long in coming to consumer research, provide one of the most promising technologies for improving the usefulness of product tests, because the data generate both a database (for future use) and immediate, concrete action (to solve the current problem in an actionable fashion). Product optimization is occasionally run by market researchers, and occasionally by product developers using small-scale R&D efforts (called Research Guidance Panels), especially in R&D facilities of larger companies.

Emerging Product Test Methods – Observation

Up to now, we have dealt exclusively with paper and pencil (or computer based) questionnaires. Recently, the ethnographic approach to product testing has begun to gain favor. We can trace ethnographic research back to a combination of two sciences – anthropology (which looks at the behavior of people in their environment) and human factors (a branch of psychology that looks at the behavioral interaction of a person and a machine, or a person and a stimulus). Ethnography looks at behavior in the normal environment; likewise, ethnographic product testing looks at how respondents use specific products. Ethnographic research is not typically (but is sometimes) used to evaluate different products. Right now the goal of ethnographic product research is to identify the ways that products are used, in order to improve them, or to come up with new ideas. Ethnographic research is only in its infancy for product evaluation; the jury is out as to what will be found, but it is a trend that bears watching.

Emerging Product Test Methods – Satiety Evaluation

Home use evaluation is supposed to capture the 'real' nuances of product use. One area of evaluation that is coming into play is known as 'satiety evaluation'. The goal here is to identify the factors that make consumers stop consuming a product, whether defined as short term 'satiation' (just can't eat or drink any more) or as long term 'habituation' (just got tired of the taste, and wanted to try something new). We know that people stop eating foods when they are satiated. We also know that there is variety seeking among consumers (van Trijp 1995), especially for fast moving consumer goods of different sensory characteristics, so that eventually consumers habituate: they get bored with the same soap fragrance or the same cereal, and buy another fragrance, another cereal, or the same product but in a different flavor. Consumer researchers are now turning their attention to identifying the factors underlying this boredom. The satiety and boredom aspects of product testing are just in their infancy, but these new methods are part of the evolving growth of product research.

The Estimation of Volumetrics

Many marketers use product-concept tests to estimate the amount of a product that they will sell. Through years of experience, researchers have developed databases, and then models, showing the relation between concept scores (before product usage), product scores (during usage), and estimated volume. These approaches, expanded by Bases (1999), use models that interrelate many of the marketing variables. Concept scores and product scores
are only two of the variables; volumetric testing incorporates many more aspects of the marketing mix. Unlike conventional product-concept tests that look only at the fit of the product and concept, the market potential analysis calls into play many factors. Other factors considered are awareness (or some surrogate of advertising), positioning, and distribution, as well as the response from many individuals. There is no search for patterns, because these studies concentrate on one product and one concept. These market models differ from the developmental models previously discussed, because the market models come into play at the end of the development process, so it is immediately more important to be 'right'.

The most widely used volumetric modeling is Bases® II (AC Nielsen Bases, 1999). This system assesses the appeal and market potential for products, based upon the reaction of consumers first to the concept (simulating the marketplace reality of advertising coming first), followed by reaction to the product (again simulating marketplace reality). Bases® II estimates Year 1 sales volume after launch, as well as estimated volume for Years 2 and 3. Quite often, the widespread popularity of these test market simulators has led to their use as 'de facto' product-concept tests.

Part 3 – Package Design Research

A Short Overview

Package research is often the neglected child in consumer testing of price, product, positioning, and package. Until recently, little attention had been given to the product package. Part of this lack of research stems from the perception that the package is not particularly important – it is just a vehicle by which to enclose the product (especially in fast moving consumer goods). Part of the lack of research can be traced to the design houses, once the bastion of the artist, which perceive their work as artistic, not to be judged by another professional, and which regard consumers as incapable of such judgments. [No design house wants consumers to design the package for them.] As a consequence, there is a paucity of research literature on package design, although there are scattered articles here and there. [We can compare this dearth of literature to the extremely high volume of literature on advertising testing, perhaps because many more dollars are spent on advertising.] Until recently, much of the research done was either qualitative (focus groups) or none at all.

Today, however, with increased competition in many categories, the package is assuming renewed importance. Companies might launch new products and involve their in-house package design professionals as well as outside design houses. Manufacturers are sponsoring short in-house courses on the importance of package design, and researchers are rising to the occasion; there are now a number of well respected companies specializing in package research. Package design firms are also adopting a more holistic approach to the design process, looking at more than just the product itself, and are welcoming quantitative research. [It should be acknowledged that these design firms always welcomed qualitative research, because that research, like qualitative research in advertising, probed and revealed aspects of the package that were important for the designer.] According to researchers at Cheskin (a brand identity and
design firm), commenting on the design of a beer package (1999): "Beyond helping you manage clients, research can actually help you create better design – not by dictating color and form, but by informing your intuition… When your job is to make a product sell itself at the point of sale, how do you know that your design will deliver the goods?… What does the package say about the beer inside?… How should your client position the product?… If you just look at the packages, what would you say about the beer?"

What Should Package Testing Provide?

Package testing serves a number of purposes. Typically, package designers create packages (either graphics and/or structures) with some objective in mind – such as reinforcing the brand, communicating new benefits, enhancing the chances that the product will be selected, and, even more importantly, fitting the brand. First and foremost, the purpose of package testing is to confirm or disconfirm the objectives of the package designer. In all of these objectives, package testing must provide some idea as to whether or not the new package is successful. A modicum of sensitivity to the creative process is also in order, since package testing often reflects on the creative abilities of the designer – who is both an artist and a business-responsive individual.

Focus Groups

For many years, the conventional package evaluation consisted of focus groups. Focus groups provide the designer with a great deal of feedback, generally in a non-threatening, non-judgmental manner (viz., the designer's work is not judged by another professional, although consumers could reject the package). The focus group is also private. It should come as no wonder that this type of qualitative research has been welcomed by designers, because in essence it reflects how they would intuitively go about obtaining feedback about their creations. In a package design focus group the respondent can verbalize reactions to package features, can identify key features that elicit interest, and can talk about the coherence between the package itself and the brand. A sensitive package designer gets a great deal out of focus groups, because the designer can see and hear how the consumer reacts to the packages, and can cover a great deal of territory in a warm, informal manner. By presenting different possible packages, the designer can see which ones 'work' and which do not. Focus groups, properly conducted, can be very valuable in the creative process, especially in the up-front developmental stage (Glass 1999).

Focus groups can, however, backfire in several ways, and they can be misused. The focus group does not come out with hard and fast results: in a focus group people say many different things, and the listener can select specific phrases to suit his/her own purpose and agenda. Furthermore, respondents often like to 'play designer'. Respondents like to tell designers what to do – even if the designers aren't really interested in following the sage advice. As a consequence, there may develop an antagonism between the client/respondent (with the client wanting to follow the respondent's suggestions) and the designer (who has a basic artistic and business idea in mind and simply wants feedback).

Profiling Packages

At the simplest level the researcher can determine whether or not the package is acceptable to the consumer, or fits the brand. This type of testing typically involves attitudinal
measures: the researcher shows the package to the respondent and obtains a profile of ratings, similar to the way that the researcher obtains product ratings. The key difference is that the researcher may obtain profiles of the 'expectation' of the product (based upon the package), as well as ratings of the package itself. Some of the ratings may deal with interest in the package (or in the product, based upon exposure to the package); other ratings may deal with one's expectation of the product based upon the package. The attributes can vary substantially from one product category to another. If the respondent actually uses the product in the package, then the researcher can also obtain ratings of person-package interaction (ranging from ease of carrying or gripping the product, to ease of opening, ease of removing the product, ease of storing the package, etc.).

This type of information is extremely valuable to the package designer, who wants to find out whether or not the package is 'on target'. Typically the designer receives only the simplest of 'briefs', or project descriptions, such as the requirement that the package be upscale, that it live up to the brand, and that it communicate an effective or good tasting product. The data from the evaluative package tests provide the diagnostics to demonstrate whether or not the package as designed actually lives up to the requirements set by the package design group at the client.

Behavioral Measures – T-Scope and Speed of Recognition

At a more behavioral level, package testing may involve behavioral measures. One behavioral measure is the ability to identify whether or not a package is actually on the shelf. In order for a package to make its impact, it is important that the package 'jump off' the shelf (visually); noticing the package constitutes the first step in selecting the product. The rationale behind this type of testing (called T-Scope or tachistoscope testing) is that the typical shopper spends relatively little time inspecting a store shelf; the researcher assumes that those packages that are perceived during the very short exposure time permitted by the T-Scope have shelf presence. If the research interest focuses on the recognizability of the single package, then one can present single packages and determine the fastest shutter speed (viz., the least time needed) for the package to be correctly identified. If the research interest focuses on the 'findability' of the package on the shelf, then the researcher can place the test package at different locations on the shelf, and then parametrically assess the contribution of package design and location on the shelf as joint contributors to "findability".

Elliot Young (1999) has demonstrated that in these T-Scope tasks the time for which the stimulus information is available is often so brief that the respondent has to use color and design cues, rather than brand names. In some demonstrations, well-known brands with distinctive package features (e.g., Tide® brand detergent, with its concentric halos) are rapidly recognized, even if the brand name shown is incorrect.

Eye Tracking and The Features of A Package

Eye tracking is another method for testing the shelf. The typical shelf is an extremely complex array of package designs over which the consumer's eye wanders. The objective of package designers is to guide the consumer to the package. Eye tracking allows the researcher to identify the pattern that the eye follows. Is the eye drawn to the client's particular package? Does the eye wander away from the package (thus diminishing the chances that the customer will select the
. there is far less in the literature (and in actual practice) using systematically varied package features. colors. Optimizing Package Design – Can Art and Science Mix? One of the most intriguing developments is the advance of conjoint measurement (systematic stimulus variation) into package design. and for how long key messages are looked at (but not whether these are good or not). etc. Young and his associates have amassed a variety of generalizations about where on the shelf the package should be. Some of the power of the conjoint approach applied to graphics design can be seen by inspecting Figure 4 (showing template. explores the stimulus.product)? Young 1999 has also demonstrated the use of eye tracking technology. and for how long. D = juice package corresponding to the specific set of designed features . The eye tracking technology traces the location of the line of site. It shows how the consumer. and from the ratings researchers estimate the part-worth contribution of each component. an over the counter medicine). The conjoint system enables the designer to rapidly understand the effectiveness of each concept element. the eye tracking technology can show if and when. how many facings the package should have. and a table of utilities).g. graphics..). etc. created on the computer screen). and records what the eye sees. tracks how the eye wanders.. When done for a single stimulus (e. however. based upon military research developments in the 1970’s. A = categories or sets of features.g. components. benefits. prices. Of course attention must always be paid to the artistic coherence of the design.g. and by discarding poor performing elements. the researcher and package designer quickly discover what every design feature contributes to consumer interest. communication. etc. but this time the components are features of the package (e. by incorporating high performing graphic elements. From the responses of consumers to test packages (e. 
systematic variation of components) has begun to enter package research. when presented with either a shelf set or a single package. Figure 4 Components of package design research. and then to create new and better combinations. The same research paradigm (viz.g. B = specific template used to embed graphic features. resisting research on the one hand. The respondent evaluates full combinations of these components (e. and the concepts are full packages comprising these systematically varied features. one finished package. Conjoint measurement comprises the experimental variation of components in a concept in order to understand the underlying dynamics of how the components perform. Based upon these analyses. C = features in specific designed package. different names.). Unlike concept testing using conjoint measurement. When done for the entire shelf the eye tracking technology can identify whether or not the package is even looked at.. etc. The best explanation for this is that package design is still an artistic endeavor.. and yet welcoming research insights on the other.
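The part-worth logic described above can be sketched in a few lines of Python. Everything below is hypothetical — the three on/off graphic features and all ratings are invented for illustration; the point is only that, with an orthogonal design, ordinary least squares recovers each feature's contribution to consumer interest.

```python
import numpy as np

# Hypothetical conjoint data: eight package mock-ups, each a combination of
# three on/off graphic features (features and ratings are invented).
# Rows = mock-ups, columns = features (1 = feature present, 0 = absent).
X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1],
              [1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])
# Mean consumer interest (0-100) for each mock-up.
y = np.array([42, 55, 47, 40, 61, 52, 45, 58])

# With an orthogonal (here, full factorial) design, ordinary least squares
# recovers each part-worth: rating ~ baseline + sum of part-worths present.
A = np.column_stack([np.ones(len(y)), X])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
baseline, partworths = coefs[0], coefs[1:]

for name, w in zip(["feature 1", "feature 2", "feature 3"], partworths):
    print(f"{name}: {w:+.1f} points of interest")

# A better combination keeps only the features with positive part-worths.
best = partworths > 0
```

A designer would then rebuild the package keeping only the winning elements — subject, as the text notes, to the artistic coherence of the whole.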

" in The History Of Experimental Psychology. Box. package research development is still in its infancy.E. Statistics For Experimenters. References AC Nielsen Bases. That unhappy situation is beginning to remedy itself as the competitive environment forces recognition of package research as a key new source of information. (1929). Bases.Hunter. E. J. G. useful to maintain or to achieve competitive advantage in the marketplace.com).. Up to now it was the artistic aspects of package design. . (1999). New York: John Wiley. New York: Appleton Century Crofts.P. and S. Boring. Package Design – An Overview In contrast to product (and to concept) development. researchers are only beginning to recognize the importance of the "package" as a key factor. "Sensation and Perception.G... along with the role of the package as a secondary factor that hindered the full flowering of package research. Sales Brochure (from the internet… www. Hunter (1978). For the most part.

. Civille.Civille and B. H. Westport: Food and Nutrition Press. Lancaster1.. Cosmetic Product Testing: A Modern Psychophysical Approach. M. Moskowitz. New Directions In Product Testing And Sensory Analysis Of Food. Inc. (Spring). Petersen and A. Gacula.R.V. Westport: AVI. and B.R. Response Surfaces. The Nine American Lifestyles. H. M." Food Overall Quality and Preference. G. 11.R. Moskowitz. Glass. Johnson. (1995). Product Testing and Sensory Evaluation of Food: Marketing and R&D Approaches.R. Cornell (1987). "Progress TowardsAn International System of Beer Flavor Terminology. (1995). Moskowitz. Personal Communication Khuri. ed. and J. Alan. FlavLex. 89-95. Moskowitz.T Carr (1987).E. West Conshohocken: copyright Softex. (1985). Version 1. 12: 273-280. H. "Experts versus Consumers.Dagliesh and M.C Meilgaard (1975). J. 8. H. H. Mitchell. Trumbull: Food And Nutrition Press.J.248." in Encyclopedia of Food Science. Food Concepts and Products: Just In Time Development. M. (1978). H. "Scales for Measuring Food Preference.. (1994).S." in Critique: The magazine of design graphic thinking. Boca Raton: CRC Press.G.A.L. Meilgaard. eds.R.. (1999).H. H. Clapperton. (1993). Sensory Evaluation Techniques. "Relative importance of perceptual factors to consumer acceptance: Linear versus quadratic analysis. Westport: Food And Nutrition Press. Moskowitz. "Base Size in Product Testing: a Psychophysical Viewpoint and Analysis.. C. (1984). New York: Marcel Dekker Inc.15 . G." Journal Of Food Science." Journal of Sensory Studies. 1995b. 675-678.. New York: MacMillan. 244.Cheskin. Design and Analysis of Sensory Optimization. Moskowitz. 19-35. A.R. (1997). A." Master Brewers Association Of America Technical Journal. (1981). Moskowitz. Lyon. New York: Marcel Dekker Inc. H.R.C. Trumbull: Food and Nutrition Press. Jr. American Society for Testing and Materials. 247-256. Meiselman. 46. (1983). (1983). "Five Bottles Of Beer.

L. New York: Council Of Better Business Bureaus." Food Technology. Peryam. Transcript Proceedings.L. Stone.G.Theory With Applications. L. in Food Acceptance ed. 11.." Journal Of Food Science.M. London: Elsevier. and J. (1983).H." Psychological Review 34. S. 261-294. 33.Moskowitz. Krieger (1995). and. "Multiple Regression Approach to Optimization.A. Van Trijp. "International Product Optimization: A Case History. Psychophysics: An Introduction To Its Perceptual. Personal Communication.L. New York: John Wiley.S. and B. 47-62. 9-14. H.E. NAD Workshop. and F. 168-191. 6. "Variety-Seeking In Product Choice Behavior . (1964)." Journal of Food Quality. 83-91." Food Technology. 202-213.G.. D. Smithies." Food Quality and Preference..D. Neural And Social Prospects." Journal of Sensory Studies. Pp. and B. (1975).R. H. Young. B. 37. Moskowitz. and J. Wageningen. Sensory Evaluation Practices. Schutz.C.M. "The Design of Optimum Multifactorial Experiments.M. Thurstone. S. Munoz. Schutz.. . New York: John Wiley. Lazar (1985). Substantiating A Taste Claim. 9. "A Food Action Rating Scale for Measuring Food Acceptance. B. (1989). Moskowitz. Thomson. Sidel (1985). The Netherlands. (1995). "Product Response Segmentation and the Analysis of Individual Differences in Liking. H. 11. "The Contribution of Sensory Liking to Overall Liking: An Analysis of Six Food Categories. Stevens. "Hedonic Scale Method of Measuring Food Preferences." Biometrika. "A Multifaceted Category Research Study: How to Understand a Product Category and its Consumer Responses. Mansholt Studies. E. 30.J.G. 443-454. D. 305-325. A. R.. Chambers IV and S. J. Hummer (1996). H. 273-286.R. Plackett. Schutz. Burman (1946). and the Advertising Research Foundation. R.R. Jacobs and N.R. "A Law of Comparative Judgment. 115134.. H. Beyond Preference: Appropriateness as a Measure of Contextual Acceptance of Food. 8.H. Buchanan (1990).. H. H. Pilgrim (1957). E. 1999. 1." Food Quality and Preference. (1927)." in The Food Domain. 
Krieger (1998).

HOWARD R. MOSKOWITZ
Howard Moskowitz is president and CEO of Moskowitz Jacobs Inc., a firm he founded in 1981. Dr. Moskowitz is simultaneously a renowned experimental psychologist in the field of psychophysics (the study of perception and its relation to physical stimuli), and an inventor of world class market research technology. Dr. Moskowitz graduated Harvard University in 1969 with a Ph.D. in experimental psychology. Prior to that he graduated Queens College (New York), Phi Beta Kappa, with degrees in mathematics and psychology. He has written/edited eleven books, published well over 180 articles, serves on the editorial board of major journals, and has lectured in the U.S. and abroad. He has won numerous awards, among them the Scientific Director's Gold Medal for outstanding research at the U.S. Army Natick Laboratories. In 1992 Dr. Moskowitz founded a $2,000 prize for young scientists working in the psychophysics of taste and smell, administered through the Association of Chemoreception Scientists.

Among his important contributions to market research is his 1975 introduction of psychophysical scaling and product optimization for consumer product development. Whereas these methods are standard and well accepted today, they required a massive culture change in the 1975 business community. In the 1980's his contributions in sensory analysis were extended to health and beauty aids. Dr. Moskowitz has also developed and refined procedures which enable research to interrelate products, concepts, consumers, experts and physical test instruments, in order to accomplish product optimization and reverse engineering. Finally, his research and technology developments have led to concept and package optimization (IdeaMap®), integrated and accelerated development (DesignLab®), and the globalization and democratization of concept development for small and large companies alike, in an affordable, transaction-oriented approach (IdeaMap® Wizard).

[Table fragment – comparison of two test formats. Measure satiation and wearout: No / Yes. Mix many concepts and many products: Yes, easy / No, hard. Number of panelists per product for stability: 20-50 / 50-100.]

Choreographing A Product Test
Over the past 25 years market researchers specifically, along with sensory analysts, have begun to test many different products in a single evaluation session. Rather than limiting the test to a few products (e.g., 3-4 at most), the researchers have recognized the value of obtaining data on a large set of products, with each respondent evaluating many products. Sometimes the number of products may be as high as 10-15. [Despite what purists say, it is fairly straightforward to evaluate these many products. One need only ensure adequate motivation and time between samples]. With such extensive data, the researcher has been afforded the opportunity to do far more complex analyses than were ever considered possible. Most importantly, the researchers obtain data from a single individual on far more products than traditionally has been the case.

This product testing is done in a pre-recruit central location or at home. The approach is set up so that the respondent is pre-recruited and paid to participate (increasing or at least maintaining motivation). The session may last several hours, enabling the evaluations to proceed at a leisurely pace, with an interviewer guiding the group. One can almost envision the evaluation session to be a 'factory', with most of the time being spent in acquiring the data. Very little of the interview time is spent briefing the respondent, making the respondent feel comfortable, delivering the product, and then finishing a session. These activities in a short intercept evaluation may take up 5-10 minutes in a 20-minute interview, or 25%-50% of the time. In contrast, in a 120-240 minute interview these 5-10 minutes are negligible. The choreographed test session is thus set up to acquire data from respondents in an optimal fashion. A typical example of choreographed evaluation appears in Table 5.

Table 5
Choreography of an Extended Product Evaluation Session

Step 1 – Activity: Study designed to test multiple products within an extended one- or two-day session. Rationale: Maximizes amount of data and motivation of respondent.
Step 2 – Activity: Respondents recruited to participate for an extended session. Rationale: Must be phone recruited. Typically not a problem if the study is run in the evening or on a weekend.
Step 3 – Activity: Respondent shows up. Rationale: Usually 25 people show up for the test. Respondents are unsure at the start, but have no problem going through the evaluation.
Step 4 – Activity: The chief interviewer orients the respondents as to the purpose of the study. Rationale: The orientation is important because it secures cooperation.
Step 5 – Activity: Interviewer guides the respondent through the first product. Rationale: This is slow, because the interviewer guides the group through all the attributes.
Step 6 – Activity: Respondent completes the first product; an interviewer 'checks the data'. Rationale: Neither the interviewer nor the respondent knows the 'correct' ratings, but this checking maintains interest and motivation.
Step 7 – Activity: Respondent waits a specified time, then proceeds with the second product. Rationale: Waiting and the order of products are strictly controlled; the products are randomized across respondents.
Step 8 – Activity: Respondent finishes the evaluations, proceeds to an extensive classification, and is paid and dismissed. Rationale: The classification provides data on respondent attitudes, purchase patterns, geo-demographics, etc. The respondent maintains ongoing motivation and interest throughout the entire evaluation.

Part 3 – Analyses of Product Test Data

Basic Data: Product x Attribute Profiles
In the past three decades, researchers have begun to recognize that there is much to be learned by obtaining profiles of multiple products on multiple attributes. Fundamentally these profiles require the respondent to rate each product on different characteristics, with all of the products rated on the same characteristics. The scales are similar from product to product. Respondents may rate the product on sensory, liking, image attributes, or even fit to different concepts. These profiles or report cards are analyzed in a variety of ways. The report cards or matrices enable the researcher to glance quickly through the data set to determine the scores of a particular product on different attributes, or, more importantly, the scores of many products on the same attribute. The ability to compare many products on a scorecard basis creates a deeper understanding of the product category, and is far easier to comprehend than a series of paired comparisons among all of the products. Table 6 shows an example of such data.

Table 6
Example of product x attribute profile. All scales are rated 0-100.
Sensory scales: 0 = none at all … 100 = extreme. Liking scales: 0 = hate … 100 = love.

Product                        A     B     C
Appearance
  Like appearance             61    54    59
  Brown                       57    43    62
  Flecks                      59    30    73
  Tomato pieces/amount        22    42    64
  Tomato pieces/size          18    35    40
  Vegetable pieces/size       11    15    38
Aroma/Flavor
  Like aroma                  45    44    57
  Like flavor                 61    57    18
  Flavor strength             61    57    76
  Tomato flavor               55    59    49
  Meat flavor                 13    25    56
  Mushroom flavor             16    24    36
  Onion flavor                25    20    36
  Green pepper flavor         11     9    30
  Vegetable flavor            18    18    36
  Herb flavor                 52    36    70
  Black pepper flavor         19    14    36
  Garlic flavor               30    20    34
  Cheese flavor               11    32     5
  Salt taste                  25    24    23
  Sweet taste                 37    28    19

Table 6 (continued)

Product                        A     B     C
  Aftertaste                  48    44    70
  Sour taste                  26    36    54
  Oily flavor                 26    23    24
Texture/Mouthfeel
  Like texture                58    63    60
  Crisp vegetable texture     35    34    40
  Oily mouthfeel              29    24    25
  Thickness                   53    41    47

Drivers of Liking and The Inverted U Shaped Rule for Products
At a higher level of analysis, these matrices enable the researcher to understand what characteristics 'drive acceptance' (or drive other attributes). In days gone by, the researcher might simply correlate all of the attributes with overall liking to determine which particular attribute drives liking. This is somewhat of a tautology, however: attributes such as good quality and good taste correlated with overall liking. The correlation statistic, assuming as it does a linear relation between two variables, often failed to show the importance of sensory attributes as drivers of liking within the sensory range achieved.

Rather, one approach to understand drivers of liking plots overall liking on the ordinate versus attribute liking on the abscissa. The equation that describes this relation is: Overall Liking = A + B (Attribute Liking). The slope, B, indicates the relative importance of the attribute. High values of slope B mean that the attribute is important – unit increases in the liking of the attribute correspond to high increases in overall liking. Conversely, low values of slope B mean that the attribute is unimportant (Moskowitz and Krieger 1995).

More recently, it has been the fashion to plot overall liking versus sensory attributes, and fit a curve to the data. Sometimes the curve will be flat, showing that although respondents differentiate the products on a sensory attribute, the sensory differences do not drive liking (at least on a univariate basis). Other times the curve will increase, and perhaps flatten out slightly as if it were reaching an asymptote. In other situations the curve will show a downwards sloping relation, suggesting that as the sensory attribute increases, liking actually decreases. And sometimes the curve will show an inverted U shaped curve: a quadratic function, shown in Figure 1 (Moskowitz 1981), in which liking first increases with sensory attribute level, then peaks, and then drops down. The specific form of this inverted U shaped curve will depend upon the specific product being tested, and the preferences of the respondents. There is usually quite a pronounced relation between flavor/aroma sensory levels and liking, but less of a relation between texture or appearance and liking. We know from basic research that the chemical senses (taste, smell) provoke more hedonic reactions (like Vs dislike) than do the visual, auditory, and tactile senses. Thus, for consumer goods (whether food or health and beauty aids), we should not be surprised to see these strong sensory-liking curves for the flavor attributes, but not for many of the appearance or texture attributes.

Figure 1
How flavor intensity 'drives' overall liking. The researcher creates a scattergram, and fits a quadratic function to the data. The quadratic function shows where liking tends to peak, and reveals the degree to which liking co-varies with a single sensory attribute.
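Mechanically, the inverted-U analysis of Figure 1 is a quadratic fit. A minimal sketch with made-up data (the flavor intensities and mean liking ratings below are hypothetical): fit liking as a quadratic in the sensory attribute, then read off the peak at -b/(2a).

```python
import numpy as np

# Hypothetical scattergram: perceived flavor intensity of seven test products
# (0-100 sensory scale) and the mean overall liking each product earned.
intensity = np.array([10, 25, 40, 50, 60, 75, 90])
liking = np.array([35, 52, 64, 68, 66, 55, 38])

# Fit the inverted U: liking = a*intensity**2 + b*intensity + c.
a, b, c = np.polyfit(intensity, liking, deg=2)

# For a concave fit (a < 0), liking peaks where the derivative is zero.
peak = -b / (2 * a)
print(f"fitted curve is concave (a = {a:.4f}); liking peaks near intensity {peak:.0f}")
```

A flat or monotone curve would show up here as a near-zero or sign-consistent slope rather than a usable interior peak.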

Sensory Preference Segmentation
Marketers are the first to recognize that consumers differ from each other. It should come as no surprise that product test results are often reported by total panel and by key subgroups, whether by brand usage (e.g., users of brand X Vs brand Y, heavy Vs light users of brand Z), by geo-demographics, or by other, overarching psychological schemata. Some of these segmentations also may be based upon responses to a battery of questions. For the most part these segmentations, which divide people into groups based upon pre-set classifications (Mitchell 1983), do not show particularly striking differences in the consumer responses to products. That is, in a product test these different subgroups of individuals appear to show quite similar reactions to the products themselves. Consumers in different segments may use the same brand, have the same values, and the like. Traditionally, these key subgroups comprise individuals falling into different geo-demographic classifications, different brand usage patterns (e.g., users versus non-users), and values.

Psychologists interested in the basic science of perception have introduced newer concepts of segmentation, based upon the sensory-liking patterns. There may be individuals who like strong flavored products, and individuals who like weak flavored, lighter colored products. When applied to product test results (more than 6 products must be evaluated by each respondent), this segmentation reveals that there are often two, three or more groups in the population with distinct sensory preference patterns. This organizing principle for product development, emerging from product test results (along with emerging from basic psychophysics), is forming the foundation of new approaches for consumer research and applied product development.

The fundamental idea behind sensory-based segmentation begins with the previously suggested relation between sensory attribute level and liking: as a sensory attribute increases, liking first increases, peaks (at some optimal sensory level), and then drops down with further increases in the sensory attribute (Moskowitz, Jacobs and Lazar 1985). Each segment shows a different sensory level at which liking peaks. [A short digression is appropriate here. It is important to keep in mind that the segmentation is based upon the sensory level at which a person's liking rating peaks, and not based upon the magnitude of liking. Segmentation based upon the pattern of liking ratings, rather than relations between liking and sensory attribute, could introduce artifacts. Respondents showing similar sensory-liking patterns may fall into different segments if one respondent assigns high numbers to products and the other respondent assigns low numbers to the same products]. (See Figure 2, which shows data for coffee.)
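The segmentation rule just described — group people by where their liking peaks, not by how high their ratings run — can be sketched as follows. The ratings are invented; respondent 2 deliberately uses low numbers but peaks in the same place as respondents 0-1, and still lands in their segment.

```python
import numpy as np

# Sensory levels (e.g., flavor strength) of the six products all respondents rated.
levels = np.array([10, 25, 40, 55, 70, 85])

# Hypothetical liking ratings, one row per respondent. Respondents 0-2 peak
# at low flavor levels, respondents 3-5 at high levels. Respondent 2 uses
# uniformly low numbers, but peaks in the same place as respondents 0-1.
ratings = np.array([
    [60, 72, 65, 50, 35, 20],
    [55, 70, 68, 52, 38, 25],
    [30, 42, 35, 22, 15, 10],   # low magnitude, same peak location
    [20, 35, 50, 62, 70, 66],
    [25, 38, 52, 64, 72, 68],
    [15, 30, 45, 60, 69, 64],
])

def peak_level(liking):
    """Sensory level at the maximum of a fitted quadratic, clipped to the tested range."""
    a, b, _ = np.polyfit(levels, liking, 2)
    return float(np.clip(-b / (2 * a), levels[0], levels[-1]))

peaks = np.array([peak_level(r) for r in ratings])

# Split on the peak level, not on the magnitude of liking; a k-means on the
# peak levels would do the same job when more than two segments are expected.
segment = (peaks > peaks.mean()).astype(int)
print("peak levels:", np.round(peaks), "segments:", segment)
```

Clustering on raw liking ratings instead of peak levels would have pulled respondent 2 into a spurious "low scorer" segment — exactly the artifact the digression above warns about.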

Figure 2
Sensory segmentation – results from a five-country study of coffee. The left panel shows how overall liking for respondents from each of five countries (1-5) is driven by perceived bitterness. The right panel shows the same liking-bitterness relation, this time using data from sensory segments obtained from the five countries. Note that each segment is represented in each country, albeit in different proportions.

Product Mapping and Creating A Product Model From the Map
At an even higher level of analysis, the matrix of product x attribute ratings is used to map products into a geometrical space (see Figure 3). The geometrical space may comprise one, two, three or many more dimensions (although it is difficult to visualize a space of more than three dimensions). The locations of the products in the map are usually based upon the sensory properties of the products, with products located close together being sensorially similar to each other. Brand managers and product developers use these brand maps as heuristics to understand how the consumer perceives the products as a set of interchangeable (or non-interchangeable) items. Product mapping is done typically with disconnected products, usually those current in the product category, without regard to the connectedness of their underlying formulations, but occasionally with prototypes (so-called category appraisal, aptly named because the approach considers the full range of products currently in a category; Munoz, Chambers and Hummer 1996).

When the coordinates of the brand map are factor scores (from factor analysis), the map serves double duty. First, the map locates the products in a space, as the heuristic to visualize the category. Second, the coordinates of the map (viz., the dimensions) act as independent variables in a regression model, with the dependent variable being a rating attribute [e.g., liking, image rating, sensory rating] and the independent variables being the factor scores. [The map coordinates, being factor scores based upon a principal components analysis of the sensory attributes, are parsimonious and statistically independent, so that no violence is done to the data]. As a consequence the researcher builds a set of equations (one per attribute). [The models are not just linear equations, but also quadratic equations]. At the end of the day the researcher is now able to identify a location in the map, and estimate the full sensory profile and liking rating corresponding to that location. The researcher can thereby discover drivers of liking, and then identify unfilled holes in the product category. Holes in the category are locations without any products. Such an approach using mapping and modeling (with factor scores as independent variables, and all rating attributes as dependent variables) typically would occur early in the research process, when the objective is to understand the category. This type of analysis can only occur if the researcher tests multiple products (at least six, but preferably 10 or more).

Figure 3
Example of mapping a product category (ointment). Products are circles, with the size proportional to liking.
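The map-plus-model idea can be illustrated with a small sketch. The factor scores and liking values below are hypothetical, and the factor analysis itself is assumed to have been run already; the sketch shows only the map's second duty — regressing liking on the coordinates (here a quadratic without cross-terms, for brevity) and evaluating the fitted surface at a candidate hole.

```python
import numpy as np

# Hypothetical map: factor scores (F1, F2) for nine products in a category
# (assume the factor analysis has already been done), plus mean liking.
F = np.array([[-1.2, -0.8], [-1.0, 0.9], [-0.2, -1.1], [0.0, 0.0], [0.3, 1.2],
              [0.8, -0.5], [1.1, 0.6], [-0.6, 0.2], [1.3, -1.0]])
liking = np.array([48, 55, 50, 66, 60, 63, 58, 59, 49])

# Quadratic model on the map coordinates:
# liking ~ b0 + b1*F1 + b2*F2 + b3*F1**2 + b4*F2**2.
A = np.column_stack([np.ones(len(liking)), F[:, 0], F[:, 1], F[:, 0]**2, F[:, 1]**2])
coefs, *_ = np.linalg.lstsq(A, liking, rcond=None)

def predict(f1, f2):
    """Estimated liking at any map location, including unfilled 'holes'."""
    return float(coefs @ [1.0, f1, f2, f1**2, f2**2])

# Evaluate a candidate hole: a location in the map with no current product.
hole = (0.5, 0.3)
print(f"expected liking at hole {hole}: {predict(*hole):.1f}")
```

In a full analysis one equation of this kind would be built per attribute, so that the complete sensory profile at the hole can be estimated, not just liking.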

Product Optimization
Originally, optimization was done using the relation between consumer liking and sensory attributes (Schutz 1983). This is called R-R or response-response analysis, because the optimization model uses relations between two dependent or response variables (viz., liking is a rated attribute, as are sensory attributes). More recently, however, attention has focused on the variables under the developer's control (rather than on the sensory-liking relation). This is called S-R or stimulus-response analysis, because the optimization model uses the relation between variables under the developer's control and consumer-rated attributes.

Several decades ago, statisticians developed systematic arrays of variables with the property that the variables were independent of each other. These are called experimental designs (Box, Hunter and Hunter 1978; Khuri and Cornell 1987; Plackett and Burman 1946). Product developers use these designs to create arrays of products which comprise known variations (e.g., in ingredients or in process conditions). By varying the independent variables (viz., features under the developer's control), and by obtaining reactions (e.g., liking, sensory attributes and even image attributes), the developer can link the variables under direct operational control to what the consumer responds to. Furthermore, these experimental designs lend themselves to statistical modeling by regression. It is a short step from the experimental design + consumer reaction to modeling, and another short step from modeling (or equation building) to optimizing (viz., identifying the combination of the variables under design that produces the highest liking, or the highest liking constrained to a given cost, etc.).

Table 7 shows two experimental designs – one a screening design (allowing the researcher to investigate many different variables, each at two levels), and the other a more complex response surface design (such as a Box Behnken or central composite design). The response surface design enables the researcher to create a quadratic model relating formulation to liking, cost of goods, yield, sensory attributes and other attributes, and allows the researcher to optimize liking or any other variable, subject to explicit constraints (viz., the formulation must stay within the range tested) and subject to implicit constraints (viz., one or more attributes of the product, such as a sensory attribute or even cost of goods, must remain within specified constraints). Table 8 shows results for the optimization of a commercial pizza which varied on six formula variables.

Table 7
Example of two experimental design systems commonly used in product development. The left system (11 variables, A-K, in 12 runs) is a Plackett-Burman screening design; each variable appears at two options (1 = new, on, high; 0 = old, off, low). The design is used to identify key drivers of a response. The right system (3 variables, A-C, in 15 runs) is a central composite design, used to identify optimal levels of a process or an ingredient.

PROD   A B C D E F G H I J K
  1    1 1 0 1 1 1 0 0 0 1 0
  2    1 0 1 1 1 0 0 0 1 0 1
  3    0 1 1 1 0 0 0 1 0 1 1
  4    1 1 1 0 0 0 1 0 1 1 0
  5    1 1 0 0 0 1 0 1 1 0 1
  6    1 0 0 0 1 0 1 1 0 1 1
  7    0 0 0 1 0 1 1 0 1 1 1
  8    0 0 1 0 1 1 0 1 1 1 0
  9    0 1 0 1 1 0 1 1 1 0 0
 10    1 0 1 1 0 1 1 1 0 0 0
 11    0 1 1 0 1 1 1 0 0 0 1
 12    0 0 0 0 0 0 0 0 0 0 0

PROD   A B C
  1    3 3 3
  2    3 3 1
  3    3 1 3
  4    3 1 1
  5    1 3 3
  6    1 3 1
  7    1 1 3
  8    1 1 1
  9    3 2 2
 10    1 2 2
 11    2 3 2
 12    2 1 2
 13    2 2 3
 14    2 2 1
 15    2 2 2
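Analysis of a screening design of this kind reduces to comparing each variable's 'on' mean with its 'off' mean. The sketch below uses the 12-run Plackett-Burman matrix itself, with simulated ratings in which, by construction, variable A adds 8 points and variable C subtracts 5; everything else is noise-free so the screen is easy to verify.

```python
import numpy as np

# The 12-run Plackett-Burman screening matrix from Table 7
# (11 variables A-K; 1 = new/on/high, 0 = old/off/low).
design = np.array([
    [1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
])

# Simulated mean liking for the 12 prototypes: by construction variable A
# adds 8 points and variable C subtracts 5; everything else has no effect.
ratings = 50 + 8 * design[:, 0] - 5 * design[:, 2]

# Every column is balanced (6 on, 6 off) and the columns are mutually
# orthogonal, so mean('on') - mean('off') screens for key drivers.
effects = np.array([ratings[design[:, j] == 1].mean()
                    - ratings[design[:, j] == 0].mean() for j in range(11)])

for name, e in zip("ABCDEFGHIJK", effects):
    if abs(e) > 1:  # crude screen: flag variables with sizable effects
        print(f"variable {name}: {e:+.1f} points")
```

On real data the small nonzero effects would be noise, and a formal significance cutoff would replace the crude threshold used here.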

Table 8
Example of an optimization run (for pizza). Six ingredients (pepperoni, mushroom, sausage, beef, cheese A, cheese B) were varied systematically according to a central composite design. Each run reports an optimal formulation – its six ingredient levels, its cost, its expected liking, and its expected sensory profile (crust amount, bready crust, crust flavor, crust hardness, cheese amount, cheese flavor, beef amount, mushroom amount) – under a different set of constraints:

Constraint                       Cost    Liking
None (optimize liking only)       286       76
Cost < 200                        199       72
Cost < 175                        174       71
Cost < 150                        150       69

A further run added an implicit sensory constraint (cheese flavor > 63) alongside the cost constraint.

The results led to a product model, interrelating formulations (under the developer's control) and ratings (acquired by the researcher). Data from optimization studies let the researcher quickly identify the most promising formulations. Consumer research has taken 25 years to recognize the value of experimental design and product optimization. The product development process used to be far more circuitous and tortuous, with the developer creating a prototype, the researcher testing it and providing feedback (e.g., liking ratings, sensory ratings, directional ratings, etc.), and then everyone returning to the drawing boards to give it one more try.
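The constrained optimization reported in Table 8 can be sketched as a search over a fitted product model. The model below is a hypothetical stand-in (a toy quadratic in two ingredients plus a linear cost function), not the actual pizza equations; the mechanics — maximize predicted liking while an implicit cost constraint holds — are the same.

```python
import numpy as np

# Hypothetical product model (a stand-in, not the actual pizza equations):
# predicted liking and cost as functions of two ingredient levels (0-3).
def liking(cheese, pepperoni):
    return 50 + 14 * cheese - 3 * cheese**2 + 10 * pepperoni - 2.5 * pepperoni**2

def cost(cheese, pepperoni):
    return 60 + 45 * cheese + 30 * pepperoni

# Grid search: maximize liking subject to the explicit constraint (stay
# inside the tested 0-3 range) and an implicit constraint on cost.
best = None
for c in np.linspace(0, 3, 61):
    for p in np.linspace(0, 3, 61):
        if cost(c, p) <= 200:  # implicit cost constraint
            score = liking(c, p)
            if best is None or score > best[0]:
                best = (score, c, p)

score, c, p = best
print(f"optimum with cost <= 200: cheese={c:.2f}, pepperoni={p:.2f}, "
      f"liking={score:.1f}, cost={cost(c, p):.0f}")
```

Because the cost ceiling binds, the constrained optimum sits below the unconstrained one — the same trade-off the Table 8 columns display as liking falling from 76 to 69 as the cost cap tightens.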

and identifies either the optimum product (meeting consumer needs). Emerging Product Test Methods – Observation Up to now. the ethnographic approach to product testing has begun to gain favor. occasionally by product developers using small-scale R&D efforts (called Research Guidance Panels). Product optimization is occasionally run by market researchers. ethnographic product testing looks at how respondents use specific products. because the data generate both a database (for future use). We can trace ethnographic research back to a combination of two sciences – anthropology (which looks at the behavior of people in their environment). another cereal. Similarly. or the same cereal. We know that people stop eating foods when they are satiated. or the same product but in a different flavor. we have dealt exclusively with paper and pencil (or computer based) questionnaires. or to come up with new ideas. but it is a trend that bears watching.developer creates a set of products. The goal here is to identify the factors that make consumers stop consuming a product. provides one of the most promising technologies for improving the usefulness of product tests. satiety and boredom aspects of . long in coming to consumer research. Recently. and human factors (a branch of psychology that looks at the behavioral interaction of a person and a machine. Ethnographic research is not typically (but is sometimes) used to evaluate different products. so that eventually consumers habituate. and wanted to try something new). Right now the goal of ethnographic product research is to identify the ways that products are used. especially in R&D facilities of larger companies. as well as immediate. in one or two iterations). or shows that the optimum lies further out. concrete action (to solve the current problem in an actionable fashion). creates a design. Ethnographic research is only in its infancy for product evaluation. One area of evaluation that is coming into play. 
Like ethnographic methods. The process repeats. but usually the procedure generates an optimum fairly quickly (viz. especially for fast moving consumer goods of different sensory characteristics. We know that there is variety seeking among consumers (van Trijp 1995). Consumer researchers are now turning their attention to identifying the factors underlying this boredom. or get bored with the same soap fragrance. is known as ‘satiety evaluation’. Optimization procedures. the researcher evaluates these products. and long term ‘habituation’ (just got tired of the taste. or a person and a stimulus). either defined as short term ‘satiation’ (just can’t eat or drink any more). Ethnography looks at behavior in the normal environment. Emerging Product Test Methods – Satiety Evaluation Home use evaluation is supposed to capture the ‘real’ nuances of product use. and buy another fragrance. in order to improve them..

The Estimation of Volumetrics

Many marketers use product-concept tests to estimate the amount of a product that they will sell. The most widely used volumetric modeling is Bases® II (AC Nielsen Bases, 1999). This system assesses the appeal and market potential for products, based upon the reaction of consumers to both the concept (first, simulating the marketplace reality of advertising), followed by reaction to the product (again simulating marketplace reality). Through years of experience, researchers have developed databases and then models showing the relation between concept scores (before product usage), product scores (during usage), and estimated volume. Bases® II estimates Year 1 sales volume after launch, as well as estimating volume for Years 2 and 3, respectively. In contrast to the methods described above, however, volumetric testing incorporates many more aspects of the marketing mix. Unlike conventional product-concept tests that look only at the fit of the product and concept, the market potential analysis calls into play many factors, not just the product itself. Concept scores and product scores are only two of the variables; other factors considered are awareness (or some surrogate of advertising), and distribution. These approaches, expanded by Bases (1999), use models that interrelate many of the marketing variables. These market models differ from the developmental models previously discussed because the market models come into play at the end of the development process. There is no search for patterns, because these studies concentrate on one product, one concept, and the response from many individuals. Quite often, the widespread popularity of these test market simulators has led to their use as ‘de facto’ product-concept tests.
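The logic of such volume estimation can be illustrated with a deliberately simplified "ATAR-style" calculation (awareness × trial × availability × repeat). This is a textbook scheme, not the proprietary Bases® model, and every number below is an assumption:

```python
# Simplified ATAR-style volume sketch (illustrative only; not the Bases model).
# Year 1 volume = households x awareness x distribution x trial
#                 x (first purchase + repeat purchases)

households       = 100_000_000   # target households (assumed)
awareness        = 0.40          # share ever aware, driven by the media plan (assumed)
distribution     = 0.70          # all-commodity distribution (assumed)
trial_rate       = 0.15          # share of aware households with access who try;
                                 # calibrated from concept scores in real systems
repeat_rate      = 0.35          # share of triers who repurchase (from product scores)
repeats_per_year = 3.0           # average repeat purchases per repeater per year
units_per_trip   = 1.0           # units bought per purchase occasion

triers = households * awareness * distribution * trial_rate
volume = triers * units_per_trip * (1.0 + repeat_rate * repeats_per_year)

print(f"Trial households: {triers:,.0f}")
print(f"Year 1 unit volume: {volume:,.0f}")
```

The sketch makes the article's point concrete: concept scores mostly move the trial term, product scores mostly move the repeat term, and awareness and distribution scale everything, which is why a volumetric model must look beyond product-concept fit alone.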

Part 3 – Package Design Research – A Short Overview

Package research is often the neglected child in consumer testing of price, product, positioning, and package. Until recently, little attention had been given to the product package, and there is a paucity of research literature on package design, although there are scattered articles here and there. [We can compare this dearth of literature to the extremely high volume of literature on advertising testing, perhaps because many more dollars are spent on advertising, so it is immediately more important to be ‘right’.]

Part of this lack of research stems from the perception that the package is not particularly important – it is just a vehicle by which to enclose the product (especially in fast moving consumer goods). Part of the lack of research can be traced to the design houses, which perceive their work as artistic, incapable of judgments. [No design house wants consumers to design the package for them.] As a consequence, much of the research done was either qualitative (focus groups) or none at all. [It should be acknowledged that these design firms always welcomed qualitative research, because that research, like qualitative in advertising, probed and revealed aspects of the package that were important for the designer.]

Today, however, with increased competition in many categories, the package is assuming renewed importance, and researchers are rising to the occasion. Companies might launch new products, and involve their in-house package design professionals as well as outside design houses. Manufacturers are sponsoring short in-house courses on the importance of package design, and a number of well respected companies now specialize in package research. Package design firms, once the bastion of the artist, are welcoming quantitative research, and are also adopting a more holistic approach to the design process. According to researchers at Cheskin (a brand identity and design firm), commenting on the design of a beer package (1999): “Beyond helping you manage clients, research can actually help you create better design – not by dictating color and form, but by informing your intuition…When your job is to make a product sell itself at the point of sale, how do you know that your design will deliver the goods?…What does the package say about the beer inside?…How should your client position the product?…If you just look at the packages, what would you say about the beer?”

What Should Package Testing Provide?

Package testing serves a number of purposes. The most important purpose for package testing is to confirm or disconfirm the objectives of the package designer. Typically package designers create packages (either graphics and/or structures) with some objective in mind – such as reinforcing the brand, communicating new benefits, or enhancing the chances that the product will be selected. In all of these objectives package testing must provide some idea as to whether or not the new package is successful. A modicum of sensitivity to the creative process is also in order, since package testing often reflects on the creative abilities of the designer – who is both an artist and a business-responsive individual.

Focus Groups

For many years, the conventional package evaluation consisted of focus groups. It should come as no wonder that this type of qualitative research has been welcomed by designers, because in essence it reflects how they would intuitively go about obtaining feedback about their creations. In a package design focus group the respondent can verbalize reactions to package features, can identify key features that elicit interest, and can talk about the coherence between the package itself and the brand. Focus groups provide the designer with a great deal of feedback, generally in a non-threatening, non-judgmental manner (viz., not judged by another professional, although consumers could reject the package), and can cover a great deal of territory in a warm, informal manner. A sensitive package designer gets a great deal out of focus groups, because the designer can see and hear how the consumer reacts to the packages. By presenting different possible packages, the designer can see which ones ‘work’ and which do not. Focus groups, properly conducted, can be very valuable in the creative process, especially in the up-front developmental stage (Glass 1999).

Focus groups can, however, backfire in several ways. First, respondents often like to ‘play designer’. Respondents like to tell designers what to do – even if the designers aren’t really interested in following the sage advice. Consequently, there may develop an antagonism between client/respondent (with the client wanting to follow the respondent’s suggestions) and the designer (who has a basic artistic and business idea in mind and simply wants feedback). Second, focus groups can be misused. The focus group is private, and does not come out with hard and fast results. In a focus group people say many different things, and the listener can select specific phrases to suit his/her own purpose and agenda.

Profiling Packages

At the simplest level the researcher can determine whether or not the package is acceptable to the consumer. This type of testing typically involves attitudinal measures. The researcher shows the package to the respondent, and obtains a profile of ratings, similar to the way that the researcher obtains product ratings. The key differences are that the researcher may obtain profiles of the ‘expectation’ of the product (based upon the package), as well as ratings of the package itself. The attributes can vary substantially from one product category to another. Some of the ratings may deal with interest in the package (or in the product, based upon exposure to the package). Other ratings may deal with one’s expectation of the product based upon the package, or with whether the package fits the brand.
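A rating ‘profile’ of this sort is simply a vector of mean attribute ratings across respondents, compared against some action standard. A minimal sketch follows; the attribute names, the ratings, and the cut-off are all invented for illustration:

```python
# Hypothetical ratings (1-9 scale) from four respondents on one test package.
ratings = {
    "purchase interest": [7, 6, 8, 7],
    "expected taste":    [5, 4, 6, 5],
    "fits the brand":    [8, 8, 7, 9],
    "looks premium":     [4, 3, 5, 4],
}

ACTION_STANDARD = 6.0  # illustrative per-attribute cut-off a researcher might set

# The profile: mean rating per attribute across respondents.
profile = {attr: sum(vals) / len(vals) for attr, vals in ratings.items()}

# Report attributes from strongest to weakest, flagging weak spots.
for attr, mean in sorted(profile.items(), key=lambda kv: -kv[1]):
    flag = "ok" if mean >= ACTION_STANDARD else "below standard"
    print(f"{attr:18s} {mean:4.1f}  {flag}")
```

In this invented example the package scores well on brand fit and purchase interest but falls below the standard on expected taste and premium appearance – exactly the kind of diagnostic a designer would take back into revision.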

This type of information is extremely valuable to the package designer, who wants to find out whether or not the package is ‘on target’. Typically the designer receives only the simplest of ‘briefs’ or project descriptions, such as the requirement that the package be upscale, that it live up to the brand, and that it communicate an effective or good tasting product. The data from the evaluative package tests provide the diagnostics to demonstrate whether or not the package as designed actually lives up to the requirements set by the package design group at the client. If the respondent actually uses the product in the package, then the researcher can obtain ratings of person-package interaction (ranging from ease of carrying or gripping the product, to ease of opening, ease of removing the product, ease of storing the package, etc.).

Behavioral Measures – T-Scope and Speed of Recognition

At a more behavioral level the package testing may involve behavioral measures. One behavioral measure is the ability to identify whether or not a package is actually on the shelf, during a very short exposure time. The rationale behind this type of testing (called T-Scope, or tachistoscope, testing) is that the typical shopper spends relatively little time inspecting a store shelf. The typical shelf is an extremely complex array of package designs over which the consumer’s eye wanders. In order for a package to make its impact it is important that the package ‘jump off’ the shelf (visually). The researcher assumes that those packages that are perceived in the short interval permitted by the T-Scope have shelf presence. Elliot Young (1999) has demonstrated that in these T-scope tasks the speed at which the stimulus information is available is often so low that the respondent has to use color and design cues, rather than brand names. In some demonstrations, well-known brands with distinctive package features (e.g., Tide® brand detergent, with its concentric halos) are rapidly recognized, even if the brand name is incorrect.

If the research interest focuses on the recognizability of the single package, then one can present single packages and determine the fastest shutter speed (viz., the least time needed) for the package to be correctly identified. If the research interest focuses on the ‘findability’ of the package on the shelf, then the researcher can place the test package at different locations on the shelf, and then parametrically assess the contribution of package design and location on the shelf as joint contributors to “findability”.

Eye Tracking and The Features of A Package

Eye tracking is another method for testing the shelf. The objective of package designers is to guide the consumer to the package. This action then constitutes the first step in selecting the product. Eye tracking allows the researcher to identify the pattern that the eye follows. Is the eye drawn to the client’s particular package? Does the eye wander away from the package (thus diminishing the chances that the customer will select the product)? Young (1999) has also demonstrated the use of eye tracking technology, based upon military research developments in the 1970’s. The eye tracking technology traces the location of the line of sight, tracks how the eye wanders, and records what the eye sees. It shows how the consumer, when presented with either a shelf set or a single package, explores the stimulus. When done for the entire shelf, the eye tracking technology can identify whether or not the package is even looked at, and for how long. When done for a single stimulus (e.g., an over the counter medicine), the eye tracking technology can show if and when, and for how long, key messages are looked at (but not whether these are good or not). From the responses of consumers to test packages, Young and his associates have amassed a variety of generalizations about where on the shelf the package should be, how many facings the package should have, etc.

Optimizing Package Design – Can Art and Science Mix?

One of the most intriguing developments is the advance of conjoint measurement (systematic stimulus variation) into package design. Unlike concept testing using conjoint measurement, there is far less in the literature (and in actual practice) using systematically varied package features. The best explanation for this is that package design is still an artistic endeavor, resisting research on the one hand, and yet welcoming research insights on the other. Recently, however, the same research paradigm (viz., systematic variation of components) has begun to enter package research. Conjoint measurement comprises the experimental variation of components in a concept in order to understand the underlying dynamics of how the components perform. In concept work the respondent evaluates full combinations of these components (e.g., different names, benefits, prices, etc.), and from the ratings researchers estimate the part-worth contribution of each component. The same paradigm applies here, but this time the components are features of the package (e.g., colors, graphics, etc.), and the concepts are full packages comprising these systematically varied features, created on the computer screen. The conjoint system enables the designer to rapidly understand the effectiveness of each concept element, and then to create new and better combinations. Based upon these analyses, the researcher and package designer quickly discover what every design feature contributes to consumer interest and communication, and can improve the design by incorporating high performing graphic elements, and by discarding poor performing elements. Of course attention must always be paid to the artistic coherence of the design. Some of the power of the conjoint approach applied to graphics design can be seen by inspecting Figure 4 (showing the template, components, one finished package, and a table of utilities).

Figure 4: Components of package design research. A = categories or sets of features; B = specific template used to embed graphic features; C = features in specific designed package; D = juice package corresponding to the specific set of designed features.
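The part-worth estimation behind a table of utilities like the one in Figure 4 is, in its simplest form, a dummy-variable regression of ratings on the presence or absence of each package feature. The features, the design, and the ratings below are invented for illustration; real studies use many more elements and individual-respondent data.

```python
import numpy as np

# Each test package is coded by which design features it carries (1 = present).
# Columns (hypothetical): [red color scheme, photo of fruit, "New!" burst flag]
X = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
])

# Mean purchase-interest rating (0-100) for each test package (hypothetical).
y = np.array([42.0, 50.0, 57.0, 44.0, 65.0, 52.0, 59.0, 67.0])

# Add an intercept column and solve the additive model by ordinary least squares.
A = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)

names = ["red scheme", "fruit photo", "'New!' burst"]
baseline, partworths = coefs[0], coefs[1:]
for name, w in zip(names, partworths):
    print(f"{name:14s} part-worth: {w:+.1f}")

# Build the strongest combination: keep features with positive part-worths.
best = [name for name, w in zip(names, partworths) if w > 0]
print("Strongest combination keeps:", best)
```

Because the (invented) design is a full 2x2x2 factorial, the dummy columns are orthogonal and each estimated part-worth equals the difference in mean rating with and without that feature; fractional designs trade this simplicity for fewer test packages.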

Package Design – An Overview

In contrast to product (and to concept) development, package research development is still in its infancy. For the most part, researchers are only beginning to recognize the importance of the “package” as a key factor, useful to maintain or to achieve competitive advantage in the marketplace. Up to now it was the artistic aspects of package design, along with the role of the package as a secondary factor, that hindered the full flowering of package research. That unhappy situation is beginning to remedy itself as the competitive environment forces recognition of package research as a key new source of information.

References

AC Nielsen Bases (1999). Bases® II. Sales brochure (from the internet: www.bases.com).

Boring, E.G. (1929). “Sensation and Perception.” In: A History of Experimental Psychology. New York: Appleton Century Crofts.

Box, G.E.P., Hunter, W.G. and Hunter, J.S. (1978). Statistics for Experimenters. New York: John Wiley.

Cheskin (1999). “Five Bottles of Beer.” Critique: The Magazine of Graphic Design Thinking, (Spring), 89-95.

Civille, G.V. and Lyon, B., eds. (1996). Aroma and Flavor Lexicon for Sensory Evaluation. West Conshohocken: American Society for Testing and Materials.

Clapperton, J., Dagliesh, C.E. and Meilgaard, M.C. (1975). “Progress Towards an International System of Beer Flavor Terminology.” Master Brewers Association of America Technical Journal, 12, 273-280.

FlavLex, Version 1.15 (1999). Lancaster: Softex, Inc.

Gacula, M.C., Jr. (1993). Design and Analysis of Sensory Optimization. Trumbull: Food and Nutrition Press.

Glass, S. (1999). Personal communication.

Khuri, A.I. and Cornell, J.A. (1987). Response Surfaces. New York: Marcel Dekker Inc.

Meilgaard, M., Civille, G.V. and Carr, B.T. (1987). Sensory Evaluation Techniques. Boca Raton: CRC Press.

Meiselman, H.L. (1978). “Scales for Measuring Food Preference.” In: Peterson, M.S. and Johnson, A.H., eds., Encyclopedia of Food Science. Westport: AVI, 675-678.

Mitchell, A. (1983). The Nine American Lifestyles. New York: MacMillan.

Moskowitz, H.R. (1981). “Relative Importance of Perceptual Factors to Consumer Acceptance: Linear versus Quadratic Analysis.” Journal of Food Science, 46, 244-248.

Moskowitz, H.R. (1983). Product Testing and Sensory Evaluation of Food: Marketing and R&D Approaches. Westport: Food and Nutrition Press.

Moskowitz, H.R. (1984). Cosmetic Product Testing: A Modern Psychophysical Approach. New York: Marcel Dekker Inc.

Moskowitz, H.R. (1985). New Directions in Product Testing and Sensory Analysis of Food. Westport: Food and Nutrition Press.

Moskowitz, H.R. (1994). Food Concepts and Products: Just in Time Development. Trumbull: Food and Nutrition Press.

Moskowitz, H.R. (1995). “Experts versus Consumers: A Comparison.” Journal of Sensory Studies, 11, 19-35.

Moskowitz, H.R. (1997). “Base Size in Product Testing: A Psychophysical Viewpoint and Analysis.” Food Quality and Preference, 8, 247-256.

Moskowitz, H.R., Jacobs, B.E. and Lazar, N. (1985). “Product Response Segmentation and the Analysis of Individual Differences in Liking.” Journal of Food Quality, 8, 168-191.

Moskowitz, H.R. and Krieger, B. (1995). “The Contribution of Sensory Liking to Overall Liking: An Analysis of Six Food Categories.” Food Quality and Preference, 6, 83-91.

Moskowitz, H.R. and Krieger, B. (1998). “International Product Optimization: A Case History.” Food Quality and Preference, 9, 443-454.

Munoz, A.M., Chambers, E., IV and Hummer, S. (1996). “A Multifaceted Category Research Study: How to Understand a Product Category and its Consumer Responses.” Journal of Sensory Studies, 11, 261-294.

Peryam, D.R. and Pilgrim, F.J. (1957). “Hedonic Scale Method of Measuring Food Preferences.” Food Technology, 11, 9-14.

Plackett, R.L. and Burman, J.P. (1946). “The Design of Optimum Multifactorial Experiments.” Biometrika, 33, 305-325.

Schutz, H.G. (1964). “A Food Action Rating Scale for Measuring Food Acceptance.” Journal of Food Science, 30.

Schutz, H.G. (1983). “Multiple Regression Approach to Optimization.” Food Technology, 37, 46-62.

Schutz, H.G. (1989). “Beyond Preference: Appropriateness as a Measure of Contextual Acceptance of Food.” In: Thomson, D.M.H., ed., Food Acceptance. London: Elsevier, 115-134.

Smithies, R. and Buchanan, B. (1990). Substantiating a Taste Claim. NAD Workshop, Transcript Proceedings. New York: Council of Better Business Bureaus and the Advertising Research Foundation.

Stevens, S.S. (1975). Psychophysics: An Introduction to Its Perceptual, Neural and Social Prospects. New York: John Wiley.

Stone, H. and Sidel, J.L. (1985). Sensory Evaluation Practices. Orlando: Academic Press.

Thurstone, L.L. (1927). “A Law of Comparative Judgment.” Psychological Review, 34, 273-286.

Van Trijp, H.C.M. (1995). “Variety-Seeking in Product Choice Behavior: Theory with Applications.” In: The Food Domain, Mansholt Studies, 1. Wageningen, The Netherlands.

Young, E. (1999). Personal communication.

HOWARD R. MOSKOWITZ

Howard Moskowitz is president and CEO of Moskowitz Jacobs Inc., a firm he founded in 1981. Dr. Moskowitz is simultaneously a renowned experimental psychologist in the field of psychophysics (the study of perception and its relation to physical stimuli), and an inventor of world class market research technology. Dr. Moskowitz graduated Harvard University in 1969 with a Ph.D. in experimental psychology. Prior to that he graduated Queens College (New York), Phi Beta Kappa, with degrees in mathematics and psychology. He has written/edited eleven books, has published well over 180 articles, serves on the editorial board of major journals, and has lectured in the U.S. and abroad. He has won numerous awards, among them the Scientific Director’s Gold Medal for outstanding research at the U.S. Army Natick Laboratories. In 1992 Dr. Moskowitz founded a $2,000 prize for young scientists working in the psychophysics of taste and smell, administered through the Association of Chemoreception Scientists.

Among his important contributions to market research is his 1975 introduction of psychophysical scaling and product optimization for consumer product development. Whereas these methods are standard and well accepted today, they required a massive culture change in the 1975 business community. In the 1980’s his contributions in sensory analysis were extended to health and beauty aids. Dr. Moskowitz has also developed and refined procedures which enable research to interrelate products, concepts, experts and physical test instruments, in order to accomplish product optimization and reverse engineering. Finally, his research and technology developments have led to concept and package optimization (IdeaMap®), in an affordable, transaction-oriented approach (IdeaMap® Wizard), to integrated and accelerated development (DesignLab®), and to the globalization and democratization of concept development for small and large companies alike.
