The Selection and Training of Judges For Discrimination Testing

Louis P. Bressan and Ronald W. Behling

Historically, food companies have developed discrimination testing panels to determine whether a perceived difference exists between two samples. These panels have been used to minimize unnecessary (and costly) extensive hedonic or acceptance testing of samples which in fact were not really perceived as different by the average consumer.

A review of the literature reveals that most discrimination panels have been developed for testing flavor, aroma, and texture of foods (Martin, 1973; Wittes and Turk, 1968; Szczesniak et al., 1975). Some five years ago, we developed discrimination panels for testing of fragrances in household products. These panels have routinely given reliable and valid results. Based on this successful experience, we decided to expand this program to include discrimination testing of polish gloss and softness of towels. The illustration on the facing page shows the game plan used in selecting and training judges for the three types of sensory tests.

General procedures for the selection and training of qualified judges for discrimination testing were available (ASTM, 1968), but because of the specific nature of our intended application of the discrimination testing techniques, a number of additional factors had to be considered. These included:
• The source of potential panel members.
• The development of appropriate sample sets for the training and screening of judges in the desired sensory modes.
• … discrimination panel with higher sensitivity than the average consumer.

The philosophy of selecting judges with superior discrimination abilities is justified on an economic basis.
That is, if a panel of judges screened for a particular acuity, and therefore capable of finding small differences, does not find a significant difference between products on the dimension in question, it is highly unlikely that the average consumer will do so. Furthermore, any change in the consumers' attitude toward the product based on the sensory dimension measured would also be unlikely.

The authors are with S C Johnson & Son, Inc., 1525 Howe St., Racine, WI 53403.
FOOD TECHNOLOGY, NOVEMBER 1977

Table 1. FRAGRANCE TEST SAMPLE SET (table contents not recoverable)
Table 2. GLOSS TEST SAMPLE SET (table contents not recoverable)

LOCATING PANELISTS

A number of approaches to the problem of locating a source of potential panelists were considered. The previously identified olfactory discrimination panel, experienced in both duo-trio and triangle testing, was not readily available because of prior testing commitments. A second alternative was to screen current acceptance panel members; however, availability was more limited. The third option was to screen members of our R&D Biology Research Group. This group was physically separated from the main sensory testing facility, and it was recognized that their remoteness from the main testing facility would limit their availability for testing. We therefore decided to conduct the sensory testing at the Research Center. Research Center managers and supervisors expressed interest and encouraged their staff to participate.

While it was recognized that testing conditions at the center were less than ideal, modifications of sample preparation and presentation were made to overcome the physical limitations. As an additional safeguard, one of the professionals at the center was asked to observe and coordinate the testing program.

Since the testing location was physically remote from the main sensory testing facility, observation and interaction with the panelists on a daily basis were minimal.
The tests were administered in a conference room at the center, where it was not possible to physically separate the individual testing stations to minimize distractions. In addition, the lighting of the room varied during the day and from day to day. Existing supplemental lighting did not adequately compensate for the change in natural lighting that occurred.

DESIGNING TESTS

The triangle test was used for fragrance discrimination. Previous training and discrimination panel selection using standardized fragrance oils had been successful. This same sample set was used for the present testing. It was developed specifically for this purpose by a well-known fragrance company and consisted of eight pairs of fragrance oils (Table 1). The oils were presented in amber bottles.

The triangle test was also used for polish evaluation, since fatigue would be minimized in gloss perception. A range of gloss differences was achieved by using a selected set (Table 2) of eight combinations of polishes and substrates possessing low-, medium-, and high-gloss finishes. The three wood panels of each type of finish had been matched as closely as possible. The set of polishes applied to these panels included three aerosol products and one polishing cloth. A sufficient quantity of each product from a single lot was available for the entire test. The same person always applied the products 18 hr prior to testing. Substrates were cleaned between applications with naphtha.

SENSORY TEST GAME PLAN shows the steps taken by judges in qualifying for the three types of tests discussed in this article.

The duo-trio test was selected for tactile evaluation. Samples representing different levels of softness (Table 3) were produced by treating Cannon Ecstasy Bath Towels (86% cotton, 14% polyester) with various laundry products.
The products involved in the six pairs of treatments were two detergents, one fabric softener normally added to the washing machine, and one fabric softener recommended for use in the dryer. A laundry "load" of six towels was treated with these products, using conventional washing machines and electric clothes dryers. A standardized procedure was used for all treatments.

TRAINING JUDGES

At the first training session, the triangle testing of fragrances was discussed. Everyone in the group was given a triangle test of two synthetic and one natural strawberry fragrance oils on fragrance blotter strips. The difference between the samples was intentionally obvious to illustrate the mechanics of the triangle test. The questionnaire was identical to that used during the testing phase. A question and answer period followed, wherein a number of general guidelines were provided. Judges were requested to make their choice in a relatively short period of time (approximately 2 min), and to trust their first impressions if they were uncertain about their choice. Everyone was asked to take four such fragrance tests during the working day; however, time intervals between each test were left to the discretion of each judge.

A second training session was held after the fragrance testing to resolve any problems. In addition, the discrimination testing of gloss and towel softness was described. Examples from each test were provided. Judges were instructed to identify the wood panel with the different …

… instructed to dip each blotter into the designated sample, then sniff the three blotters, decide which one was different, and indicate their choice on the questionnaire. Two orders of presentation were used for each sample pair.
Four triangle tests for gloss differences were presented at a single test session. Judges were asked to choose the panel having the different gloss and indicate their choice on the questionnaire. Two orders of presentation were used for each sample pair.

During the same test session, four duo-trio tests of towel softness were administered. Because of a concern that panelists might experience some tactile fatigue, … the gloss differences, and then complete the remaining two softness tests. For each test, the treated towels were folded three times and placed in cardboard cartons. Judges could feel the towels through an opening in the side, but visual cues were minimized. The control sample was always presented first. All four orders of sample presentation (A, B vs A; B, A vs A; A, B vs B; and B, A vs B) were used for the first two sample pairs, to give judges an opportunity to experience sample pairs which were thought to be quite different while learning the mechanics of the test procedure. After this, two orders of presentation (A, B vs A and B, A vs B) were used for the remaining sample pairs, thought to have smaller degrees of difference. Judges were told to record on the questionnaire the number of the sample that was the "same" as the control.

MEASURING RELIABILITY

While the original olfactory discrimination panel had produced reliable results, the newly established panels had not been tested for reliability. Two ways of testing for reliability have been published. The first, and perhaps the most obvious, would be to replicate the tests (ASTM, 1968); however, this approach is not always practical in the business environment. The second would be to have judges express the degree of difference they perceive between samples (Hall et al., 1959). One disadvantage of this approach is that the limited experience of the judges might lessen their confidence in discriminating as well as measuring the degree of difference.
This issue is further complicated by the variability of the judges' attitudes within any given time period. These attitudinal changes can be caused by a combination of factors such as professional pride, daily work pressures, etc. It is, therefore, important to monitor a judge's performance (a) relative to the panel as a whole, and (b) relative to his or her past performance. That is, when the panel as a whole has found a significant difference, the individual panel member's performance can be evaluated. In addition, if the panel as a whole has difficulty with a particular order of presentation, the individual's performance can be viewed relative to this result. While we have found this procedure to be invaluable in determining the reliability of our panel, it does not obviate the need for replicate testing or the periodic retesting of individual judges with a set of "standard" samples.

DETERMINING VALIDITY

The procedure outlined in the ASTM Manual 434 (ASTM, 1968) for the "Selection of General Discrimination Panels" states that "Each test should represent a difference such that the group of candidates as a whole will establish a significant difference but the percent of correct responses should not go above 80 percent." This procedure would appear to identify only those panel members who had little or no acuity for the particular sensory mode being tested. Our approach has differed; we attempted to identify as qualified judges those panel members with better-than-average acuity for the sensory mode tested. This was accomplished by utilizing a sample set for each sensory mode based on tests that had decreasing differences between sample pairs. Those sample pairs where minimal differences could be perceived were intentionally included to identify those panel members with better-than-average acuity.

In Figure 2, the percent of correct judgments for each fragrance test in the sample set has been plotted in descending order.
The range of difficulty for the sample set is evident. As can be seen in comparing fragrance tests 3 and 4, order of presentation also contributes to the degree of difficulty. However, the question of validity of each test in the sample set must be answered. By drawing in the line at the percent of correct answers expected by chance (33% for the triangle test), it is apparent that no fragrance test fell below it. If a test had fallen below this line, its validity would certainly be questionable. The upper limit should be approximately 80%, since a higher percent of correct answers would not result in discriminating tests.

The percent of correct judgments for each test in the gloss sample set has been plotted in Figure 3. Again, the range of difficulty for the sample set is evident. Since tests that yield less than 33% correct responses in a triangle test should not be considered valid, tests 1, 10, and 15 must be considered invalid. Lighting conditions in the testing area were less than ideal and could have accounted for the low percent of "correct" answers on these three tests. These three tests were dropped from the sample set before judges' performance was determined.

For the softness test, the range of difficulty of the sample set (Fig. 4) is not as pronounced as for the fragrance and polish gloss tests. This can be attributed, in large part, to the fact that one-half of the sample set was based on all four orders of each of two pairs of towel treatments. In addition, it is difficult to develop a series of softness treatments on towels that will result in tests with a range of difficulty. Since in a duo-trio test 50% of the correct answers can be accounted for by chance, tests scoring below this level should not be considered valid. In Figure 4, one test falls just below this line.
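The validity screen described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `screen_test`, the labels it returns, and the counts in the usage lines are hypothetical and do not come from the article. Only the chance levels (one-third for the triangle test, one-half for the duo-trio test) and the approximate 80% upper limit are taken from the text.

```python
# Chance levels stated in the article; 80% is its approximate upper limit.
CHANCE_LEVEL = {"triangle": 1 / 3, "duo-trio": 1 / 2}
UPPER_LIMIT = 0.80


def screen_test(method, n_correct, n_judges):
    """Label one test in a sample set by its share of correct judgments.

    Hypothetical helper: the name and the labels are illustrative only.
    """
    share = n_correct / n_judges
    if share < CHANCE_LEVEL[method]:
        return "below chance"   # validity is questionable
    if share > UPPER_LIMIT:
        return "too easy"       # would not discriminate among judges
    return "usable"


# Hypothetical counts, not the article's data.
print(screen_test("triangle", 9, 31))    # below chance
print(screen_test("triangle", 20, 31))   # usable
print(screen_test("duo-trio", 28, 31))   # too easy
```

A label of "below chance" is best read as a flag for review rather than an automatic exclusion, since a test that falls only slightly under the chance line may still be kept on judgment.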
Because this softness test could obtain a higher percentage of correct judgments in a subsequent testing session, it was retained in the sample set.

CHOOSING QUALIFIED JUDGES

After the sample sets have been evaluated for range of difficulty and validity, the issue of generalizing to the consumer can be reviewed (Schutz, 1971). Ideally, the group of judges that are screened should be representative of the consumer. Since this random selection was not practical, it was necessary to analyze performance for any strengths or weaknesses. In Figure 5, the panel's performance on the fragrance test has a slight negative skew. Despite this skewness, the panel appears to have a performance distribution that would be expected from a larger and/or more randomly selected panel. The distribution of performance of the panel for the gloss test (Fig. 6) also approximates a normal distribution. The distribution for the softness test (Fig. 7) is upscale of the distributions for the fragrance and gloss tests. This was expected, since a duo-trio test was used, wherein 50% of the judgments come about by chance. More important, the performance of the panel for softness approximates a normal distribution.

Having determined those tests in each sample set that are valid and that the panel of judges in total has acuity that is representative of the consumer, we then determined the qualified judges for each sensory mode. For the fragrance test (Fig. 8), 10 judges out of 32 (or 32%) scored more than 68% correct. Those who scored in the "very good" and "excellent" categories are considered qualified judges. For the gloss test (Fig. 9), 11 out of 31 judges (35%) scored more than 68% correct. The softness test (Fig. 10) proved to be more difficult: only 5 out of 31 judges (16%) scored more than 87% correct. This level of performance is considered necessary for judges in the "very good" category for a duo-trio test.
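The selection rule in the preceding paragraph can be sketched as a simple filter on each judge's share of correct answers. The sketch below is an assumption-laden illustration, not the authors' procedure: the function `qualified_judges`, the judge identifiers, and the scores are all hypothetical. Only the cut-offs (more than 68% correct for the triangle tests, more than 87% for the duo-trio test) come from the text.

```python
# Cut-offs quoted in the article for "very good" or better performance.
QUALIFYING_CUTOFF = {"triangle": 0.68, "duo-trio": 0.87}


def qualified_judges(scores, method, cutoff=None):
    """Return the IDs of judges whose share of correct answers exceeds the cut-off.

    scores: dict mapping judge ID -> (number correct, number of tests taken).
    cutoff: optional override of the default cut-off for the test method.
    Hypothetical helper; names and data layout are illustrative only.
    """
    limit = QUALIFYING_CUTOFF[method] if cutoff is None else cutoff
    return sorted(
        judge
        for judge, (n_correct, n_tests) in scores.items()
        if n_correct / n_tests > limit
    )


# Hypothetical scores for four judges over 16 tests each.
fragrance_scores = {"J01": (12, 16), "J02": (10, 16), "J03": (11, 16), "J04": (8, 16)}
print(qualified_judges(fragrance_scores, "triangle"))   # ['J01', 'J03']
```

The optional `cutoff` argument is there so that a different threshold can be tried when too few judges clear the default one.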
Since only 5 qualified softness judges would not normally be adequate for ongoing testing, the number can be increased by including the better-performing half of the "good" category. This results in 12 out of 31 judges (39%) scoring more than 81% correct.

Past experience in screening potential judges for discrimination testing of fragrances had shown that between one- …

… well as an overall test panel performance "card." To show our appreciation, a small celebration was also promised.

CONDUCTING TESTS

With the exception of the control in the duo-trio tests of softness, all test samples were labeled with (random) three-digit code numbers. In all cases, sample code numbers were recorded on each questionnaire, in addition to the respondent's assigned number and the corresponding test number. The three questionnaires are shown in Figure 1.

The presentation of the fragrance oil samples was modified so that the triangle test could be self-administered. Odorless blotter strips labeled with the sample numbers were attached to each questionnaire. Judges were …

Figs. 5-7. DISTRIBUTION OF PERFORMANCE
Fig. 5. Fragrance tests (32 judges)
Fig. 6. Gloss tests (31 judges)
Fig. 7. Softness tests (31 judges)

Figs. 8-10. PANEL PERFORMANCE
Fig. 8. Fragrance tests
Fig. 9. Gloss tests
Fig. 10. Softness tests

… gloss. They were encouraged to pick up the individual panels and hold them in a position to maximize lighting effects on the panel surface.

The duo-trio test procedure for towel softness was also discussed during the second session. Considerable emphasis was placed on the fact that the judges were to identify the sample that felt the same as the "control." The control was available only at the start, but if necessary the two coded samples could be handled more than once to arrive at their decision. The presentation of the towels in cardboard cartons to minimize visual cues was also described.
… followed by a question and answer period. At this point, the group was advised that after testing was completed, they would receive individual "report cards" …

Fig. 1. QUESTIONNAIRES used for triangle testing of fragrance and gloss and duo-trio testing of softness

Figs. 2-4. RANGE OF DIFFICULTY
Fig. 2. Fragrance triangle tests
Fig. 3. Gloss triangle tests
Fig. 4. Softness duo-trio tests
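The article relies throughout on the panel as a whole finding a "significant difference," but it does not show the calculation. One common way to check this, offered here as an assumption rather than as the authors' method, is an exact one-sided binomial test against the chance level of the test (one-third for the triangle test, one-half for the duo-trio test). The function name `chance_tail_probability` and the counts below are hypothetical.

```python
from math import comb


def chance_tail_probability(n_correct, n_judges, p_chance):
    """Probability of at least n_correct correct answers if every judge guessed.

    A small value suggests the panel perceived a real difference.
    Hypothetical helper; not taken from the article.
    """
    return sum(
        comb(n_judges, k) * p_chance**k * (1 - p_chance) ** (n_judges - k)
        for k in range(n_correct, n_judges + 1)
    )


# Hypothetical panel results, not the article's data.
p_triangle = chance_tail_probability(16, 31, 1 / 3)   # triangle test
p_duo_trio = chance_tail_probability(22, 31, 1 / 2)   # duo-trio test
print(f"triangle: p = {p_triangle:.3f}")
print(f"duo-trio: p = {p_duo_trio:.3f}")
```

Because guessing succeeds more often in a duo-trio test, the same panel size needs more correct answers there than in a triangle test to reach a given significance level.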
