
Rough Draft: Please send comments

Expert System for Advertising Persuasiveness: Effectiveness of Strategy, Attention, and Persuasion
J. Scott Armstrong¹
The Wharton School, University of Pennsylvania, Philadelphia, PA 19104-6371
E-mail: armstrong@wharton.upenn.edu
Phone 215-898-5087; Fax 215-898-2534
August 2001

I used published empirical research and expert opinions about persuasion to develop a set of guidelines for advertising. The guidelines are conditioned on the objectives of the ad, the nature of the product, and the characteristics of the target market. The guidelines can serve as an aid for developing effective print, TV, or radio advertisements, and can be used for copy testing. ESAP, an expert system, calls for ratings of the key features of the ad. It shows which features of an advertisement reduce its effectiveness as well as those that improve it. The system allows an ad to be judged against various benchmarks, such as an earlier version of the ad or competitors' ads. When we used the expert system to rate ads, inter-rater reliability was modest, suggesting that ratings should be done by a panel of trained experts. A test of concurrent validity showed that eight successful advertisements were rated highly by the expert system, while five unsuccessful advertisements received low ratings. In a test of predictive validity, the expert system correctly predicted the better advertisement in copy testing for attention on six of seven advertisement pairs, and for persuasion on two of three pairs. [Details about the expert system known as ESAP can be found at advertisingprinciples.com. This system is available for use in teaching and research, although explicit permission is required.]

Acknowledgements: Jesse Engle, David Krezmienski, Deborah Fox, and Joseph B. Fitzpatrick, III assisted in the design and programming of the expert system. Douglas Martin, Lisa Negron, and Rebecca Walden summarized much of the advertising research for the training manual. Brian Wansink provided advice at various stages of this project. Useful comments were received during presentations to the Weatherhead School of Management at Case Western Reserve University, Hakuhodo Advertising in Tokyo, and the U. of Arizona. Deborah Fox, Ronald Goodstein, John Rossiter, and Terry Shimp provided helpful comments. Dara Yang provided editorial assistance.

"The trade of advertising is now so near to perfection that it is not easy to propose any improvement."
Dr. Samuel Johnson, 1759

"The time has come when advertising in some hands has reached the status of science."
Claude C. Hopkins, 1923

Since Hopkins' statement in 1923, science has confronted art in the development and evaluation of advertising copy. The consensus of those in advertising seems to be that art wins. This study renews the controversy by using an expert system to organize knowledge about persuasion.

For well over half a century, researchers have been doing studies on how to use communications to persuade others (e.g., Abelson 1959 summarizes 139 publications, most published in the 1950s). Much of this research is specifically aimed at advertising. The findings relate to all aspects of advertising copy (content, illustrations, music, and so on). They yield advice on the conditions under which various procedures are most effective.

With one exception (Burke et al. 1990), I have been unable to find publications that systematically apply research-based findings to advertising copy. Perhaps this is due to the difficulty of summarizing the extensive research on persuasion. There is also the issue of how to structure this knowledge and link it to advertising effectiveness.

Suitable validation materials are now available to test television advertising at a reasonable cost. In contrast, in the 1980s, it took nine years and $500,000 to collect just five pairs of commercials for the Advertising Research Foundation's Copy Research Validity Project (Rossiter and Eagleson 1994, p. 22).

Uses of Expert Systems in Advertising

The Expert System for Advertising Persuasion (ESAP) is designed as an aid to developing persuasive advertisements. It provides a comprehensive set of recommendations for the development of an ad and it reports on the research that supports these recommendations. ESAP can also be used as a copy testing procedure at any stage in the development of the ad.

ESAP can provide extensive diagnostic information for improving advertising copy. This is perhaps its major advantage over many existing copy testing procedures. Diagnostic procedures, such as weakness summaries, can help determine further research needs. In addition, this expert system can help to integrate the research done on the product and target market as well as any copy-testing research.

Expert systems are expected to aid decision making because they structure knowledge and they use knowledge that is typically neglected. An expert system should be more useful for novices than for experts because novices have little to go on.

Structuring Knowledge

Structured approaches have been found to be useful in other areas such as improving the creativity of groups (Valacich, Dennis and Connolly 1994), problem solving,² and strategic planning (Armstrong 1982; Miller and Cardinal 1994). Studies in personnel psychology have also concluded that structured approaches are more accurate than unstructured ones; Meehl (1986, p. 372), after summarizing this research, wondered why people were so surprised at this result. After all, he said, when you check out at a supermarket, you don't eyeball the heap of purchases and say to the clerk, "Well it looks to me as if it is $17.00 worth; what do you think?" The clerk adds it up.

A common way to structure a complex problem is to break it into pieces, solve each piece, then reconstruct the problem (Raiffa 1968). Although the decomposition procedure has been subjected to only a modest amount of validation, the evidence is encouraging. Decomposition is expected to be useful when the problem is complex and where the knowledge about each of the parts is equal to or better than the knowledge about the whole problem (MacGregor 2001). I expect this to apply to the analysis of advertising copy.

An expert system offers the possibility of integrating domain knowledge with knowledge about how to create effective advertising copy. Such integration can be done through condition-action statements: e.g., "Given the conditions x, y, and z, use procedure A." Unfortunately, much of the research fails to adequately describe the conditions that relate to the effectiveness of various guidelines. This applies to some large-scale statistical studies that related success measures to aspects such as humor or the use of illustrations. As a result, many of the conditions are provided by expert judgment.

Neglect of Research

The typical advertiser rarely uses research on persuasion. This view is supported by Helgesen's survey of 40 respondents from the ten largest advertising agencies in Norway (1994). Rothenberg (1994) illustrates this with his in-depth description of the pitches for the U.S. automobile account for Subaru. Partly, the problem may be due to time and cost constraints: one must learn about many research findings, then tailor them to the situation.

² Here is a problem that has been used to show that people sometimes cannot solve a problem intuitively but can solve it by using a structured approach. Assume that you are planning to visit people that you met recently. They have two children and you know that one is a boy. What is the probability that the other is a girl? (Using an unstructured approach, most people conclude that the answer is 50%. It isn't.)
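One structured way to work this puzzle is to list every equally likely family and count. The sketch below is illustrative only and rests on assumptions the text does not state: each child is independently a boy or a girl with equal probability, and "one is a boy" is read as "at least one is a boy." The function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_other_is_girl():
    # Assumption: four equally likely families (older child, younger child).
    families = list(product("BG", repeat=2))
    # Assumption: "one is a boy" means at least one of the two is a boy.
    with_boy = [f for f in families if "B" in f]
    # Among those families, count the ones that also include a girl.
    with_boy_and_girl = [f for f in with_boy if "G" in f]
    return Fraction(len(with_boy_and_girl), len(with_boy))


print(prob_other_is_girl())
```

Under these assumptions, three families remain (boy-boy, boy-girl, girl-boy) and two of them include a girl, so the count gives 2/3 rather than 1/2. A different reading of what is known about the boy would change the answer.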

Moreover, many advertisers believe that the structured use of research findings hampers creativity. In addition, some do not believe that academic research can be generalized to their product or market.

Others are concerned that advertisers might become too dependent on structure. Fischhoff, Slovic and Lichtenstein (1978) found that an overdependence on structure is a real concern with the use of models; users may mechanically apply rules and ignore useful knowledge that they possess. Thus, if a mistake is made in the rules, it is likely to be followed blindly. A similar finding was obtained for an expert system for lawyers (Dijkstra 1995). In fact, this has been our experience with users of this expert system for advertising. In one case, a mistake rendered an important section of the system inoperative. None of the 30 students realized that their inputs to this section had no effect on the ratings of ads.

Research findings are expected to be especially useful when they conflict with advertisers' current practice and beliefs, and even more so if the findings have been supported by many studies. Conflicts between experts' opinions and the research findings occur for guidelines such as the use of two-sided arguments, length of copy, humor, and comparative advertising. For example, Rogers and Williams (1989) summarized empirical findings on the proper use of comparative ads and showed how these conflicted with opinions of advertising practitioners.

Development of an Expert System for Advertising Copy

The development of ESAP involved using empirical findings about persuasion and supplementing this with expertise. It also involved the use of principles for the effective use of expert systems. Finally, there is a need to see whether the system performs as expected. These components of the development and testing process are illustrated in Exhibit 1.

ESAP consists of a set of guidelines. A guideline is a condition-action statement; it tells what action is appropriate under specified conditions. Users of the expert system should independently describe their knowledge about the objectives, product, and target market. They should then try to reach agreement on these conditions prior to rating the features of the ad. If the raters do not agree on the conditions, they are likely to reach different conclusions. For example, if one rater thought that the product had no relative advantage, she might use the distraction approach. Her ratings would differ from those of a rater who decided the product had a relative advantage.

Exhibit 1: Developing and Testing ESAP

[Flowchart not reproduced. Its labels are: Literature Review; Empirical Findings on Persuasion; Survey Authors; Expert Interviews; Expertise about Persuasion; Feedback; Other expert systems; ESAP; Protocol; Feedback from users; Peer review; Test Outputs; Literature Review; Technologies for Expert Systems.]

This expert system assumes that the purpose of the advertisement is to elicit action from the target audience, either in the short or the long run. Typically, this applies to changing sales response, but advertising can also be used to maintain behavior or to avoid change, such as resisting the temptation to switch to a competitor's product. It could also involve other behavioral changes, such as donating one's time to a charitable cause.

Prior to making ratings, interviews should be conducted with experts who have knowledge about the target market. This would reveal what the target market needs to know or what appeals might be effective. Ideally, this information would be supported by target market research. Experts such as manufacturers or retailers can provide information about the product. For example, they should describe what type of product information should be useful to the target market. Other conditions are also relevant. These include such things as the type of media that will be used, the strength of the prior evidence on the guidelines, and the existing target market research and copy testing information.

Once the conditions are described, various actions are proposed. An example of a guideline is provided by Pechman's (1992) study of one-sided versus two-sided arguments. Using this and other information, I prepared the following guidelines for two-sided arguments.

Two-sided arguments (positive and negative) are desirable given the following conditions:
    it is important to enhance believability,
    a long-term attitude change is sought,
    the opposing arguments are likely to become apparent to the buyer,
    the target market is intelligent,
    the product is high-involvement for this target market,
    a long-exposure ad will be used.

Given these conditions, one should use a two-sided ad having the following features:
    the positive argument should be regarded as very strong by the target market (based on target market research),
    the negative argument should support the primary, positive features (such as "sure, it costs more, but quality always costs more"),
    the negative argument should precede the positive,
    the negative argument should not be trivial to the target market, and
    the two-sided ad should be delivered by a relevant celebrity.

For example, a high price would be an important negative feature; the Bose Corporation addresses this with the following print ad headline for their Wave table radio: "Why you should pay $349 for this radio."

The guidelines are grouped into three areas: First, what strategy can be used? Second, how can the ad get the attention of those in the target market? And third, what tactics can be used?

The strategy section includes four subsections: information, emotion, influence, and mere exposure. The information subsection assumes that one of the primary bases for change is information about the product, price, distribution, benefits to the consumer, and unique selling proposition. The emotion subsection examines ways that emotional appeals can lead to action. The influence subsection, which draws heavily upon the six principles proposed by Cialdini (1993), is concerned with strategies for inducing people to change. For example, the social proof guideline suggests that those in the target market might be more likely to accept a recommended course of action if they believe that others who are similar to them are doing so. In addition, reasoned action means that an advertisement explicitly asks the target market to take an action and provides a reason for doing so.

Some research suggests that it is helpful to advertise even if the advertisement lacks information or emotional appeal. This is referred to as the mere exposure effect. Even if an advertisement were free of information, emotion, and influence strategies, it would still be expected to have some, though small, impact.

Given the strategy, one can then address how best to execute it to gain attention. The attention section contains some guidelines relevant to all media. Other guidelines are specific to the medium, that is, whether the advertisement is for print or TV.
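The condition-action form of a guideline, such as the two-sided argument guideline described earlier, can be written down as a small data structure. The sketch below is only an illustration: the class name, the field names, the short labels for conditions and features, and the rule that every condition must hold are assumptions, not a description of how ESAP is actually programmed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guideline:
    name: str
    conditions: tuple  # conditions the raters agree on before rating
    features: tuple    # features of the ad to rate when the guideline applies

    def applies(self, agreed_conditions):
        # Assumption: the guideline applies only when every condition holds.
        return all(c in agreed_conditions for c in self.conditions)


TWO_SIDED = Guideline(
    name="Two-sided argument",
    conditions=(
        "believability is important",
        "long-term attitude change is sought",
        "opposing arguments are likely to become apparent",
        "target market is intelligent",
        "product is high-involvement",
        "long-exposure ad",
    ),
    features=(
        "positive argument is very strong",
        "negative argument supports the primary positive features",
        "negative argument precedes the positive",
        "negative argument is not trivial",
        "delivered by a relevant celebrity",
    ),
)

agreed = set(TWO_SIDED.conditions)
print(TWO_SIDED.applies(agreed))                         # every condition holds
print(TWO_SIDED.applies(agreed - {"long-exposure ad"}))  # one condition fails
```

Keeping the conditions separate from the features mirrors the advice that raters should agree on the conditions before they rate any feature of the ad.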

Assuming that the ad captures attention, what tactics can be used to try to convince the target market to take action? The persuasion section presents these tactical guidelines.

The guidelines and features are summarized in Exhibit 2 (next page), which shows the number of features used to assess each guideline. The expert system currently consists of 24 guidelines for print advertisements; to assess these guidelines, ratings may be required for as many as 288 features. For TV, the corresponding figures are 68 guidelines and 256 features.

Implementation of the Guidelines

To quantify the extent to which a guideline has been successfully implemented, trained people rate each feature of the ad. Typically, the scale ranges from 0 (strongly disagree) to 10 (strongly agree). A zero means that the feature would have no effect, and positive ratings mean it would increase sales, with 10 implying a very strong increase. The range is restricted for some features, such as those needing only yes or no answers. Features that can have only negative effects, such as "a benefit that may be considered as a negative to a significant number of target consumers," are rated separately. Raters can also use a "not applicable" response for features that do not apply. Finally, some features might be relevant, but the rater may not have the necessary information; the rater can flag these with a "?".

To obtain an overall measure of effectiveness, it is necessary to summarize the feature ratings. Equal weighting has proven to be a successful summarizing strategy in a variety of previous forecasting and estimation problems (see Armstrong 2001). Accordingly, equal weights are used to combine feature ratings within a guideline. Summaries across guidelines are obtained by weighting each guideline. Thus, a guideline with four features rated would count twice as much as one with one feature rated. This scheme is used to reflect that more features represent more knowledge about the ad, and the reliability of a guideline's rating is expected to improve when it has more features. Summarizing is accomplished using additive elements. MacGregor (2001) shows that additive breakdowns are safer than multiplicative decomposition.
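The summarizing scheme can be illustrated with a short calculation. The sketch below is one reading of the text, not the ESAP program itself. It assumes the 0 to 10 scale, skips features marked not applicable or "?" (represented here as None), and weights each guideline by the square root of its number of rated features, which is one weighting consistent with the statement that four rated features count twice as much as one. The function names and sample ratings are hypothetical, and features that can have only negative effects, which are rated separately, are left out.

```python
import math

MAX_RATING = 10  # top of the typical 0-10 scale


def guideline_summary(ratings):
    """Return (mean rating, weight) for one guideline, or None if nothing was rated."""
    rated = [r for r in ratings if r is not None]  # None = not applicable or "?"
    if not rated:
        return None
    mean = sum(rated) / len(rated)   # equal weights within a guideline
    weight = math.sqrt(len(rated))   # assumed: 4 rated features -> twice the weight of 1
    return mean, weight


def effectiveness_score(guidelines):
    """Percent of the maximum attainable score, combined additively across guidelines."""
    summaries = [s for s in map(guideline_summary, guidelines.values()) if s]
    attained = sum(mean * weight for mean, weight in summaries)
    maximum = sum(MAX_RATING * weight for _, weight in summaries)
    return 100 * attained / maximum if maximum else 0.0


sample = {
    "Humor": [6, 4, None, 2],      # three features rated, one skipped
    "Two-sided argument": [8],
    "Tagline": [None, None],       # nothing rated, so the guideline is ignored
}
print(round(effectiveness_score(sample), 1))
```

Dividing by the maximum attainable over only the rated features keeps the score on a 0 to 100% scale even when raters skip features, which is one way to read "relative to the maximum that it could score."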

Exhibit 2: Effectiveness Components (parentheses indicate the number of guidelines)

Strategy
  1. Information
     1. Product (3); 2. Price (5); 3. Distribution (3); 4. Benefits (7); 5. USP (6)
  2. Emotion
     1. Flow (1); 2. Integration (1); 3. Positive (1); 4. Intense (1); 5. Irritation (1); 6. Frustration (1); 7. Impact (1); 8. Action (1); 9. Satisfaction (1)
  3. Influence
     1. Liking (8); 2. Commitment (1); 3. Social proof (5); 4. Authority (5); 5. Scarcity (7); 6. Reciprocation (3); 7. Reasoned action (4); 8. Indirect (1)
  4. Mere Exposure
     1. Eye-catching (1); 2. Novel (1); 3. Brand is prominent (1); 4. Frequent (1)

5. Attention
  5.1. All Media
     1. Target market (3); 2. News/helpful information (3); 3. Humor (7); 4. Fear (6); 5. Classic attention-getting devices (4); 6. Editorial format (1); 7. Contrast (3); 8. Campaign consistency (5); 9. Tagline (3)
  5.2. Print
     1. Illustration (16); 2. Artistic quality (2); 3. Rhetorical device (2); 4. Technical quality (1); 5. Headline (16); 6. Long copy (1); 7. Readability (2); 8. Body text structure (3); 9. Body text writing (10); 10. Typeface (6); 11. Color (4); 12. Layout (9)
  5.3. TV
     1. Central message (7); 2. Key scenes (11); 3. Artistic quality (2); 4. Technical quality (4); 5. Story appeal (4); 6. Picture and voice (3); 7. Color (3); 8. Sound effects/music (5)

6. Persuasion
     1. Rational arguments (9); 2. Framing (3); 3. Explicit conclusions (1); 4. Objective claims (2); 5. Problem/solution (5); 6. Demonstration (2); 7. Product attributes (3); 8. Relevance (2); 9. Primacy/recency (3); 10. Simplicity/clarity (6); 11. Repetition (2); 12. Tone (9); 13. Mnemonics (2); 14. Testimonial (5); 15. Spokesperson (17); 16. Customer involvement (3); 17. Company name (3); 18. Two-sided argument (6); 19. Comparison to competition (11); 20. Refutation (6); 21. Focus attention (1); 22. Call for action (6); 23. Response mode (1); 24. Word of mouth (3)

Identifying Research Needs

ESAP is intended to support, not replace, the advertiser. One way it does this is to identify areas that are in need of further research. When the rater believes a feature to be relevant, but lacks knowledge for making a rating, the rater flags the feature with a "?". These guidelines are candidates for further research. The ESAP program provides a summary of these.

Another way to identify uncertainty is to examine guidelines or features where the raters differ substantially. These items are flagged in the reliability section of the ESAP program.

The expert system can be used as a place to summarize research on the product or target market. This information can then be examined when relevant to a given feature. The various sources of uncertainty are also summarized to show what information might be sought by target market studies, copy testing, or panels of domain experts. This can help to identify areas for further research.

The weak aspects of an advertisement can also be summarized. By examining the features that contributed to the low score, one can obtain ideas for revisions.

Effectiveness Scores

Effectiveness scores reflect how well the advertisement copy does relative to the maximum that it could score. The scores can range from 0 to 100%. Scores below 20 percent would represent ads with low effectiveness. Ads scoring above 60 percent are expected to be exceptionally effective.

Effectiveness scores do not allow one to make statements about the potential return on investment of an ad. To address that, information is needed on the cost of reaching potential customers and on the potential gain from a successful advertisement. Thus, a low-effectiveness advertisement that reaches a large number of prospective buyers at a low cost, and which offers a high profit per sale, can have a high return on investment.

Comparisons are meaningful to the extent that the benchmark corresponds with the tested advertisement with respect to objectives, product, and market. Thus, the ideal comparison is with an earlier version of the same advertisement. Comparisons with the previous advertisements for this product are also meaningful. Another comparison, though a bit less meaningful, is between an advertisement for a brand and that for a leading competitive brand.
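The two uncertainty checks described under Identifying Research Needs, features flagged with a "?" and features on which raters differ substantially, can be sketched as follows. This is an illustration rather than the ESAP program's own logic: the function name, the sample features and ratings, and the 3-point threshold for a substantial difference are assumptions, since the text does not define the cutoff.

```python
def research_needs(ratings_by_rater, spread_threshold=3):
    """ratings_by_rater maps rater -> {feature: rating, "?" or None}."""
    features = sorted({f for r in ratings_by_rater.values() for f in r})
    flagged, disputed = [], []
    for feature in features:
        values = [r.get(feature) for r in ratings_by_rater.values()]
        if "?" in values:
            flagged.append(feature)  # relevant, but a rater lacked information
        numeric = [v for v in values if isinstance(v, (int, float))]
        # Assumed cutoff: raters "differ substantially" at spread_threshold points.
        if len(numeric) >= 2 and max(numeric) - min(numeric) >= spread_threshold:
            disputed.append(feature)
    return flagged, disputed


sample = {
    "rater_a": {"headline is specific": 8, "price is stated": "?", "humor fits product": 2},
    "rater_b": {"headline is specific": 7, "price is stated": 5, "humor fits product": 7},
}
flagged, disputed = research_needs(sample)
print(flagged)   # features marked "?" by at least one rater
print(disputed)  # features where the raters' ratings are far apart
```

Both lists point to places where target market studies, copy testing, or a panel of domain experts might be consulted before the ad is revised.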

Testing the Expert System

Testing of the expert system first involved assessing the validity of the guidelines, followed by examining the functioning of the system as a whole, which included its reliability, concurrent validity, and predictive validity.

Validity of the Guidelines

In some cases I relied upon expert advice such as that by Ogilvy (1985), Antin (1993), and Roman and Mass (1992). In other cases, I consulted with colleagues or used my own judgment.

Where possible, the expert system draws upon research by others and upon expertise. Summaries of research studies that were provided by Rossiter and Percy (1987, 1997) were especially useful. Also useful were Batra, Myers, and Aaker (1996), Pratkanis and Aronson (1992), and Burke et al. (1990). These sources led to other sources such as Stewart and Furse (1986), which were used extensively. In all, the expert system draws directly upon approximately 250 papers and books.³ These publications represent a much larger number of studies, because some of them are review papers, such as Barry's (1993) paper on comparative advertising, which reviewed 36 studies.

Much judgment was required to translate findings from the studies into guidelines. In some cases these guidelines were suggested by others, such as Ogilvy or Rossiter and Percy. In most cases they were made by me. To help ensure that the interpretations match those that would be made by the original authors of the research, I sent guidelines to 95 of the researchers whose work is included in the system. They were asked to review the rules and conditions related to their work and to confirm that my interpretation was correct, or, if not, to suggest revisions. Typically the procedure called for them to read less than one page. Replies were received from 25 of the researchers. In almost all cases they agreed with the ESAP formulation of guidelines based on their research. In some cases they suggested revisions and alerted me to additional relevant research.

Reliability of the Expert System

In addition to looking at the overall score, reliability can be examined by feature, guideline, or section. One would expect increasing reliability as aggregation of the ratings increases.

To date, inter-rater reliability has been modest. Ratings of the Macintosh "1984" television commercial by the author and a research assistant differed on average by 0.8 points on the 65 features that were rated. (This was based on the version of ESAP as of April 1996.) For the same two raters, ratings of 68 features for a Reebok "UBU" commercial differed by 1.5 points on average.
³ The bibliography is provided in the ESAP Manual. The address is advertisingprinciples.com.

Here the effectiveness scores differed substantially (15% versus 37%). However, substantial improvements have been made since these reliability tests were conducted.

The failure to understand the research findings has been a major source of differences among raters. To address this issue, a 175-page training manual has been prepared for raters. It provides an expanded description of the guidelines, definitions, examples, and evidence. The latter includes both supporting and conflicting evidence.

ESAP is designed for use by trained experts. The training is not a simple task. ESAP has been used in twelve advertising courses to date and it seems that, initially, most users have failed to use the manual. They rush through the ratings without understanding the research findings, and sometimes blindly apply the system with no concern for the underlying conditions. In one class consisting of seven teams who reported extensive use of the system, none of them noticed that their version of the system contained a programming error such that their ratings on persuasion had no effect upon the overall effectiveness scores. A new version of the program is being developed to help ensure that the raters examine the conditions prior to making ratings.

Even with trained experts, raters' judgments of features will differ because of fatigue, biases, imposition of personal tastes, disagreements about conditions, differences of opinion about the value of a feature, cultural norms, and simply because the task is complex. Generalizing from research on the use of expert opinions, I recommend that at least five raters be used for each advertisement (Hogarth 1978; Libby and Blashfield 1978; Ashton 1986). Given the problems with reliability, the same raters should be used when making comparisons of a set of ads.

Concurrent Validity

Concurrent validity involves the effectiveness of a procedure to predict advertising effectiveness in cases that have already occurred. One of the major threats to this approach is that raters may be influenced by knowledge of what actually happened. However, in a study on personnel selection, concurrent validity produced conclusions that were similar to those from predictive validity (Barrett, Phillips, and Alexander 1981).

To test concurrent validity, I sought ads that were highly successful in terms of their effectiveness on sales, as well as ads that are widely regarded as failures by many experts. The purpose was to test whether the successful ads would have higher ESAP effectiveness scores. To find these ads, I conducted literature searches and consulted with various advertising experts. I then chose eight successful and five unsuccessful ads for the test. No ads that were selected for this test were later discarded.


Empirical evidence was not available on the actual effectiveness of these 13 classic ads. The intent was merely to find ads where there was a consensus among experts. In some cases, there were reports from the company that they believed that the advertisement was successful (or unsuccessful). In the case of Benetton, retailers in Germany sued the company to force them to stop using the ads, because, they claimed, the ads were reducing sales.

Preliminary results are presented here on classic ads by using ratings by the author and research assistants (see Exhibit 3). These ratings were made in late 1997 and early 1998 when we used a different scale. Different raters were used for the different ads, so this created a problem in making comparisons. These findings are only suggestive because the raters had some familiarity with the ads prior to this evaluation. In addition, the ratings were made without knowledge of the target market or product research.

The results are promising. The eight successful advertisements had an average rating of 44%, while the five unsuccessful ads averaged 13%. All of the unsuccessful advertisements were rated lower than the poorest of the successful advertisements.

As another approach to construct validity, I plan to compare ads produced by experts (professional advertisers) with revisions of the ads by novices (undergraduate students). The issue here is whether an outside panel would think that the revisions by novices produce superior advertisements.

Exhibit 3: Relative Effectiveness Scores for Classic Ads: Preliminary Ratings*
(R represents the number of raters)

Successful Ads                        ESAP Effect    R
TV Commercials
  Ansett Airlines (Puss in boot)           55        1
  Macintosh 1984 (Big brother)             48        2
  Waste Management (Driver)                45        2
  IKEA (Where are my socks?)               35        2
  Wendy's (Where's the beef)               34        2
Print Ads
  Bose ($349 Wave radio)                   64        4
  Rolls Royce (Clock noise)                42        3
  Volkswagen (Think small)                 31        2
Average Effectiveness                      44

Unsuccessful Ads                      ESAP Effect    R
  Burger King (Herb the nerd)               7        2
  Reebok (UBU nonconformist)               11        2
  Infiniti (Birds)                         18        2
  Subaru (SVX - What to drive)             23        2

* Uses the original -2 to +5 scale

Benetton (Croatian blood)

13


Acceptance

A decision support system should be acceptable to the user. This was the only criterion used in Burke et al. (1990). They said that the agency that collaborated on its development liked the system.

Our initial impressions were that the system poses too heavy a cognitive strain on the user. Some users seem incapable of making reasonable ratings, even after training. Many users get frustrated in the early stages of learning. To address this issue, we reorganized the flow of information, removed irrelevant information from the rating screen, used key words for the features and conditions, and improved the descriptions. Additional efforts will involve keeping track of codings of conditions, making it easier to examine conditions, making the prior research more complete and more accessible, and reducing the physical effort (mouse travel distance). We have recently introduced a number of changes to simplify the process.

Predictive Validity

To assess the predictive value of an expert systems approach, I planned to develop an initial system, test it, improve it, test it again, improve it, and so forth. The emphasis is on testing the use of expert systems for advertising, not on whether a particular formulation of an expert system proved useful.

While the primary criterion was whether an expert system can predict the effectiveness of an advertisement, I also wanted to test the ability of an expert system to provide useful copy testing scores. If it can, it could serve as a supplemental low-cost copy testing procedure.

To assess the value of our expert system for copy testing, I used ten pairs of print ads from Which Ad Pulled Best (Burton and Purvis 1993, 7th edition). The advertisements were selected by using a systematic stratified sampling plan with a randomly selected starting point. Five pairs were selected from the 40 consumer ads, and five from the ten industrial ads. Three pairs were excluded because the ads in each pair were aimed at different target markets or they advertised products that were substantially different from one another (which would introduce another source of variation in the copy testing score). In addition, I excluded one pair because they were image ads. (Although ESAP can be used to evaluate image ads, it is better at judging ads that contain information.) Finally, one advertisement was excluded because the copy testing sample did not match the target market. The excluded pairs were replaced by ones that followed them.⁴

⁴ The seven ads using the Gallup test were tapes (Seal-tite and Duct tales), hotels (Best Western and Hyatt), stoves (GE and Whirlpool), car stereos (Clarion and Sony), speakers (Technics and Infinity), paint (Glidden Wildlife and Glidden Corrosion), and rental cars (Budget and Hertz). The Readex ads for persuasion included the above-mentioned ads for tapes and paint plus two Smith ads for adhesives.

The five consumer print advertisements had been tested by Gallup and Robinsons proved name registration (intrusiveness). This measure is similar to the ESAP Attention score. Two of the industrial product ads had been tested by Readex for remembered seeing. This criterion should also relate to Attention. Information about the criterion was withheld until after the ratings had been completed. Coding for these seven ads was done by two marketing professors and one novice. (This test used the May 1995 version of ESAP.) The ESAP correctly predicted the advertisement that received the most attention for six of the seven pairs. Three other ads were tested for their Readex found of interest score, a measure of persuasion that was expected to relate to the ESAP Effectiveness score. The expert system identified the higher scoring ad for two of three ads. In all, the expert system was correct on 8 of 10 predictions (p < .05 using a one-tail test). This test used the May 1995 version of ESAP. Reliability was a problem for these tests. For six of these ads, the attention scores as coded by two marketing professors had a correlation coefficient for guideline ratings of .12, a disappointing result. However, the correlation between an author and a novice rater was .86, which was encouraging. These tests used the May 1995 version of ESAP. Limitations The primary purpose of the ESAP is to enhance creativity in the search for effective advertising features. This was not tested. Although checklists are widely assumed to be effective for complex tasks, there is the danger that people might depend too heavily upon the system. Expert systems seem complex because of the need to specify the conditions under which each guideline operates. As further research is done, findings can be integrated into the expert system, which may make the system even more complex. Complexity poses little risk when predicting with basically additive models, such as the ESAP. 
This assumes that reliable ratings can be obtained. As noted, this has been a serious problem due to the high cognitive demands on raters when they have to consider multiple features that might each be subject to, say, eight conditions. Cultural variations are not considered because we lack evidence that techniques for convincing people vary by culture. However, the expectation is that the rating would be done by those who know the culture and that this would allow for relevant factors to be considered. For example, the layout rules for print ads have been prepared assuming that readers read from left to right.
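
The additive, conditioned structure described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each guideline carries a condition and that raters score each applicable guideline as -1 (reduces effectiveness), 0 (neutral), or +1 (improves effectiveness). The guideline names, conditions, and rating scale are hypothetical and are not taken from ESAP. Averaging across a panel is included because single-rater scores were unreliable in the tests reported above.

```python
from statistics import mean

# Hypothetical guidelines: each has a name and a condition that decides
# whether it applies to a given ad context. None of these come from ESAP.
GUIDELINES = [
    ("headline_states_benefit", lambda ctx: True),
    ("uses_humor", lambda ctx: ctx["involvement"] == "low"),
    ("left_to_right_layout", lambda ctx: ctx["medium"] == "print"),
]


def ad_score(ratings, context):
    """Additive score: sum the ratings of guidelines whose condition holds."""
    return sum(ratings[name] for name, applies in GUIDELINES if applies(context))


def panel_score(panel_ratings, context):
    """Average the additive scores of several raters for the same ad."""
    return mean(ad_score(ratings, context) for ratings in panel_ratings)


# Assumed example: a print ad for a high-involvement product, two raters.
context = {"medium": "print", "involvement": "high"}
rater_a = {"headline_states_benefit": 1, "uses_humor": -1, "left_to_right_layout": 1}
rater_b = {"headline_states_benefit": 0, "uses_humor": 1, "left_to_right_layout": 1}

score_a = ad_score(rater_a, context)
score_b = ad_score(rater_b, context)
combined = panel_score([rater_a, rater_b], context)
```

Because the model is additive, adding a new guideline only adds one more term to the sum. The burden on raters comes from judging which conditions hold, not from the arithmetic.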


Much of the research on persuasion has been done in studies where the subjects' attention was assured, so it is sometimes difficult to generalize to a given advertising situation. Another limitation is that many of the studies did not directly study behavioral change, focusing instead on things such as whether the reader noted, read most, or liked the ad.

Discussion

Expert systems offer a number of benefits. First, they allow for a comprehensive summary of the knowledge on persuasion. While this can also be done in book format, an electronic version allows for easy updating and for ease in searching. This expert system provides the information as conditioned rules. The rules for this expert system are fully disclosed and are available to other researchers.5

As noted in the research, the expert system indicates areas in persuasion that are most in need of further research. For example, I was unable to find research related to the artistic and technical quality of the filming of a TV commercial. These are probably the most expensive elements in the production of a commercial.

Expert systems can serve as an aid to education. They can provide knowledge when it is needed. For example, in deciding whether to use humor, the user can go directly to a summary of the research, and references are available for the original sources. As a result, the expert system is ideally suited for project-oriented courses or for on-the-job learning. A negative result is that learners may approach the task in a passive and mechanistic manner. In effect, the design is contrary to that used for experiential exercises. Do the disadvantages of ESAP outweigh the benefits? Glover, Prawitt & Spilker (1997) provide some evidence that expert systems have deficiencies as a tool for learning. To use ESAP as a learning aid, one would presumably need to develop some prior experiential exercises.

5 Interested researchers and educators can download the latest version of the ESAP program and manual from the Internet (advertisingprinciples.com). The program is not available for commercial purposes.

Conclusions

An expert system can make research findings more accessible to advertisers by summarizing them as conditioned action statements and by making the information available as needed. This information should help advertisers to improve their effectiveness.

The expert system is designed for developing and improving advertisements. Given their knowledge, advertisers should be able to do a better job if also given access to a decision support system that organizes knowledge on the product, target market, and copy testing, and provides a structured access to prior knowledge.

ESAP can provide a copy testing score by showing the extent to which an advertisement complies with research findings. Such scores build upon target market and product research. The primary advantages of this approach are low costs and the fact that the system provides information on how to improve the advertisements. ESAP also allows for reasoned comparison of alternative ads.

This study attempts to assess the reliability and predictive validity of an expert system for advertising persuasion. The preliminary results have been favorable. Further research will be conducted to determine whether this expert system can improve the effectiveness of ads and enhance the effectiveness of copy-testing.
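
For readers who want to gauge how much evidence a small number of paired predictions provides, the following minimal sketch computes a one-tailed probability for the 8 of 10 correct predictions reported in the predictive validity test. It assumes an exact binomial (sign) test against a chance rate of 0.5. The source does not say which test it used, so this is only one plausible way to make the calculation, and the function name is hypothetical.

```python
from math import comb


def one_tailed_binomial_p(successes, trials, chance=0.5):
    """Probability of at least `successes` correct out of `trials` by chance."""
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 8 correct predictions out of 10 ad pairs, as reported in the text.
p_value = one_tailed_binomial_p(8, 10)
print(f"one-tailed p = {p_value:.4f}")
```

Under these assumptions the tail probability is 56/1024, so the exact figure is sensitive to the choice of test. With only ten pairs, one additional correct or incorrect prediction changes the result substantially, which is consistent with describing the results as preliminary.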


References
Abelson, Herbert I. (1959), Persuasion. New York, NY: Springer Publishing.
Armstrong, J. Scott (1982), The value of formal planning for strategic decisions: Review of empirical research, Strategic Management Journal, 3, 197-211.
Armstrong, J. Scott (2001), Combining forecasts, in J. S. Armstrong (ed.), Principles of Forecasting. Norwell, MA: Kluwer Academic Publishers, 417-439.
Ashton, Robert H. (1986), Combining the judgments of experts: How many and which ones? Organizational Behavior and Human Decision Processes, 39, 405-414.
Barrett, Gerald V., Phillips, J. S. and Alexander, R. A. (1981), Concurrent and predictive validity designs: A critical reanalysis, Journal of Applied Psychology, 66, 1-6.
Batra, Rajeev, Myers, J. G. and Aaker, D. A. (1996), Advertising Management. Upper Saddle River, NJ: Prentice Hall.
Blair, Margaret H. (1987), An empirical investigation of advertising wearin and wearout, Journal of Advertising Research, (Dec.-Jan.), 45-50.
Bonner, Libby and Nelson (1996)
Burke, Raymond R., Rangaswamy, A., Wind, J., and Eliashberg, J. (1990), A knowledge-based system for advertising design, Marketing Science, 9, 212-229. There is a working paper version that contains additional studies.
Burton, Philip W. and Purvis, S. L. (1993), Which Ad Pulled Best. Lincoln, IL: NTC Business Books.
Cialdini, Robert B. (1993), Influence: The Psychology of Persuasion. New York: William Morrow.
Collopy, Fred and Armstrong, J. S. (1992), Rule-based forecasting: Development and validation of an expert systems approach to combining time series extrapolations, Management Science, 38, 575-582.
Dean, James W. and Sharfman, Mark P. (1996), Does decision process matter? A study of strategic decision-making effectiveness, Academy of Management Journal, 39, 368-396.
Dijkstra, J. J. (1995), The influence of an expert system on the user's view: How to fool a lawyer, New Review of Applied Expert Systems, 1, 123-138.
Fischhoff, Baruch, Slovic, P. and Lichtenstein, S. (1978), Fault trees: Sensitivity of estimated failure probabilities to problem representation, Journal of Experimental Psychology: Human Perception and Performance, 4, 330-344.
Glover, Steven M., Prawitt, D. F. and Spilker, B. C. (1997), The influence of decision aids on user behaviors: Implications of knowledge acquisition and inappropriate reliance, Organizational Behavior and Human Decision Processes, 72, 232-255.
Helgesen, T. (1994), Advertising awards and advertising agency performance criteria, Journal of Advertising Research, (July/August), 43-53.
Hogarth, Robin (1978), A note on aggregating opinions, Organizational Behavior and Human Performance, 21, 40-46.
Libby, Robert and Blashfield, R. K. (1978), Performance of a composite as a function of the number of judges, Organizational Behavior and Human Performance, 21, 121-129.
Lodish, Leonard, et al. (1995), How T.V. advertising works: A meta-analysis of 389 real world split cable T.V. advertising experiments, Journal of Marketing Research, 32, 125-139.
MacGregor, Donald (2001), Decomposition for judgmental forecasting and estimation, in J. S. Armstrong (ed.), Principles of Forecasting. Norwell, MA: Kluwer Academic Publishers, 107-123.
Meehl, Paul E. (1986), Causes and effects of my disturbing little book, Journal of Personality Assessment, 50, 370-375.
Miller, C. Chet and Cardinal, L. B. (1994), Strategic planning and firm performance: A synthesis of more than two decades of research, Academy of Management Journal, 37, 1649-1665.
Ogilvy, David (1985), Ogilvy on Advertising. New York: Vintage Books.
Pechmann, Cornelia (1992), Predicting when two-sided ads will be more effective than one-sided ads: The role of correlational and correspondent influences, Journal of Marketing Research, 29, 441-453.
Pratkanis, Anthony and Aronson, E. (1992), Age of Propaganda: The Everyday Use and Abuse of Persuasion. New York: W. H. Freeman.
Raiffa, Howard (1968), Decision Analysis. Princeton, NJ: Princeton University Press.
Rogers, John C. and Williams, Terrell G. (1989), Comparative advertising effectiveness: Practitioners' perceptions versus academic research findings, Journal of Advertising Research, (Oct./Nov.), 22-37.
Rossiter, John R. and Eagleson, G. (1994), Conclusions from the ARF's copy research validity project, Journal of Advertising Research, (May-June), 19-29.
Rossiter, John R. and Percy, Larry (1987), Advertising and Promotion Management. New York: McGraw Hill.
Rossiter, John R. and Percy, Larry (1997), Advertising Communications and Promotion Management. New York: McGraw Hill.
Rothenberg, Randall (1994), Where the Suckers Moon: An Advertising Story. New York: Alfred Knopf.
Scott, Cliff, Klein, D. M., and Bryant, J. (1990), Consumer response to humor in advertising: A series of field studies using behavioral observation, Journal of Consumer Research, 16, 498-501.
Stewart, David W. and Furse, D. H. (1986), Effective Television Advertising: A Study of 1000 Commercials. Lexington Books.
Valacich, Joseph S., Dennis, A. R., and Connolly, T. (1994), Idea generation in computer-based groups: A new ending to an old story, Organizational Behavior and Human Decision Processes, 57, 448-467.
