
Model based decision making in a context sensitive advertisement system

A component for practical application of Nomadic Information

Mark Mooij 0146196 + 31 (0) 6 28 256 026 mark@ai-applied.nl

Thesis Bachelor Artificial Intelligence

Universiteit van Amsterdam Faculteit Natuurwetenschappen, Wiskunde en Informatica

Science Park 904 Postbus 94216 1090GE Amsterdam The Netherlands

Version: 25-06-2009
Under supervision of: Dr. Frank Nack

Abstract
This paper describes a decision-making process that selects and displays the commercial video best fitting the current social context at a specific time and place. The decision-making process is based on metadata representations of commercial videos, which are generated from expert knowledge. By combining these knowledge representations with real-time information, the system picks the most appropriate commercial video. While the commercial video is presented, the interest of the audience is measured. In this way commercials are better fitted to their intended target domain, which improves narrowcasting systems.

Keywords
Advertisement, narrowcasting, commercials, video, metadata, knowledge representation, decision making, decision model, social context modeling, interest measurement.

Acknowledgements
I would like to deeply thank the various people who, while this endeavour lasted, provided me with useful assistance. Without their care and consideration, this paper would likely not have matured. First, I would like to thank my supervisor Dr. Frank Nack from the University of Amsterdam, who guided me through the process of preparing the necessary proposals, helped with the preparations, gave me advice and supported me while writing this paper. Second, I would like to thank Bruno Jakic for conducting parallel research in computer vision concerning age, gender and pose recognition. Combining the two studies made it possible to perform a live test validating this research. Bruno Jakic and Zora Jurriens are also thanked for our combined effort in outlining the first ideas for the system. Furthermore, all the people who helped with the user tests are deeply thanked; without their help there would be no results and no benchmark for the system. Their patience and voluntary exposure to more than an hour of commercial videos was deeply appreciated. Finally, the assistance of prof. dr. Bert Bredeweg and Wouter Beek is much appreciated: they provided a support group on conducting scientific research and setting up scientific papers, and gave guidance throughout the planning and conducting of this research.

Table Of Contents
Abstract
Keywords
Acknowledgements
Table Of Contents
1. Introduction
1.1 Research Goals
2. Theoretical Background
2.1 Advertisement
2.2 Available systems
2.3 Requirements
2.4 Knowledge Representation
2.5 Multimedia Metadata
2.6 Multimedia Metadata Standards
2.6.1 Dublin Core
2.6.2 The Semantic Web
2.6.3 MPEG framework
2.6.4 TV-Anytime
2.7 Conclusion
3. Implementation
3.1 General use
3.2 Interface description
3.3 Contextual Model
3.4 Retention model
3.5 History model
3.6 Interest Measurement
4 Experiments
4.1 User test
4.2 Results
5 Conclusion
5.1 Future work
6 References

1. Introduction
In current markets, most consumer products are intended for a specific audience. To increase the awareness of these audiences of the intended product, manufacturers resort to advertising. The ideal situation is one where the appropriate advertisement is displayed solely to an audience of potential clients for that product. This ideal requires extra knowledge about the social context and about the content of the commercials to be presented. The problem is that most advertisements in public spaces cannot collect these types of knowledge.

Take a bus stop as an example. In today’s world the advertisement here would be a large poster or billboard, where the presented poster has been chosen based on a prior investigation of the potential audience at this type of space (statistical models and market surveys). This type of environment cannot adapt to any change in the audience.

A better solution would be the following. Imagine a bus stop with a billboard and an additional, discreetly hidden camera. The camera captures images of the local audience. At short time intervals, a system extracts a set of information from these images. The gathered information forms the basis of an inference mechanism that generates a social context model. Based upon the context model and knowledge about the available commercials, the system selects advertisements to suit the local audience.

In this paper, such a system is presented. The proposed system increases the amount of implicit interaction between the advertising system and its target audience. Explicit interaction with the system is reserved for the advertisers, who feed the system with their expert knowledge about the commercials and their target group specifications.

The improvements proposed in this paper are most applicable to the field of narrowcasting as defined by Flera et al. [1] (p. 379): “Narrowcasting involves aiming media messages at specific segments of the public defined by values, preferences, or demographic attributes. Also called niche marketing or target marketing. Narrowcasting is based on the idea that mass audiences do not exist.”

The improvement of narrowcasting systems in this paper consists of adding expert knowledge to the commercials, so that a decision process can match this expert knowledge to a social context model that the system builds from data measured in real time. Real-time information is available to the system through interpreting data from the research done by Jakic [2]. In this report we focus on the interpretation of this data and on the processes needed to infer from this interpretation which type of advertisement needs to be presented to best fit the established social context.

1.1 Research Goals
The research reported in this paper addresses two related topics. First, we investigate what type of expert knowledge needs to be added to commercial videos, in the form of metadata representations, to best capture all the elements of interest for a possible target audience that advertisers might have in mind. In a literature study by Mooij [3] prior to this paper, most of those elements have already been identified. These findings were used, from a computer vision point of view, to investigate whether computer vision is able to provide these elements [2].

Second, this paper focuses on the decision-making process that picks the commercial best fitting the current social context. To match the social context to the commercial, a knowledge representation of the commercial videos is designed. The social context model is complemented by models for retention and for history. The retention model captures how often a commercial video has been shown in a certain time frame; this prevents repetition and wear-out of the commercial. The history model contains information on which the system can base decisions when no real-time information about the social context is available, so that a likely decision can still be made.

In addition to the social context, retention and history models, the interest of the audience in the presented media is measured. The measured interest data can be of interest to the advertisers: it can be reported as feedback on whether the intentions of the media campaign are met. To test the system, a real-environment test was set up to determine whether the commercial videos displayed to the audience indeed appeal to the individuals in that audience.

The research questions addressed in this paper are:
• What is a workable way to represent media in a context of interest for advertisers?
• How can we decide which commercial video to pick, based upon models for social context, retention and history?
• Does the system’s commercial video presentation have a greater appeal to an audience than displaying random commercials?

Together with the research conducted by Jakic [2], a broader research question can be formulated:
• Can a system be made that acts as an autonomous, direct, context-sensitive information provider to individuals in an audience to a satisfactory level?
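
The roles of the retention and history models described in this section can be pictured with a minimal Python sketch. It is not the algorithm of Section 3: the function names, the one-hour window, the `max_plays` cap and the dictionary-based scores are all hypothetical choices made for illustration.

```python
from collections import Counter


def recent_play_counts(play_log, now, window_seconds):
    """Count how often each commercial was shown within the time window."""
    return Counter(
        commercial_id
        for commercial_id, shown_at in play_log
        if now - shown_at <= window_seconds
    )


def choose_commercial(fit_scores, history_scores, play_log, now,
                      window_seconds=3600, max_plays=2):
    """Pick a commercial id (illustrative only).

    fit_scores: dict id -> fit to the current social context, or None
        when no real-time information is available (assumed interface).
    history_scores: dict id -> score derived from earlier observations.
    play_log: list of (id, timestamp) pairs of past presentations.
    """
    # History model: fall back on stored data when nothing is observed.
    scores = fit_scores if fit_scores else history_scores
    # Retention model: skip commercials shown too often recently.
    counts = recent_play_counts(play_log, now, window_seconds)
    allowed = [cid for cid in scores if counts[cid] < max_plays]
    # If every commercial is saturated, ignore the cap rather than show nothing.
    candidates = allowed or list(scores)
    return max(candidates, key=lambda cid: scores[cid])
```

In this sketch the retention model acts as a hard filter and the history model as a fallback; a weighted combination of the three models would be an equally plausible design.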

2. Theoretical Background
In this chapter the theoretical background for the adaptive advertisement system proposed in this report is discussed. Section 2.1 sets some constraints on the system that follow from the advertisement context. Section 2.2 describes related work in advertisement and audience measurement, resulting in a set of requirements for the system discussed in section 2.3. The expert knowledge representation aspects are discussed in section 2.4. A description of multimedia metadata is given in section 2.5. This leads to a discussion in section 2.6 of the different methods that are used to annotate information and knowledge for video in general on a metadata level. Section 2.7 concludes and outlines which representation will be used.

2.1 Advertisement
There are some constraints from within the advertisement domain to keep in mind when designing the system. A negative impact on the credibility of advertisers is possible if ads unrelated to the user’s interests are exhibited. This negative impact could diminish market share, as described by Bhargava et al. [4] (p. 117-123). Investments in the quality of ad recommendation systems are therefore important to minimize the possibility of exhibiting ads unrelated to the user’s interests. By investing in ad systems, information gatekeepers invest in the maintenance of their credibility and in the reinforcement of a positive user attitude towards the advertisers and their ads, as described by Whang et al. [5] (p. 1143-1148). With fixed content and a fixed location of advertising, people usually adapt to the presence of physical advertisements and filter them out of their vision. This effect is called ad-blindness and is described by Chang [6]. It underlines the usefulness of a system based on social context. Context-based advertising is seen mainly in online marketing, where keyword matching is the preferred technique. Our approach takes the next step: it matches context not by keyword but by social context, and counters ad-blindness with models for social context and retention. Previous research by Mooij [3] found that advertisers’ main interest lies in discriminating between the different gender groups and the different age groups, and in measuring the amount of interest the targeted audience displays.

2.2 Available systems
Some available systems are related to this research. First, Cognovision’s AIM View [7] and Quividi’s VidiReports [8] are pure measuring systems designed to measure the number of people, the length of impressions, and the demographics of audiences. Beyond harvesting the data, these systems do nothing actively with the information they gather. The Wututu Person Counter [9] is also focused on counting: the number of viewers, the watching time and the number of viewers’ simulations. Trumedia’s iCapture [10] goes a step further by giving shoppers information as they stand in front of a particular product. None of these initiatives has published scientific documentation about how its results are achieved or validated. This is probably done for commercial reasons, to keep possible competition off the market. Given the absence of this information, our system is, to our knowledge, the first to take a scientific approach towards social context-based advertisement that validates its results by experimentation. New in this paper is the modeling of information, the combination of the models with knowledge of commercials, and the drawing of conclusions based on this process.

2.3 Requirements
From the previous sections a list of requirements can be derived. The proposed system has to be able to provide information about:
1. The social context in terms of gender
2. The social context in terms of age
3. The amount of interest the targeted audience displays

The information about gender and age (requirements 1 and 2) will be combined in a social context model. To make it possible to match the commercial videos with this social context model, additional knowledge has to be added to the commercials. This extra knowledge will be added by advertisement experts. The requirements described in this section are implemented in Section 3.
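
One way to picture the matching between a social context model and expert-tagged commercials is the following Python sketch. The age groups, the weights between 0 and 1, the commercial names and the weighted-sum score are assumptions made for illustration only; the actual contextual model is the subject of section 3.3.

```python
from collections import Counter


def social_context(observations):
    """Turn (gender, age_group) observations into audience shares."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()} if total else {}


def fit(context, target_weights):
    """Weighted sum of audience shares and expert weights (both in 0..1)."""
    return sum(share * target_weights.get(group, 0.0)
               for group, share in context.items())


# Hypothetical expert tags: one weight per (gender, age group).
commercials = {
    "soft drink": {("female", "young"): 0.9, ("male", "young"): 0.7},
    "shaver": {("male", "adult"): 1.0, ("male", "young"): 0.5},
}

# Hypothetical audience as it might be reported by the vision component.
audience = [("male", "adult"), ("male", "adult"), ("female", "young")]
context = social_context(audience)
best = max(commercials, key=lambda name: fit(context, commercials[name]))
```

With two adult men and one young woman in front of the screen, this sketch prefers the commercial whose expert weights cover the larger share of the audience.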

2.4 Knowledge Representation
The classical approach towards constructing a model-based system is a rule-based “expert system”. Harmelen et al. [11] (p. 396) describe in detail that this approach has its limitations in terms of flexibility in real-world situations where experience is obtained in a specific context. In this paper, models for retention and history are added as experience. These models cannot be defined with rules but are generated by gathering data while the system runs.

Harmelen et al. [11] (p. 397) state as an architectural principle that knowledge-based systems should separate domain-specific knowledge from task-specific knowledge and keep the two independent. In our context the domain-specific knowledge is the knowledge about how applicable the commercial videos are to the different age and gender categories. Further domain-specific knowledge is the interest that users show in a displayed commercial video. Knowledge about gender and age applicability is added to the commercial video representation by a commercial expert. The interest is measured and interpreted from real-time information. The task-specific knowledge is obtained by combining the models for social context, retention and history to best fit a displayed commercial to the current social context of the audience.

To test and measure the system’s performance, Harmelen et al. [11] (p. 438) suggest some standards. In order to gain discriminating information, a test has to specify:
• How to stimulate the system.
• What to observe of the system’s response to this stimulus.

In the context of our system the stimulus is a certain audience. The response is the presentation of a commercial video appropriate for that specific audience. The knowledge on how to stimulate the system is acquired through a social context model based on gathered information. The knowledge about the commercial videos added by an advertisement expert makes it possible to match these two. Knowledge is also gathered during the presentation of the video, in the form of data about the viewing angles the targeted audience displays. To assign a quantitative measure to the displayed interest, a scoring mechanism has been developed, which is discussed in section 3.6. To validate the system a live user test was set up. The user test setup is laid out in section 4.

2.5 Multimedia Metadata
A commercial is a typical multimedia asset. We want to better fit commercials to their intended target domain by achieving more interactivity with the social environment where the media is displayed. To do this, a higher-level semantic meaning in the form of expert knowledge has to be added to the commercial, such as intended audience, repetition time, etc. Multimedia metadata can be used to assign this type of information.

Metadata does not have one single clear definition. Instead a number of definitions are used. Wikipedia [12] defines metadata as: “Data about other data, of any sort in any media. An item of metadata may describe an individual datum, or content item, or a collection of data including multiple content items and hierarchical levels, for example a database schema.” In this way metadata says something about the actual data, becoming data itself. The information metadata provides can be used in a wide variety of applications. The association between data and metadata is used to say something useful about the associated data in a certain context, as described by Nack [13]: “Multimedia metadata is structured, encoded data that describes content and representation characteristics of information-bearing multimedia entities to facilitate the automatic or semiautomatic identification, discovery, assessment and management of the described entities, as well as their generation, manipulation and distribution.”

Apart from basic information it is also possible to enrich data with new higher-level semantic meaning. Metadata is created through annotations. An annotation is an addition made to media that gives extra information about the original subject. For the annotations of the commercial videos in this paper, tags are used to model existing commercials for a certain target audience in terms of gender and age, giving higher-level semantic meaning to their applicability to the target domain.
Tags can also be used to access stored information gathered during the presentation of commercial videos. This information can be interpreted by the history model, giving a higher-level semantic meaning relevant to the interest of the targeted audience.

This paper concentrates on the modeling of social context and the adding of expert knowledge to commercial videos. By adding expert knowledge, commercial videos can be better fitted to the target domain. By modeling the information gathered by Jakic [2], an interpretation can be made of how a target group reacts to a displayed video. Disinterested targets, for example, will turn away from the screen. This would not be possible with audio, for example, where the interest of a target cannot be measured by orientation. Because the interest displayed by a targeted audience can change during the presentation, this displayed interest is given a higher-level semantic meaning; the way this is done is described in section 3.6.

A number of standards have been developed for multimedia metadata. The information about the commercial video is extracted from tags containing expert knowledge from advertisers. Semantic features are risky to use, because different advertisers could use different keywords to describe the same intended concept. To keep the process as general as possible, advertisers pick tags and scale the distinct target domain through a strict schema. An ontology could be used to define such a strict conceptual scheme, but an ontology would hold all relevant entities for the domain. The domain this paper focuses on has clear boundaries for the relevant entities and is not complex enough to justify the development of an ontology. The higher-level semantic meanings will be processed and ranked by the system. The structure and identification will be mainly defined by the unique id and the temporal runtime of the video. Management will be done by the system. A backlog will be generated for feedback for editors and modeling of environments.

2.6 Multimedia Metadata Standards
Having introduced the concept of metadata, it is now time to analyze existing multimedia metadata standards. The most fundamental set of annotation rules is the Dublin Core Set [14]. This standard is discussed in section 2.6.1. Next, the two most commonly known approaches toward machine-processable and semantic-based content description will be discussed. These are the Semantic Web [15] by the W3C (World Wide Web Consortium) [16], discussed in section 2.6.2, and the Multimedia Content Description Interface (MPEG) [17] by the ISO (International Organization for Standardization) [18], discussed in section 2.6.3. In section 2.6.4 the metadata model for TV-Anytime is reviewed.

2.6.1 Dublin Core
The most fundamental schema for annotation is the Dublin Core. Dublin Core is one of the standards that is widely used to describe digital materials such as video, sound, text and composite media like web pages. Dublin Core makes cross-domain information available as a resource description. This simple set of conventions makes it easier to find things online (as well as offline). Simple Dublin Core consists of fifteen elements: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage and Rights. The standard allows the elements to be optional and makes it possible for elements to be repeated. XML is used to formulate the elements in Dublin Core. For the representation used in this paper the following Dublin Core elements are relevant:
• Title: The commercial has to have a title. This can simply be a brand name.
• Creator: We want to return the generated interest measurements to the editors and/or creators of the advertisement as feedback. A creator should be known as a contact.
• Date: A date should be known to the system so the interest can be measured over time; the decline in interest (wear-out) of the commercial can be interesting for the editors/creators.
• Format: The file format of the video file should be known for playback compatibility.
• Identifier: A unique identifier is used for archiving.
• Language: If a specific language is used in the commercial, we want the system to be able to use only videos in a language that is relevant to the location.

The Subject and Description elements are used in an extended way which is elaborated in section 3.3. The other elements are not going to be used because they are irrelevant for the advertisement context in which the representations are used. A Simple Dublin Core example implemented using XML for a commercial video may look like the example in Fragment 1.

<rdf:Description>
  <dc:title>Coca Cola Light Commercial 1</dc:title>
  <dc:creator>Coca Cola</dc:creator>
  <dc:date>10-05-2009</dc:date>
  <dc:format>mpg</dc:format>
  <dc:identifier>1</dc:identifier>
  <dc:language>NL</dc:language>
</rdf:Description>
Fragment 1: A Dublin Core Example (based on example from dublincore.org) [19]
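
To illustrate how a system could read such a record, the following Python sketch parses a Dublin Core description with the standard library. Fragment 1 omits the namespace declarations, so the sketch wraps the record in an `rdf:RDF` root that declares the usual RDF and Dublin Core namespaces; that wrapper and the function name `read_dublin_core` are assumptions for illustration, not part of the thesis.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Fragment 1, wrapped in a root element that declares the namespaces (assumed).
RECORD = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description>
    <dc:title>Coca Cola Light Commercial 1</dc:title>
    <dc:creator>Coca Cola</dc:creator>
    <dc:date>10-05-2009</dc:date>
    <dc:format>mpg</dc:format>
    <dc:identifier>1</dc:identifier>
    <dc:language>NL</dc:language>
  </rdf:Description>
</rdf:RDF>"""


def read_dublin_core(xml_text):
    """Return one dict of Dublin Core element names to values per description."""
    root = ET.fromstring(xml_text)
    records = []
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        record = {}
        for element in description:
            # ElementTree writes qualified tags as "{namespace}localname".
            if element.tag.startswith(f"{{{DC_NS}}}"):
                name = element.tag[len(DC_NS) + 2:]
                record[name] = (element.text or "").strip()
        records.append(record)
    return records


records = read_dublin_core(RECORD)
```

Each record becomes a plain dictionary, so a selection process could, for instance, filter on `records[0]["language"]` before any matching takes place.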

The advantages of using Dublin Core are its simplicity, the potential it offers for interoperability, especially online through XML, and the flexibility it provides for extensions to the basic elements if needed. There are, however, some disadvantages to Dublin Core. There are no cataloging rules to determine how data has to be entered in fields, which makes it possible for users to apply whatever rules they see fit. This main disadvantage, the absence of cataloging rules, can be dealt with by defining rules for the resulting Dublin Core XML file. In this research XML schema is used to enforce the constraints set by the different models. Van Ossenbruggen et al. [20] explained that creating a high-quality metadata model comes with a number of problems. In our context a knowledge representation standard is needed in which the possibilities can be restricted, to keep subjectivity as low as possible without losing effectiveness for the advertiser. The representation also has to be easy to create, to reduce the effort of making the representations.

2.6.2 The Semantic Web
Another important standard for designing multimedia metadata is the Semantic Web [15]. Any metaproduction process extends an existing semantic network: it provides additional production information and describes a different use context for existing material. This is what we want to do with commercials. Commercial videos are widely used; in this paper additional expert knowledge is used to make the video more context relevant. Creating a media-aware semantic network comes with complex requirements. These requirements are discussed by van Ossenbruggen et al. [20]. In our context the degree of complexity of the Semantic Web is unnecessary. A linking mechanism is used to link the contextual model to the knowledge representation of the commercial video, but this is done by a simpler schema.
The simple approach is justified because the representations for commercial videos are used outside the Semantic Web context and there is no need to link them to the Semantic Web. By using simple metadata schemas trust can be achieved, meaning that the system is able to find and use the most relevant information. This is the type of trust the Semantic Web has as its goal, as can be seen in Tim Berners-Lee’s “layer cake” [21]. It can be achieved by using only one layer of the cake: the XML layer. The XML layer of the Semantic Web is useful in the context of this paper. An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These schemas provide an expressive mechanism for encoding the necessary metadata in a controlled way. An XML schema provides a view of the document type at a relatively high level of abstraction. In section 3.3 we explain how the constraints can be set to create a target domain knowledge representation for commercial videos.
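
The kind of constraints an XML schema imposes can be approximated in plain Python, which may help to picture what encoding metadata "in a controlled way" means here. The sketch below is a stand-in and does not perform XML Schema validation: the required element names follow the Dublin Core selection of section 2.6.1, while the allowed formats and language codes, like the function name `check_record`, are hypothetical.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
REQUIRED = ("title", "creator", "date", "format", "identifier", "language")
# Hypothetical controlled vocabularies; a real schema would define its own.
ALLOWED = {"format": {"mpg", "avi"}, "language": {"NL", "EN"}}


def check_record(xml_text):
    """Return a list of constraint violations for one description element."""
    description = ET.fromstring(xml_text)
    values = {child.tag[len(DC):]: (child.text or "").strip()
              for child in description if child.tag.startswith(DC)}
    # Structural constraint: every required element is present and non-empty.
    problems = [f"missing element: {name}" for name in REQUIRED
                if not values.get(name)]
    # Content constraint: restricted elements take values from a fixed set.
    for name, allowed in ALLOWED.items():
        if values.get(name) and values[name] not in allowed:
            problems.append(f"{name} not allowed: {values[name]}")
    return problems


# An incomplete record with an unsupported format (illustrative input).
sample = (
    '<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    '<dc:title>Coca Cola Light Commercial 1</dc:title>'
    '<dc:format>mov</dc:format>'
    '</rdf:Description>'
)
problems = check_record(sample)
```

A schema file expresses the same two kinds of rules declaratively (which elements must occur, and which values they may take), so that different advertisers cannot enter data according to their own conventions.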

One of the biggest problems with the Semantic Web is the semantic gap, as defined by Smeulders et al. [22]: “The semantic gap characterizes the difference between two descriptions of an object by different linguistic representations, for instance languages or symbols. In computer science, the concept is relevant whenever ordinary human activities, observations, and tasks are transferred into a computational representation.” In this paper the expert knowledge needs to be translated into a computational representation so that it can be fitted to the knowledge about the social context. With interest measurement, human activities are observed and transferred into computational representations. A higher-level semantic meaning is given to the viewing direction of a human observer, so that an observer who looks away from the screen later yields a lower interest measure. The observations in this research are made by computer vision processing of the audience and translated into a social context model, on which a decision process for picking the most applicable commercial is based.

2.6.3 MPEG framework
The Moving Picture Expert Group was formed by the International Organization for Standardization (ISO) [18] to set standards for audio and video compression and transmission. In the context of semantic representation of media, the MPEG-7 and MPEG-21 standards will be addressed. Previous standards from the Moving Pictures Expert Group [23][24][25] mainly focused on compressing and encoding audio and video files. MPEG-7 [26] is a standard developed to allow fast and efficient search for material that is of interest to the user. Thus, it is not a standard that deals with the actual encoding of moving pictures and audio. MPEG-7 uses XML schema to store metadata, which can be attached to the media to describe particular events of interest. MPEG-7 requires complex structures to allow users to search, browse, filter and interpret content using search engines, filter agents, or other applications.
In order to do so, MPEG-7 uses description tools in the form of descriptors (Ds) and description schemata (DS). The language to specify these schemes is called the Description Definition Language (DDL). Structuring the descriptors in schemata allows MPEG-7 to define relationships and create application-specific content descriptions. These descriptions are outlined by van Ossenbruggen et al. [20] (p. 43). In our context efficient searching and filtering is needed; however, the complex Descriptors and Definition Language are only justified in a context related to the MPEG framework. No such context is relevant for the proposed system, so using the complex Descriptors and Definition Language gives little advantage.

The MPEG-21 [27] standard has the goal to describe an open framework that lets users integrate all delivery chain components necessary to generate, use, manipulate, manage and deliver multimedia content across a wide range of networks and devices. MPEG-21 also uses an XML-based standard, and is designed to communicate machine-readable license information and do so in a “ubiquitous, unambiguous and secure” manner. The MPEG representations make use of the Description Definition Language (DDL). One of the disadvantages of DDL is that it does not facilitate a diverse set of linking mechanisms between descriptions and the data being described. More about this standard can be found in the paper by van Ossenbruggen [20] (p. 43). The complex structures created for searching, browsing and interpreting content are the same as those described for MPEG-7; they are thus more complex than needed for this project and would be overkill for our purposes.

2.6.4 TV-Anytime
A good example of metadata generated to serve a clear purpose for a specific goal is the TV-Anytime metadata standard. Following the definition by Wikipedia [29]:

“TV-Anytime, is a set of specifications for the controlled delivery of multimedia content to a user's digital video recorder (DVR). It seeks to exploit the evolution in convenient, high capacity storage of digital information to provide consumers with a highly personalized TV experience. Users will have access to content from a wide variety of sources, tailored to their needs and personal preferences.” A schematic overview of the metadata model used in TV-Anytime is outlined in Figure 1.

Figure 1: Schematic overview of the metadata model for TV-Anytime from SP003V10 [29]

Based on this schema an XML file can be created, whose structure can be restricted by an XML schema file, making the metadata easy to find and filter. In this research a selection from the Dublin Core elements is used and expanded upon to create a solid representation for the commercial videos, in a similar way as TV-Anytime does. In TV-Anytime, expert knowledge is added in the form of reviews from critics. Unfortunately the TV-Anytime specification has yet to be implemented in practice.

2.7 Conclusion
In the previous sections different possibilities for multimedia metadata have been reviewed. The requirements for the system set in section 2.3 demand information in real time, which can be quite complex. However, the constraints for this problem are well defined. To obtain an easy-to-use representation containing all the information needed to meet the set requirements, an Advertisement Dublin Core in MPEG notation in XML will be used. The elements of interest are defined by a slimmed-down version of the basic elements in standard Dublin Core, expanded with the contextual tagging of the expert knowledge provided by the advertisement expert.

3. Implementation
The implementation section consists of an outline of the complete system. In the following sections the different models are described in detail, as well as the algorithms that perform the calculations. A schematic overview of the system can be found below in Figure 2.

Figure 2: schematic overview of the system

The different models are discussed in the following sections. First, in section 3.1 the general use of the system is outlined. The interface where advertisers can add their expert knowledge is described in section 3.2. Next, in section 3.3 the Contextual Model is described. In section 3.4 the Contextual Model is updated with data gathered by the Retention Model. In section 3.5 the History Model is described, and section 3.6 concludes with the Interest Measurement.

3.1 General use
In the system there are two distinct zones. The advertisement zone can be found on the left side of Figure 2. The commercial database is represented with the basic elements as discussed in section 2. The basic elements are expanded with expert knowledge added by an advertisement expert. The interface through which the expert adds this knowledge is described in section 3.2. The other zone, found on the right side of Figure 2, is the advertising zone. Here the different models work together to output the commercial best fitting the social context, based on measuring and generating information about the targeted audience. This zone relates to the bus stop from our example in the introduction on page 4.

3.2 Interface description
To add expert knowledge to the representation, the expert has to annotate the commercial. In essence there are two sets of sliders that allow the user to capture the intended social context for the commercial. The first set, consisting of two sliders, scales the degree to which the commercial is intended for each gender. The constraints for the scale are discussed in section 3.3.

In Figure 3 an example can be found for Coca Cola [30], providing expert knowledge about Coca Cola: Coca Cola Light focused on female audiences, Coca Cola Zero focused on male audiences.

Figure 3: The Coca Cola Example

The second set of sliders is used for scaling the degree to which the commercial is intended for a certain age group. Restrictions for these are also defined in section 3.3. The advertiser can move these sliders in a similar way as the gender sliders to capture the intended target domain.

3.3 Contextual Model
In the contextual model requirements 1 and 2 set in section 2.3 are implemented. In the theoretical background (see section 2) we described a simple but powerful schema for the representation of the basic elements of a commercial video that is sufficient for our purposes. To give a higher semantic meaning to the representations, expert knowledge about the commercials is added through the basic set we described before. These elements will have to contain the following knowledge:
• Knowledge about the relevance of the commercial video to different gender categories.
• Knowledge about the relevance of the commercial video to different age categories.

To allow the advertiser to add this knowledge, additional elements are added to the representation of the video. The first is a node for gender, containing the relevance for both gender categories, male and female (the first set of sliders in Figure 3). The second is a node for age, containing the relevance for the different age groups about which computer vision can provide information, as described by Jakic [2] (the second set of sliders in Figure 3). An example of the added elements in an XML structure is given in Fragment 2.

<gender>
    <male>0.65</male>
    <female>0.35</female>
</gender>
<age_category>
    <young>0.3</young>
    <young_adult>0.3</young_adult>
    <middle_aged>0.2</middle_aged>
    <senior>0.2</senior>
</age_category>
Fragment 2: The added knowledge in XML structure

The values from Fragment 2 can be acquired directly from the interface in Figure 3 using the relevance formula in Formula 1, which is imposed by the XML schema. The relevance for each sub-node is constrained as formulated in Formula 1:

$$\sum_{i=1}^{n} \mathit{relevance}_i = 1$$
Formula 1: Relevance for sub-nodes in knowledge structure

Where n is the total number of sub-nodes the root-node (gender or age category) has. Each sub-node has a value between 0.0 and 1.0, scaling the relevance as a percentage for the particular group the sub-node represents. Looking back at the example in Fragment 2, this particular item would be relevant if the social context of the viewed commercial video consisted of 65% (0.65) male and 35% (0.35) female viewers.

The added elements in the representation are exactly the same elements that can be modeled from the data computer vision can provide. Obviously an exact match is rarely found. Therefore a mechanism is developed for finding the commercial video closest to the social context model. This is done by adding an extra element to the model: a distance element. The distance element gives the absolute distance to the acquired context model for each commercial. The formula to add this element is described in Formula 2.
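As an illustration of the sub-node constraint from Formula 1, the following minimal Python sketch reads the elements of Fragment 2 and checks that each root-node's sub-nodes lie between 0.0 and 1.0 and sum to 1. The wrapping <commercial> element and the function names read_relevance and satisfies_formula_1 are assumptions made for this example; they are not taken from the described system.

```python
import xml.etree.ElementTree as ET

# Fragment 2, wrapped in a hypothetical <commercial> root so it parses as one document.
FRAGMENT_2 = """
<commercial>
  <gender>
    <male>0.65</male>
    <female>0.35</female>
  </gender>
  <age_category>
    <young>0.3</young>
    <young_adult>0.3</young_adult>
    <middle_aged>0.2</middle_aged>
    <senior>0.2</senior>
  </age_category>
</commercial>
"""


def read_relevance(xml_text):
    """Return {root_node: {sub_node: relevance}} for every root-node in the fragment."""
    root = ET.fromstring(xml_text)
    return {node.tag: {sub.tag: float(sub.text) for sub in node} for node in root}


def satisfies_formula_1(sub_nodes, tolerance=1e-9):
    """Check that all sub-node values are in [0, 1] and sum to 1 (within a tolerance)."""
    in_range = all(0.0 <= value <= 1.0 for value in sub_nodes.values())
    return in_range and abs(sum(sub_nodes.values()) - 1.0) <= tolerance


knowledge = read_relevance(FRAGMENT_2)
for root_node, sub_nodes in knowledge.items():
    print(root_node, satisfies_formula_1(sub_nodes))
```

The tolerance only guards against floating-point rounding when the slider values are summed.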

Formula 2: Distance to context model as an extra element

Where n is the total number of sub-nodes the root-node has, x represents the different age nodes, y the different gender nodes, and m is the found social context model, which has the same structure as the representation. Formula 2 yields, for each commercial in the database, an element with the absolute distance from that commercial to the acquired social context model. Picking the commercial with the minimum total distance to the context model returns the best fitting commercial. This is done by applying Formula 3.
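The distance element can be sketched as follows. This is one possible reading of Formula 2, assuming the distance is the sum of the absolute differences between each sub-node of the commercial and the corresponding sub-node of the context model; the dictionary layout and the name context_distance are assumptions for this example.

```python
def context_distance(commercial, context_model):
    """Sum the absolute differences over all gender and age sub-nodes."""
    total = 0.0
    for root_node, sub_nodes in commercial.items():
        for name, relevance in sub_nodes.items():
            total += abs(relevance - context_model[root_node][name])
    return total


# Expert knowledge for one commercial (the values of Fragment 2).
commercial = {
    "gender": {"male": 0.65, "female": 0.35},
    "age_category": {"young": 0.3, "young_adult": 0.3, "middle_aged": 0.2, "senior": 0.2},
}

# A hypothetical measured social context: equal genders, only young adults present.
context_model = {
    "gender": {"male": 0.5, "female": 0.5},
    "age_category": {"young": 0.0, "young_adult": 1.0, "middle_aged": 0.0, "senior": 0.0},
}

distance = context_distance(commercial, context_model)
print(round(distance, 2))
```

Under this reading a perfect match gives a distance of 0, and because each root-node sums to 1 the distance can never exceed 2 per root-node.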

Formula 3: Getting the commercial nearest the context model

Where commercial is the commercial best fitting the social context model and context model distance is the collection of all context model distances from the database.
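Selecting the best fitting commercial then amounts to taking the entry with the smallest distance. A minimal sketch, assuming the distances are held in a dictionary keyed by commercial identifier (the identifiers and values below are made up for illustration):

```python
def nearest_commercial(context_model_distances):
    """Return the identifier of the commercial with the minimum distance."""
    return min(context_model_distances, key=context_model_distances.get)


# Hypothetical distances for three commercials in the database.
context_model_distances = {12: 1.83, 28: 0.71, 7: 1.02}
best = nearest_commercial(context_model_distances)
print(best)
```

When two commercials have the same distance, min returns the one that comes first in the dictionary; the text does not specify a tie-breaking rule, so this is an arbitrary choice.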

3.4 Retention model
The Retention Model is applied after the contextual model to prevent the same commercial video from being repeated to the point where wear-out is reached. To this end a backlog is kept with retention information. XML is used to store the data, to maximize compatibility with the rest of the system. An example of a typical backlog entry is given in Fragment 4.
<time timestamp="16:10:14">
    <id>28</id>
    <model_information>
        <interestindex>0.5</interestindex>
    </model_information>
    <gender>
        <male>0.65</male>
        <female>0.35</female>
    </gender>
    <age_category>
        <young>0.3</young>
        <young_adult>0.3</young_adult>
        <middle_aged>0.2</middle_aged>
        <senior>0.2</senior>
    </age_category>
</time>
Fragment 4: An example of a backlog entry

From this log, information about time, id, interest, and social context model can be gathered. The interest information is used in the interest measurement, which is discussed in section 3.6; for the Retention Model the elements time and identifier (id) are used. To prevent repetition of the same commercial videos, the distance between the contextual model and the commercials as found in the previous section needs to be increased for the commercial videos that have recently been played. This is done in the retention model by the discount formula, labelled Formula 3 below:

Formula 3: Discount formula for the retention model

Where starttime and endtime define the timeframe over which to count, count is the number of times the commercial was played within that timeframe, and retention discount parameter is the discount value. By adding this discount to the distance generated by the context model in Formula 2, the distance to the found contextual model is increased for commercial videos played more often. The distance is increased by the variable retention discount parameter for each time the commercial video has been played. The identifiers are looked up within a variable timeframe; the timeframe parameter can be set to best fit the wear-out parameters for a certain location. Making the retention parameters variable keeps the system flexible to the wear-out constraints of different environments. By applying the selection formula from section 3.3 (Formula 3: Getting the commercial nearest the context model) again, the commercial with minimum distance to the context model expanded with the retention model is found.
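A minimal sketch of the retention discount, assuming the discount equals the number of plays within the timeframe multiplied by the retention discount parameter, and that it is added to the context model distance. The backlog layout (a list of timestamp and identifier pairs), the date, and the parameter value 0.25 are assumptions for this example.

```python
from datetime import datetime, timedelta


def retention_discount(backlog, commercial_id, start_time, end_time, discount_parameter):
    """Count the plays of one commercial in [start_time, end_time] and scale the count."""
    count = sum(
        1
        for played_at, played_id in backlog
        if played_id == commercial_id and start_time <= played_at <= end_time
    )
    return count * discount_parameter


def discounted_distance(distance, backlog, commercial_id, start_time, end_time,
                        discount_parameter):
    """Add the retention discount to the context model distance."""
    return distance + retention_discount(
        backlog, commercial_id, start_time, end_time, discount_parameter
    )


# Hypothetical backlog: (time the commercial was played, commercial id).
backlog = [
    (datetime(2009, 6, 25, 15, 30, 0), 28),   # outside the timeframe
    (datetime(2009, 6, 25, 16, 10, 14), 28),
    (datetime(2009, 6, 25, 16, 11, 0), 12),
    (datetime(2009, 6, 25, 16, 12, 0), 28),
]

end_time = datetime(2009, 6, 25, 16, 14, 0)
start_time = end_time - timedelta(minutes=5)  # a 5 minute retention timeframe

new_distance = discounted_distance(0.71, backlog, 28, start_time, end_time, 0.25)
print(round(new_distance, 2))
```

Commercial 28 was played twice within the timeframe, so its distance grows by two times the parameter, which makes it less likely to be selected again immediately.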

3.5 History model
The history model is only used as a backup. It takes over the presentation of commercial videos when no social context model can be generated. This can happen in case of a camera malfunction, vandalism resulting in the blocking of the camera, or if nobody is in the observable surroundings of the camera. This does not have to mean that no potentially interesting audience is present, and the absence of information does not automatically mean there should be no presentation at all. A likely model of the social context can still be made using data gathered on previous days (weeks, months or years).

In the ideal case the system has been running for a substantial time and social context data has been gathered in the backlog as described in section 3.4. In Fragment 4, besides the elements used in the retention model, elements are available that describe the context model in terms of gender and age category within the playtime of the commercial. Information for a point in time where no information is available is generated using a variable threshold: data can be gathered any number of minutes ahead and any number of minutes backward, resulting in a series of social context models. The number of minutes is kept variable to allow the system to be flexible in different locations; this threshold is likely to be kept short in dynamic social contexts where the number of people changes fast, and is set relatively longer in more static environments. By calculating an average model over this series, a most likely model for the current point in time is created. The average model is acquired by applying Formula 4.

Formula 4: Calculating the average context model

In this formula the average context model is found within a certain threshold starting at the current time – threshold and ending at current time + threshold. Each node in the context model is added and divided by the total number of models found.
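The averaging step can be sketched as follows. The example assumes that backlog entries from previous days are available as pairs of time of day (in seconds since midnight) and context model, and it ignores windows that wrap around midnight; the names average_context_model and to_seconds and all values are assumptions for this illustration.

```python
def to_seconds(hours, minutes, seconds=0):
    """Convert a time of day to seconds since midnight."""
    return hours * 3600 + minutes * 60 + seconds


def average_context_model(entries, current_second, threshold_seconds):
    """Average every node over the models found within current time +/- threshold."""
    window = [
        model
        for second, model in entries
        if abs(second - current_second) <= threshold_seconds
    ]
    if not window:
        return None  # no history available for this time of day
    return {
        root_node: {
            name: sum(model[root_node][name] for model in window) / len(window)
            for name in sub_nodes
        }
        for root_node, sub_nodes in window[0].items()
    }


# Hypothetical history of context models logged on previous days.
history = [
    (to_seconds(16, 8), {"gender": {"male": 0.6, "female": 0.4}}),
    (to_seconds(16, 12), {"gender": {"male": 0.4, "female": 0.6}}),
    (to_seconds(18, 0), {"gender": {"male": 1.0, "female": 0.0}}),  # outside the window
]

likely_model = average_context_model(history, to_seconds(16, 10), 5 * 60)
print(likely_model)
```

The age category node is left out of the sample data for brevity; the function averages whatever root-nodes the logged models contain.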

3.6 Interest Measurement
In this section requirement 3 as defined in section 2.3 is addressed. While the commercial is playing, the system gathers, apart from information about the social context, data about the interest the audience displays for the presented commercial video. The data gathered consists of the elements displayed in Fragment 5.

<pose>
    <frontal>0.15</frontal>
    <fifteen>0.35</fifteen>
    <fifteen_min>0.10</fifteen_min>
    <fourty_five>0.25</fourty_five>
    <fourty_five_min>0.20</fourty_five_min>
</pose>
Fragment 5: An example of interest measurement data

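The elements in this fragment denote the different viewing angles (to the screen) that computer vision has reported. More information about the different subcategories can be found in [2]. To make this information comparable, Formula 5 is used:

$$\mathit{interestindex} = \frac{\left(\sum_{i} \mathit{angleinformation}_i \cdot \mathit{weight}_i\right) \cdot \mathit{samples}}{\mathit{length}}$$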

Formula 5: Calculating the interest index

In this formula the interestindex is a value that can be compared to other interest indices. The angleinformation consists of all values from the elements in Fragment 5; these values are multiplied by a weight value. The weight value is chosen according to the precision with which these elements can be acquired. Based on the research by Jakic [2] a weight is chosen for each viewing angle direction. The distribution is chosen to give higher values to the more frontal angles and lower values to the more sideways viewing angles. In this way the interest index becomes higher if the audience shows, through their head position, a clear interest in the commercial. This value is multiplied by the number of samples found during the commercial. Finally this value is divided by the length of the commercial, which results in the interest rate per second. These measurements are stored in the backlog and used later in the user test, which is described in section 4.1.
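The calculation described above can be sketched in a few lines of Python. The weights below are purely hypothetical placeholders that favour frontal angles (the actual weights follow from the research by Jakic [2] and are not given here), and the sample count and commercial length are made-up values.

```python
def interest_index(angle_information, weights, samples, length_seconds):
    """Weighted sum of the viewing angle values, times samples, per second of video."""
    weighted_sum = sum(
        value * weights[angle] for angle, value in angle_information.items()
    )
    return weighted_sum * samples / length_seconds


# The viewing angle values of Fragment 5.
angle_information = {
    "frontal": 0.15,
    "fifteen": 0.35,
    "fifteen_min": 0.10,
    "fourty_five": 0.25,
    "fourty_five_min": 0.20,
}

# Hypothetical weights: higher for frontal angles, lower for sideways angles.
weights = {
    "frontal": 1.0,
    "fifteen": 0.75,
    "fifteen_min": 0.75,
    "fourty_five": 0.5,
    "fourty_five_min": 0.5,
}

index = interest_index(angle_information, weights, samples=60, length_seconds=30)
print(round(index, 3))
```

Because the number of samples depends on how many images a machine can process per second, indices from machines with different processing power are not directly comparable without rescaling.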

4 Experiments
It is difficult to validate the decision-making process in this paper against existing or known methods, because related systems have not been scientifically validated and decision making based on the acquired models has not been done before in this context. In order to test the system and validate the results, a test was set up to compare this work to “classic advertisement”: the random display of commercials (possibly narrowcasted only to general characteristics of a location). The user test is described in section 4.1. In section 4.2 the results are presented.

4.1 User test
To test the system three display units were positioned. Each unit consists of a screen to display commercial videos, a processing unit (laptop) to process the information, and a webcam. One of these units serves as the control system, displaying commercial videos at random. This unit is equipped with a webcam to log the interest the users display for the presented commercial video. The interest is measured as described in section 3.6. The other two units are equipped with the complete social-context-directed system, displaying commercial videos following the models described in section 3. The two directed units also log the interest measure as described in section 3.6.

For the test a database of 62 commercials was generated. The basic elements for the representation were filled using information from the Internet. The expert knowledge was formulated with the assistance of an expert on commercial videos and narrowcasting. Commercial videos were acquired from YouTube [31]. The commercial database is filled with commercials evenly distributed over the different age and gender categories. An even distribution was chosen so that the database holds commercials which should appeal to each of the different age and gender categories. Next to these age- and gender-specific commercials, some commercials were added that appeal to both gender groups and/or more than one specific age group, to better serve the audience when it is mixed. An overview of the commercial database can be found in Figure 4.

Figure 4: Overview of the commercial database

The test group consisted of 10 persons, of whom 4 were female and 6 male. The age categories were distributed with 3 people in the senior age category, 6 people in the young adult age category and 1 in the middle-aged category. The test setup is portrayed in Figure 5.

Figure 5: The test-setup

For the experiment 2 sets of tests were conducted; each set consisted of 2 subsets, one for age validation and one for gender validation. Each set was started with one test person standing at each directed unit to prime the unit. The person in front of a directed unit is representative of the age or gender category with which that unit is started, in order to prevent the units from generating (too) similar content. The rest of the test subjects were allowed to move freely between units. The test subjects were asked to move to the unit displaying the video commercial best suiting their age and gender characteristics. Figure 6 shows an example: the women in the audience were concentrated on the left side of the room, as can be seen on the left side of the figure. The commercial played on that unit, typically appealing to women, can be found on the right side of the figure. During the entire test the interest was measured, and the played commercials as well as the context models on which the decisions were based were stored. The second set of tests was the same as the first, and was set up to see the decrease in audience interest, because some commercials were likely to be played again and wear-out played a bigger role.

Figure 6: User test example

4.2 Results
In the tables below the results of the user test can be found. Table 1 shows the average distance to the context model for each test; the lower the distance, the better the commercial fits the audience. The average value is calculated over all commercials displayed during the timeframe of the test. The presented values are the summation of the relative distances over all the elements in the representation. The random display shows values of 1.23 and higher. The directed displays stayed substantially closer to the context model, with a maximum just above 1. In the later tests (test 2 and 4) the average distance is slightly higher because the retention model played a bigger role. Between test 2 and 3 a break period of 15 minutes was held; in this period the retention model (having a threshold of 5 minutes) was reset. The primed groups for each display are shown in the leftmost column.

Test (primed groups)                             Random display   Directed display 1   Directed display 2
Test 1 {starting 1: female, starting 2: male}    1.83             0.71                 0.69
Test 2 {starting 1: senior, starting 2: young}   2.01             0.74                 0.69
Test 3 {starting 1: senior, starting 2: young}   1.46             0.80                 1.02
Test 4 {starting 1: male, starting 2: female}    1.23             0.91                 1.04

Table 1: Average distance to context model, resulting values as summation of relative distances

In Table 2 the interest measurement can be found. The values were rescaled to a scale from 1 to 10, with 10 being the highest. Rescaling was done because the different processing power of the machines used resulted in different readings for the interest measurement (higher processing power generates more images per second). The highest interest was reached by the random display, closely followed by the directed displays.

Test (primed groups)                             Random display   Directed display 1   Directed display 2
Test 1 {starting 1: female, starting 2: male}    3.48             7.08                 2.97
Test 2 {starting 1: senior, starting 2: young}   7.20             3.15                 5.48
Test 3 {starting 1: senior, starting 2: young}   6.04             2.70                 4.44
Test 4 {starting 1: male, starting 2: female}    3.50             2.29                 2.92

Table 2: Interest measurement scaled from 1-10

The interest was measured using the techniques described in section 3.6. For Test 2 and test 3 we can see a higher measured interest for the Random display. In Test 1 the Directed display got a higher interest rate. In test 4 the differences were small.

5 Conclusion
In this paper a workable way to represent expert knowledge for commercials was described. The representations hold expert knowledge about the gender and age groups to which a commercial applies. From the results in section 4 it can be concluded that a context-based decision-making system generates content better fitting the social context. The distances from the displayed commercials to the context model were substantially smaller than for the randomly displayed commercials. This shows that the system is able to pick a commercial video, based on the expert constraints, that better fits the local audience at that time. Adding the extra knowledge from experts to the advertisement system has proven useful for better fitting commercials, and the described model-based, context-sensitive decision process has proven to be valid.

The results did not show a higher interest rate for the directed units, although this was expected. The low interest rate could have different reasons. Analyzing the data showed that the average interest displayed for the random commercials was substantially altered by a few high interest rates for certain commercials. These few high measurements can be explained by the subjectivity of the commercials in the database. Some of the commercials appealed more to the audience in terms of humor, no matter whether the product was relevant for that audience. The commercial dataset was a little subjective, consisting mostly of commercials most people found humorous. The display of funny commercials made many test subjects switch screens even if the commercial was not specifically targeted at their domain. Another reason could be the wear-out the test subjects showed over time. Some of the test subjects had seen enough of a certain type of commercial, which made them switch to another screen.

Based on this interest measurement, the system has not proven to have a greater appeal to an audience than the random display of commercials. This was not expected, but the mentioned problems could explain the results. In future research a better dataset and a more diverse user group could be examined, with the expectation of a higher interest rate for the directed units. Another solution would be to find a way to compensate for the humorous commercials by correcting the measurements.

Placing different test persons in front of the units before the test started, to prime age and gender, had little effect on the average display. The small effect on the average shows the adaptability of the system: from the backlog it can be learned that after 1 or 2 commercials the content was fitted to the audience who joined the seeded persons.

A problem with the test was that the test group was slightly out of balance. Because older people were in the minority and young adults in the majority, most commercials were targeted at the latter group. That more commercials targeted at young adults were displayed supports the validity of the system: the commercial best suiting the general characteristics of the audience is picked. There were, however, a few moments when senior people were in the majority, as can be seen in the context model backlog; the commercials displayed in those cases were targeted at this group, again showing the flexibility of the system. Another problem with the commercial database was the small number of similarly appealing commercials for senior people. Most commercials targeted at the senior age group were annoying even to the members of the test audience for whom these commercials were intended. Generating a neutral database of commercials proved difficult, as commercials tend to address people subjectively.

In conclusion, a system could be implemented that acts as an autonomous, directive, context-sensitive information provider. The commercials displayed in the tests better fitted the targets in the test group. Future research has to demonstrate higher interest rates for the displayed items by finding more neutral commercials for the user test.

5.1 Future work
A possible future improvement for the decision-making process described in this work is adding an Interest Model to the decision process. This model could use information from the interest measurement as described in section 3.6. Using this measurement, the distance of a commercial video can be altered in the form of a discount (or benefit). A useful alteration could be to increase the distance when the interest measurement for a certain commercial video is low. A low measurement means that in general few persons in the audience are interested in the commercial, so ideally the commercial should be played less.

Besides an Interest Model, a model for Economics should be created. This is a commercially more relevant addition that allows advertisers to override the system by paying more to display their commercial more often. This would go against the intended decision-making process but would be commercially relevant.

Another possible improvement is increasing the number of cameras. This is mainly an improvement for the recognition rates in the research by Jakic [2], but in this research the measurement of the viewing angles could be extended to combine information from multiple cameras, calculating the subsequent angles to the other cameras, to create an improved interest measurement (and possibly an improved Interest Model).

In the user test some irritation appeared about the repetition of the same commercials on different displays. While repetition is prevented by the retention model within a unit, no link between the units was present, so when both units found a similar model the same commercial video could be chosen. A sensible improvement would be to link nearby systems to one retention model. With one combined model for retention this kind of annoyance could be prevented.

In the conclusion the wear-out of test subjects was discussed: some test subjects displayed wear-out for certain product categories. The product categories could be added to the system in order to prevent the same type of product (in different videos) from repeating.

6 References
1. Flera, Aguie. Mass Media Communication in Canada. Thompson Nelson. Scarborough: 2003.
2. Jakic. Human Facial Features Extraction and Classification using Domain-Sensitive HMMs. Bachelor Thesis: 2009
3. Mooij. Distinct features of interest for advertisers and measuring interest of media on intended targets. Literature Research: 2009
4. Bhargava, Feng. Paid placement strategies for internet search engines. Proceedings of the eleventh international conference on World Wide Web. In proceedings of the 5th international conference on Electronic commerce. ACM Press, 2003
5. Whang, Zhang, Choi, Daeredita. Understanding consumers attitude toward advertising. Eighth Americas Conference on Information Systems, August 2002
6. Chang, Hsieh, Chung, Wu. VISA: Virtual Spotlighted Advertising. MM 2008
7. Homepage of Cognovision http://www.cognovision.com/solutions.php
8. Homepage of Quividi http://www.quividi.com/
9. Homepage of Wututu http://www.wututu.com/en/
10. Homepage of Trumedia http://trumedia.co.il/default.asp
11. Harmelen, Lifschitz, Porter. Handbook of Knowledge Representation. Elsevier: 2008
12. Wikipedia page on Metadata http://en.wikipedia.org/wiki/Metadata
13. Nack. Multimedia Metadata. The Encyclopedia of Database Systems. Springer Verlag, Heidelberg. 2007
14. The Dublin Core metadata initiative http://dublincore.org/
15. Homepage of the W3C Semantic Web Activity http://www.w3.org/2001/sw/
16. Homepage of the World Wide Web Consortium http://www.w3.org/
17. ISO MPEG-7 Overview http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm
18. Home of the International Organization for Standardization http://www.iso.org/iso/home.htm
19. Expressing Simple Dublin Core in XML http://dublincore.org/documents/dcmes-xml/
20. van Ossenbruggen, J., Nack, F. & Hardman, L. (2004) That Obscure Object of Desire: Multimedia Metadata on the Web (Part I). IEEE MultiMedia, Vol 11, No. 4.
21. Berners-Lee, Hendler, Lassila. "The Semantic Web". Scientific American Magazine. 2001
22. Smeulders, Worring, Santini, Gupta, Jain. Content-Based Image Retrieval at the End of the Early Years. IEEE Trans Pattern Anal Mach Intell (2000)
23. Wikipedia page on MPEG-1 Standard http://en.wikipedia.org/wiki/MPEG-1
24. Wikipedia page on MPEG-2 Standard http://en.wikipedia.org/wiki/MPEG-2
25. Wikipedia page on MPEG-4 Standard http://en.wikipedia.org/wiki/MPEG-4
26. Wikipedia page on MPEG-7 Standard http://en.wikipedia.org/wiki/MPEG-7
27. Wikipedia page on MPEG-21 Standard http://en.wikipedia.org/wiki/MPEG-21
28. Wikipedia page on TV-Anytime http://en.wikipedia.org/wiki/TV-Anytime
29. The TV-Anytime Forum, Specification Series: S-3 On: Metadata (Normative) 2001
30. Homepage of The Coca Cola Company http://www.coca-cola.com
31. Youtube http://www.youtube.com
