

The Future of Journalism: Artificial Intelligence And Digital Identities

Noam Lemelshtrich Latar
Sammy Ofer School of Communications IDC Herzliya Israel

David Nordfors
Stanford Center for Innovation and Communication Stanford University

Feb. 2011

Table of Contents
1 Introduction
2 Defining Journalism in the Digital Age
3 Establishing the DNA of Journalistic Content
3.1 Content Based Image Retrieval (CBIR)
3.2 Video Information Retrieval
3.3 Human Centered Content Analysis
3.4 The DNA of Literature
4 Journalism Content and Consumer Engagement
4.1 The Concept of Media Engagement
4.2 Behavioral Targeting and Journalistic Content
4.3 Behavioral Targeting in Social Networks
4.4 Project Smart Push
5 AI: Digital Identities and Behavioral Targeting Engine
5.1 Managing Digital Identities: Developing a Universal Standard
5.4 Behavioral Targeting AI Engine Based on Journalistic Content and ...
6 Digital Identities and Weblining
References
About the Authors
End Notes


The Future of Journalism: Artificial Intelligence and Digital Identities

Interaction between journalism, the Internet and social communities is familiar and intensely discussed, helping us understand how journalism can raise our collective intelligence. We discuss how artificial intelligence (AI) will add to that picture and thus influence the future of journalism. We describe 'Digital Identities' and their future interaction with journalism. We summarize state-of-the-art AI methods usable to establish the 'DNA' of journalistic content, and how matching that content with digital identities enables behavioral targeting for consumer engagement. We review the driving forces such procedures may introduce to journalism and show an example of a journalistic behavioral-targeting engine. We highlight some concerns and discuss how the use of digital identities and AI can be difficult to reconcile with current journalistic principles. We stress the need for ethical principles in using digital identities in journalism, and suggest examples of such principles. We issue a call for stakeholders to jointly explore the potential effects of AI algorithms on the journalism profession and on journalism's role in a democratic society, and suggest questions to be explored.

1. Introduction
Computer-assisted intelligence is part of life: the augmented intelligence[i] of individuals using personal computers and the collective intelligence of groups when networking. Finally, there is Artificial Intelligence (AI), when computers act intelligently without human interaction, mimicking human intelligence (Turing, 1950). These intelligences are blending and converging. Augmented individual intelligence, collective intelligence and AI are co-evolving. The Internet is becoming part of our minds and our minds are becoming part of the Internet. Journalism is part of IT-assisted intelligence. Personal computing entered journalism in the 80s, the Internet in the 90s, and we are now seeing the explosion of social interaction enter journalism, ranging from reader comments to crowdsourcing.


The interaction between journalism, the Internet and social interaction is familiar and intensely discussed, helping us understand how journalism can help increase our collective intelligence. Here we study how AI may contribute through algorithms being developed for rating news, based on mixing systems for aggregating crowd opinions (collective intelligence) and smart algorithms for contextual analysis (AI).
Ratings help control societal systems. Any recognized rating method influences societal development; people will try to improve their ratings. A rating that changes people's lives represents a complex issue. Even if everyone finds a rating annoying and counterproductive, it will still influence the system, as long as people think others recognize the rating. Indeed, journalism must scrutinize and challenge rating systems and explore alternatives. Intelligent algorithms rating journalism, such as TechMeme[ii], strive to share in the public perception of which tech journalism matters more than others. This may give journalism an incentive to optimize stories to rank high. This is only one example of how AI is co-evolving with journalism.
Journalism's role is to focus attention on stories that interest the public. For journalism to remain meaningful, it should also empower the audience. So how does it interact with individuals' augmented intelligence, society's collective intelligence and machine AI? Ideally, journalism raises intelligence (empowering the audience) as it uses the higher intelligence around it, i.e. the audience and the machines.
AI algorithms are changing professional journalism and related academic research. AI is penetrating journalism's traditional pillars: journalistic content (via automatic content analysis in all media formats and delivery systems) and advertising (by measuring consumer attention and targeting ads per user digital identity or personality, measured by behavior). Both content and advertising are changing dramatically.
The new media and AI technology, based on computing's growing power, are the change agents. Interactive new media is permitting, for the first time, accurate measurement of the attention each user gives to journalistic content. Advertisers will demand full validation of consumer ratings. Existing measuring methods will vanish. Fierce competition will arise in selling consumer attention to advertisers, whose ROI (Return On Investment) will determine the fate of channels for advertising, including journalism paid for by ads, across all media formats.
Journalistic content is undergoing major changes via interactive platforms that make media content available continually, everywhere. Until recently, the mass media for distributing content were controlled by the same companies that produced content. The traditional business model for news and entertainment included controlling and bundling both medium and content. But with the Internet, a new generation of media incumbents is arising. Companies such as Twitter, Facebook or Google consciously avoid producing content. They do not do journalism; they only provide access to journalism. Journalism is separating from the media (Nordfors, 2008)[iii]. The latest generation of producers of journalism is no longer involved in the processes or infrastructures of mass communication. They focus on producing content and publishing on-line, delivering it via the infrastructures of the new content-neutral media entities. The Huffington Post and TechCrunch, which started as blogs, are now large and important publications, without controlling the infrastructure for spreading content.
Traditional media spend heavily to measure readerships, estimating their sizes and attention probabilities and creating statistics and probabilities for advertisers. On the Internet, the new media offer content producers and advertisers not probabilities but hard data: which user looked at what, where, when and for how long. Advertisers know if a reader clicked their ad. Traditional media ads are indiscriminate, broadcast to all consumers and costing the same, regardless of how many people pay attention or act. The Internet enables contextual advertising, where advertisements shown to each user are selected and served by automated systems based on the content displayed to the user.
Monitoring users and adapting content and ads to individuals is revolutionizing content, media and marketing. In a digital-interactive world, marketing must account for media spending. The ROI in advertising and targeting content is becoming a science, driving development of advertising, media and content. In 2007, global ad spending was estimated at $385B[iv], equal to the 2008 GDP of the world's 26th largest economy[v].


Targeting content per consumer digital identity will require AI engines to analyze multi-dimensional content against attributes of the engaging experience and a consumer's total being: relating human DNA, content DNA and context DNA (attempts to identify successful music and literature DNA already exist). Research in biology, genetics and psychology that explores and identifies links between individuals' genetic codes, cognitive attributes and pro-/anti-social behavior is merging with data mining of Web 2.0 social-network activities aimed at consumer profiling. Digital Identities will integrate a person's genetic code with data derived from web clicks. People will pay with privacy for social networks' benefits. New AI algorithms analyze content (text, video, audio and still images) to annotate (tag) content automatically. Global efforts are creating unified digital-identity standards for individuals and using AI engines to target, code and annotate content automatically against digital identity. This will affect journalistic content significantly and may revolutionize journalism and its academic research. Journalism must adapt and investigate new business models (Lemelshtrich Latar & Nordfors, 2009)[vi].
In this article, we describe digital identities and new global standards for digital identities, the use of social networks, genetics and virtual worlds for creating digital identities, and the new AI research being used for adapting content to digital identities. Scientists are converting journalistic content to mathematical formulations (signatures) to understand content and context. We probe the popular concept of media engagement and its derivatives (behavioral targeting and contextual targeting), and how AI is used in social networks to target content and ads. We describe an AI engine that can filter and target journalistic content based on the consumer's digital identity, to maximize the ROI of every dollar spent on advertising.
We highlight some concerns and discuss how the use of digital identities and AI can be difficult to reconcile with current journalistic principles. We stress the need for ethical principles in using digital identities in journalism, and suggest examples of such principles. We issue a call for stakeholders to jointly explore the potential effects of AI algorithms on the journalism profession and on journalism's role in a democratic society, and suggest questions to be explored.

2. Defining Journalism in the Digital Age

What is happening to journalism in the digital age? Until now "journalism" and "the media" were synonyms. Journalism was symbolized by the infrastructure for mass communication and vice versa. "Stop the presses" meant breaking news. Organizations controlling the infrastructure for mass communication also controlled the content being broadcast. This is reflected in the dictionary definitions of journalism, as in the Compact Oxford English Dictionary, published on-line through AskOxford.com[vii]:

journalist noun, a person who writes for newspapers or magazines or prepares news or features to be broadcast on radio or television.
Ironically, the on-line dictionary does not include the Internet in the list of media. But merely including the Internet would not save the definition. Now everybody can broadcast news over the Internet, but that does not make everybody who does it a journalist.
Until now, there have been communication infrastructures for one-to-many communication (media) and one-to-one communication (telephone). One-to-many communication has been seen as "the media", mainly journalism and entertainment, where publishers are responsible for broadcasting and consumers have no responsibility: they can choose to receive the broadcast or not. One-to-one communication, mainly telephone, has not been considered media but personal conversation, mediated by an impartial infrastructure and telecom service provider. Nobody is responsible for the entire communication; the responsibility lies between the interacting parties.
With the Internet there is no longer a difference between infrastructure used for one-to-one or one-to-many communication. What's more, it enables many-to-many communication. Web 1.0 spread the one-to-many communication possibility beyond the media. Everybody could publish. Web 2.0 introduces many-to-many communication. Now the crowd can publish together. The new media companies (the ones not providing their own content) have no problems with this. In trying to preserve their practices and identities, old media companies tend to hold on to dated one-to-many media technologies. Their business models, based on controlling the medium and the content, have been difficult to move to the Internet. In cases where new business models for ads have succeeded, such as Google, eBay or Craigslist, the brokering of ads is not integrated with the practice of journalism.
Journalism's essence is described in principles of journalism, as suggested by the Pew Research Center's Project for Excellence in Journalism (PEJ) and the Committee of Concerned Journalists[viii]:
1. Journalism's first obligation is to the truth;
2. Its first loyalty is to the citizens;
3. Its essence is a discipline of verification;
4. Its practitioners must maintain independence from those they cover;
5. It must serve as an independent monitor of power;
6. It must provide a forum for public criticism and compromise;
7. It must strive to make the significant interesting and relevant;
8. It must keep the news comprehensive and proportional;
9. Its practitioners must be allowed to practice their personal conscience.
These principles remain, even when we no longer know what "the media" are. Consider a new, short definition of journalism, separating it from the media and basing journalistic principles on the relation between journalism and its audience, rather than on its relation to the communications medium it uses (which is what is causing the confusion today). Take, for example, the following suggestion (Nordfors, 2009):

Journalism is the production of news and feature stories, bringing public attention to issues that interest the public. Journalism gets its mandate from the audience.
Journalism must act on behalf of its audience, not its sources or advertisers. Journalism often has business models based on attention work (Nordfors, 2006): generating and brokering attention, such as selling ads. Therefore, much journalism is attention work performed with a mandate from the audience. When attention work is done with a mandate from the sources, it is public relations and publicity, not journalism.
Journalism's role as agenda-setter of public debate (as described by McCombs and Shaw in their agenda-setting theory [1972]) depends on journalism's ability to focus public attention on issues that interest the public. This is for many a raison d'être for journalism; it requires a business model that incentivizes focusing public attention. Business models for journalism based on brokering knowledge rather than attention, such as newsletters, do not necessarily incentivize broad public attention and may dis-incentivize it. Who will pay a high price for a newsletter containing information already known to the public, and probably available on the Internet? The central question for journalism's survival as an independent business is not how to adapt to the Internet but what new business models will satisfy journalistic principles in the innovation economy.

3. Establishing the DNA of Journalistic Content

Since the early nineties, major interdisciplinary research efforts have been invested in developing efficient ways to automatically retrieve information and knowledge from multimedia journalistic content. The main objective is to let consumers find the information they seek quickly and effectively. Today, major search engines such as Google, Yahoo and others yield millions of links to any request and cannot answer consumer requests expressed by simple keywords.
The community of researchers involved in this multimedia information retrieval (MIR) research covers Human Computer Interaction (HCI), Information Theory (IT), Statistics, Pattern Recognition, Psychology and, recently, the Social Sciences. Recent papers in these separate fields cite and borrow research methods and tools from the other fields.
Substantial multi-disciplinary research into information retrieval from journalistic content is done from the perspective of consumer-initiated search for information and knowledge. Our objective is to study the prospective implications of this research, analyzing the significance of a new journalistic phenomenon where content automatically searches for consumers based on their digital identities. But first we describe current research on automatic knowledge retrieval from journalism multimedia content.
Most research tools used by the different communities aim at dividing content into small digital content units, analyzing them, tagging these sub-content units and then carrying out an integrative analysis to conceptualize the entire content meaningfully for the consumers. Some researchers convert the visual content into mathematical formulations that can then be subjected to analysis employing AI algorithms (Jeon, Lavrenko, and Manmatha, 2007).

3.1 Content Based Image Retrieval (CBIR)

The primary method used in image retrieval, or in automatically conceptualizing visual content, is dividing the visual frames into smaller sections/regions termed blobs. This is achieved by using statistical tools such as clustering. Each blob is annotated with text. The visual image is described by employing categories such as color, texture, shapes, and structures. Statistical theories are used to associate words with image regions that are then compared with human manual annotations of similar images (Smeulders et al., 2000). Attempts are made to describe images using a vocabulary of blobs, as proposed by Duygulu et al. (2002). Jeon et al. (2007) proposed a method that uses a training set of annotated images to build a cross-media relevance model for images. CBIR researchers are developing mathematical descriptions of images defined as signatures. The signatures describe an image in mathematical formulations to let researchers measure content similarities between image frames. Statistical methods such as clustering and classification form image signatures that will allow automatic similarity measurements by machines. Images are segmented by features such as color, texture differences, shapes and other salient points.
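The idea of an image signature that lets a machine measure similarity can be illustrated with a minimal sketch. It is not the method of any of the cited papers: it assumes the signature is simply a normalized color histogram and that similarity is the cosine between two histograms. The function names, the bin count and the toy "images" (flat lists of RGB pixels) are all hypothetical.

```python
from math import sqrt

def color_signature(pixels, bins_per_channel=4):
    """Return a normalized color histogram for a list of (r, g, b) pixels (0-255)."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each channel into a bin, clamping to the last bin.
        rb = min(r // step, bins_per_channel - 1)
        gb = min(g // step, bins_per_channel - 1)
        bb = min(b // step, bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = sum(hist) or 1.0
    return [count / total for count in hist]

def cosine_similarity(sig_a, sig_b):
    """Cosine of the angle between two signatures: 1.0 means identical direction."""
    dot = sum(a * b for a, b in zip(sig_a, sig_b))
    norm_a = sqrt(sum(a * a for a in sig_a))
    norm_b = sqrt(sum(b * b for b in sig_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical toy images: mostly-blue sky and sea, and an all-green forest.
sky = [(30, 120, 230)] * 90 + [(250, 250, 250)] * 10
sea = [(20, 100, 200)] * 95 + [(240, 240, 240)] * 5
forest = [(20, 140, 40)] * 100

sky_sig = color_signature(sky)
sea_sig = color_signature(sea)
forest_sig = color_signature(forest)

print(cosine_similarity(sky_sig, sea_sig))     # close to 1: similar color content
print(cosine_similarity(sky_sig, forest_sig))  # 0.0: no shared color bins
```

A real CBIR signature would also encode texture, shape and region (blob) structure; color alone is used here only to keep the sketch short.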


3.2 Video Information Retrieval

In video retrieval, researchers have attempted to develop automatic retrieval methods that do not rely on subjective human analysis. This required developing techniques that identify thresholds between color histograms corresponding to consecutive video frames (Flickner et al. 2003). A search engine, ImageSpace, was developed where users could direct queries for multiple visual objects, such as sky, trees, water etc. These tools were used for several video searches including automatic detection of pornographic content (ibid).
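Thresholding histogram differences between consecutive frames can be sketched as follows. This is an illustrative simplification, not the cited system: it assumes frames are flat lists of grayscale intensities rather than color images, uses an L1 distance between normalized histograms, and the threshold value and all function names are hypothetical.

```python
def gray_histogram(frame, bins=8):
    """Normalized intensity histogram of a frame given as a flat list of 0-255 values."""
    step = 256 // bins
    hist = [0.0] * bins
    for value in frame:
        hist[min(value // step, bins - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (ranges from 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def detect_cuts(frames, threshold=0.5):
    """Return the indices of frames whose histogram differs sharply from the previous frame."""
    hists = [gray_histogram(f) for f in frames]
    return [
        i for i in range(1, len(hists))
        if histogram_distance(hists[i - 1], hists[i]) > threshold
    ]

# Hypothetical clip: three dark frames (one slightly noisy), then two bright frames.
dark = [20] * 64
dark_noisy = [20] * 60 + [40] * 4
bright = [220] * 64
frames = [dark, dark_noisy, dark, bright, bright]

cuts = detect_cuts(frames)
print(cuts)  # the scene change is detected at frame index 3
```

Small changes within a scene stay below the threshold, while the jump from dark to bright frames exceeds it, so no human judgment is needed to segment the clip.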

3.3 Human Centered Content Analysis

It has long been recognized that human satisfaction with a search for knowledge and information in multimedia content involves several dimensions, a mixture of rational and emotional ones. The consumer search takes place in a certain context and emotional state, and identical search results may be viewed differently by the same person depending on his or her emotional state at the time of the search. A person's background, education and values affect his or her satisfaction with the search results. An important dimension is the emotions that a certain piece of content evokes in people. Datta et al. refer to this dimension as aesthetics: the kind of emotion a picture arouses in people. Emotions are subjective reactions and should be measured as such. This is referred to at times as affective computing, which focuses on understanding the user's emotional state as it affects his or her satisfaction with the information retrieval (Barnard et al. 2003). CBIR researchers realize that integrating human feedback and involvement in the automatic multimodal content analysis is crucial in reducing errors and increasing user satisfaction. This new direction in research is called human-centered computing. Some researchers attempt to define images according to emotional categories. Salway et al. developed a way to extract character emotions from films based on a model that links character emotions to events in their environment (Salway and Graham, 2003).


3.4 The DNA of Literature

Before the invention of computers (but after Boolean logic and Bayes' theorem laid the mathematical foundations of modern computers and algorithms), in the late 19th century, the French writer Georges Polti (b. 1868) analyzed the elements of successful literature, its DNA. Polti listed thirty-six dramatic situations in good drama, including prayer to the supernatural, crime pursued by vengeance, loss or recovery of a lost one, disaster, remorse, revolt against a tyrant, enigma and others. Polti's list remains popular and writers often use it in developing stories. Terry Rossio, the Shrek scriptwriter, said he referred to Polti's list to resolve a situation in the film's plot. To create his list, Polti analyzed classical Greek texts and French literature. His analysis of the DNA of drama was followed by other writers. These attempts to discover good story elements and persuasive drama were not written from an information-retrieval perspective, but they provide the literary "blobs" that will let researchers dissect, in our case, a journalism story along its content elements. These content elements, found in texts, should receive mathematical formulations that will allow computer-based analysis. One should be able to retrieve content based, for example, on Polti's thirty-six situations or, preferably, on other sets of situations determined by modern data analysis of journalism stories, for comparative analysis or for marketing news stories to consumers based on their digital identities.

4. Journalism Content and Consumer Engagement

4.1. The Concept of Media Engagement
The economic engine driving journalism in the non-public-service media has until now been advertising-based. Journalism companies, regardless of media platform (paper, video, audio), have sold consumer attention to advertisers. Though inaccurate, rating was the key measuring tool until the Internet. No pre-Internet rating technique can measure real attention by individual consumers to specific content. New media platforms, with the advance of interactive new media, make the competition for consumer attention fierce and complex. The journalism industry now needs to develop new ways to measure consumer attention along multiple parameters, including the consumer's cognitive and behavioral profiles and context parameters. The interactive nature of the new media platforms begins to allow for scientific measurement of consumer attention along personal dimensions.
In this new battle for consumer attention the concept of "engagement", a relatively new term, is being used to describe the new relations between consumers and journalistic content. The Advertising Research Council (ARC) has devised the following definition of media engagement: "Engagement is turning on a prospect to a brand idea enhanced by the surrounding context ... the working definition proposed by ARC encapsulates the ultimate objective of linking positive effects towards a brand with brand advertising within the environment of the program content" (Kilger and Romer, 2007). The context within which content is delivered is becoming of prime importance. Kilger and Romer identified three mechanisms that enhance consumer engagement in journalistic content: cognitive (relevance of the program and advertisement to the consumer), emotional (the extent to which one likes the content and advertising) and behavioral (paying attention to the program and advertising content) (ibid). The main hypothesis is that the more engaged consumers are, the more they will spend on the advertised product. This recognition by the advertising world, that engagement in journalistic content involves consumer cognition, emotional profile and behavior, provides relevance to computer-based information retrieval as applied to content analysis.
Research by Kilger and Romer (ibid) on the relationship between media engagement and product-purchase likelihood reveals that as engagement measures increased, so did the mean likelihood that products advertised in the media would be purchased. Three media platforms were studied: television, Internet and printed magazines. All three exhibited similar findings. "Internet and magazines exhibited very close response curves, while TV followed a similar path but slightly lower mean of purchase likelihood" (ibid).

The personal parameters Kilger and Romer examined were those traditionally used in social-science research: gender, age, education, income, race and marital status. Age, income and race mattered. For TV and the Internet, people with lower education expressed higher levels of trust in the media, and older people reported lower engagement. The finding that personal attributes affect media engagement is, as can be expected, of great relevance regarding digital identities. Digital identities are valuable to advertisers, who will not hesitate to take advantage of them once they are available on a large scale and accessible automatically. The road to influencing journalistic content in the direction of higher consumer engagement is short. Kilger and Romer considered a limited number of personal parameters, as a larger number was "too many to fill within the space constraints of this article" (ibid).
The Internet offers ways to measure and broker not only consumer attention and engagement but also consumer interaction. A pay-per-click model does this, as advertisers pay not for being visible but for consumer clicks, an action. This can be taken further. For example, a click on an ad will usually lead to a sales site and may result in further interaction between the consumer and the vendor, including a purchase. So ads, and thus journalistic content, could in principle be paid for by finders' fees. This could, however, introduce business incentives for journalists that might jeopardize journalistic principles.
Converting the content-engagement/product-purchase relations into a science requires the analysis of many variables, including contextual ones, and requires automation and the introduction of artificial intelligence: "Excelling during an era of frugality in high expectations requires digital marketers to be accountable for every dollar ... The ROI focus will force agencies to improve effectiveness and we see increased dependence on automation ... recent shifts [in the liberal direction] in user privacy perceptions have created a window for marketers to use AI to run efficient campaigns" (ibid). The ultimate goal of engagement as perceived by the advertising industry is to target advertisements to consumers based on contextual and personal parameters as listed by the Kilger group: cognition, emotions and behavior. Today this is being done and researched in the new media channels, and is termed 'Behavioral Targeting' by academic researchers, journalists and the advertising industry.


4.2 Behavioral Targeting and Journalistic Content

In the late 90s a new marketing field gained academic and industry attention: Behavioral Targeting. Recent advances in Internet and Web 2.0 interactivity, characterized by consumers becoming content creators and providers, have opened new frontiers for targeting ads to consumers based on interactive behavior. Behavioral Targeting is the ability to deliver ads to consumers based on their behavior while viewing web pages, shopping on-line for products and services, typing keywords into search engines, or combinations of all three (Aho Williamson, 2005). Many Internet companies are involved in behavioral targeting, including Google, Microsoft and Yahoo. M. Kassner (2009) surveyed Google's extensive use of behavioral targeting. Google confirms this on its official website. Google uses two separate systems, AdWords and AdSense. AdWords targets ads based on the search subject matter by identifying search keywords. AdSense targets ads based on the website content the consumer views: "for example if you visit a gardening site, ads on that site may be related to gardening" (ibid). AdSense was extended to searching annotated images and videos in YouTube. According to Kassner, Google is also trying to present relevant advertisements in the Gmail application "by scanning every Gmail message for spam and sending ads based on the keywords ... the whole process is automated and involves no human matching ads to the Gmail content" (ibid). Google's rationale is that by making ads more relevant to customers it brings them more value. So far, AdSense and AdWords, in all their applications, are still based on text analysis. Once image and video content are analyzed and annotated automatically, behavioral targeting will likely be applied to all journalistic content.
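Text-based contextual targeting of the kind described above can be sketched as keyword overlap between a page and each ad's keyword set. This is only an illustration of the general idea and makes no claim about how AdSense or AdWords actually work; the stopword list, ad inventory and function names are hypothetical.

```python
import re

# Hypothetical minimal stopword list; real systems use far richer text processing.
STOPWORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "your", "with", "on"}

def keywords(text):
    """Lowercase the text and return its set of non-stopword terms."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def rank_ads(page_text, ads):
    """Rank ads by how many of their keywords appear on the page; drop ads with no match."""
    page_terms = keywords(page_text)
    scored = []
    for ad_name, ad_keywords in ads.items():
        overlap = len(page_terms & ad_keywords)
        if overlap:
            scored.append((overlap, ad_name))
    # Highest overlap first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]

# Hypothetical ad inventory.
ads = {
    "garden_tools": {"garden", "gardening", "soil", "plants"},
    "running_shoes": {"running", "marathon", "shoes"},
    "seed_catalog": {"seeds", "plants"},
}
page = "Spring gardening tips: preparing the soil and choosing plants for a small garden"

ranking = rank_ads(page, ads)
print(ranking)  # garden_tools ranks first, seed_catalog second, running_shoes is dropped
```

Once images and video are annotated automatically, the same matching step could run over their tags instead of page text.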

4.3 Behavioral Targeting in Social Networks

Social networks, characterized by voluntary profiling as members upload personal data in texts, pictures and videos, are ripe for behavioral targeting. Social network members' profiles include lists of friends, hobbies, demographics and other interests. Behavioral targeting is growing rapidly in social networks. Startups are devising behavioral-targeting technologies developed for social networks. Stefanie Olsen (2008) describes one example, 33Across: the New York-based company's algorithms can follow consumer behavior patterns in social networks, identify sociograms among members and identify for advertisers the more influential members and the viral propagators by studying message dynamics. Universal Pictures is using 33Across to study how people share studio trailers or content with their friends (ibid). Other companies that have started to use behavioral targeting on social networks for marketing advertisements include Revenue Science and Tacoda Systems (bought by AOL, now a full subsidiary). Yahoo launched SmartAds to combine behavioral information with demographic data for targeting ads. Behavioral targeting ad spending is projected at $1B in 2010, growing to $3.8B by 2011 (Mills, 2007).
Behavioral targeting raises serious privacy issues, discussed extensively in academic literature and political circles. The issue of privacy vis-a-vis consumer profiling is beyond the scope of this paper. Tim Berners-Lee, credited with inventing the World Wide Web, spoke before the U.K. parliament on privacy and the Internet. He said that he came to raise awareness of the "technical, legal and ethical implications of the interception and profiling by ISPs in collaboration with behavioral targeting companies" (Watson, 2009). He continued: "It is very important that when you click, you click without a thought that a third party knows what we are clicking on ... I have come here to defend the Internet as a medium" (ibid). But surveys by TRUSTe (a privacy company) show that the public shows "a willingness to submit to monitoring and enhanced content delivery" (Olsen, 2008). This is a remarkable finding that should be followed up.
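To make the identification of influential members from message dynamics, mentioned earlier in this section, more concrete, here is a minimal sketch. It is an assumption-laden simplification and not the algorithm of any company named above: influence is approximated by the number of distinct members a person has sent content to, and the message log and function names are hypothetical.

```python
from collections import defaultdict

def influence_scores(messages):
    """Map each sender to the number of distinct recipients they have reached."""
    reached = defaultdict(set)
    for sender, recipient in messages:
        reached[sender].add(recipient)
    return {member: len(recipients) for member, recipients in reached.items()}

def top_influencers(messages, k=1):
    """Return the k senders with the widest reach (ties broken alphabetically)."""
    scores = influence_scores(messages)
    ranked = sorted(scores, key=lambda member: (-scores[member], member))
    return ranked[:k]

# Hypothetical log of (sender, recipient) pairs, e.g. shares of a film trailer.
messages = [
    ("ana", "ben"), ("ana", "cy"), ("ana", "dee"),
    ("ben", "cy"), ("cy", "ana"), ("ana", "ben"),
]

print(influence_scores(messages))    # ana reaches 3 members, ben and cy reach 1 each
print(top_influencers(messages, 1))  # ana
```

A fuller treatment would follow how content propagates over several hops, which is what distinguishes viral propagators from members who merely send many messages.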

4.4 Project Smart Push

Davitz of SRI applies machine-learning techniques to study communications in social networks as part of a multimillion-dollar project funded by the Defense Advanced Research Projects Agency (DARPA) of the U.S. Department of Defense. Davitz's objective was to automatically monitor people's interest and influence in military communities, to identify the influencers and then to ensure that they see relevant information in a news feed on that topic (Olsen, 2008). Davitz calls this targeting of news according to members' interest profiles "Smart Push." According to Olsen, SRI is looking at commercial applications for it not related to advertising: "you can already learn more about people from MySpace and Facebook" (ibid).

When a powerful research institute like SRI promotes concepts like Smart Push, news media, for whom ratings are king, will adjust journalistic content to fit consumers' digital profiles. This may be done by using an AI engine to filter or webline services based on digital identities.

5. AI: Digital Identities and Behavioral Targeting Engine

5.1 Managing Digital Identities: Developing a Universal Standard

The consumer's digital identity is a vital component in this process and will directly affect the type of services and information he or she will receive. Today, the global knowledge industry invests great resources in developing and improving techniques for managing digital identities. Digital-identity management is developing rapidly under the name "federated identity management." The term "federated identity" refers to the various components of users' profiles, gathered while they surf on different sites and consolidated into uniform profiles according to a global standard. The term is also used for the adoption of standards for the consumer-identification process on the various platforms.

Currently, the most acclaimed standard for constructing digital identity is SAML2, Security Assertion Markup Language 2.0. It enables the consolidation and management of surfers' digital identities across platforms, and it allows the various parts of a surfer's identity definition, defined on different social networks, to be mobilized and merged into one virtual profile. The standard has been successfully adopted by financial organizations, academic institutions, the American electronic government and more.

Adoption of international standards for defining digital identities is significant. It will enable researchers to follow surfers on any site in cyberspace and carry out widespread studies on the connection between users' digital identities and their personalities, fields of interest and cognitive abilities. Every surfer has a uniquely dynamic way of surfing, derived from the person's ability to make decisions, memory and additional cognitive factors, which lends itself to automatic cognitive diagnosis through AI algorithms. Soon AI algorithms will be able to construct a personal digital identity for every person performing actions on the Internet. Data-mining robots will be able to analyze text, video and audio content and transform it into "sociological DNA" (SDNA) that will describe the individual personality (Lemelshtrich Latar, 2004). Constructing the digital identity is a dynamic process, updated for as long as the person is active on the Web.
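As a rough illustration of the consolidation idea described above, the following Python sketch merges profile fragments collected on different platforms into one profile per user. It is not SAML2 and does not use any SAML library; the `ProfileFragment` type, the field names, the shared `user_id` and the merge rule are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class ProfileFragment:
    """One platform's view of a user (hypothetical structure)."""
    user_id: str      # assumed identifier shared across platforms
    platform: str
    attributes: dict = field(default_factory=dict)


def merge_fragments(fragments):
    """Consolidate fragments into one profile per user_id.

    Set-valued attributes are unioned; for any other value the most
    recently seen fragment wins. This rule is an assumption of the sketch.
    """
    profiles = {}
    for frag in fragments:
        profile = profiles.setdefault(frag.user_id, {"platforms": []})
        profile["platforms"].append(frag.platform)
        for key, value in frag.attributes.items():
            if isinstance(value, set):
                profile[key] = profile.get(key, set()) | value
            else:
                profile[key] = value
    return profiles


fragments = [
    ProfileFragment("u1", "network_a",
                    {"interests": {"sports"}, "age_group": "25-34"}),
    ProfileFragment("u1", "network_b", {"interests": {"politics"}}),
]
merged = merge_fragments(fragments)
```

In this toy run the two fragments for `u1` collapse into a single profile that lists both platforms and the union of the declared interests, which is the kind of "one virtual profile" the text refers to.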

5.2 Digital Identities and Social Networks

One of the main uses of the Internet is activity in social networks. Today, millions of people belong to social networks that answer many needs: social, economic and political. A social network is a group that maintains a connection in order to exchange information in text, video, photos or voice, or for social purposes. Every network member must give personal details about themselves, and these are exposed to all or some of the other network members, according to the user's choice (Boyd and Ellison, 2007). Some major networks, originally constructed as reservoirs of content to serve the surfers, today see their purpose as providing services, information and products adapted to members' digital identities.

In September 2007 the network MySpace informed its shareholders that it intended to undertake data mining, using the profiles and blogs of approximately one hundred million of its members, to direct advertisements and services to them. This is the start of a screening system that will provide services and information to members according to their digital identity (Abramovitch, 2007). The declared objective is to improve the membership experience on the network, to add value to the user experience (almost a paraphrase of Aldous Huxley in Brave New World).

Social networks create a substantial and dangerous expansion of the digital-identity notion to include a complete mapping of surfers' social and professional connections. This mapping will accompany the surfers in all human activities and may become a powerful filter that will limit the information and possibilities presented to them, without their being aware of it.

5.3 Socio-Genetics and Digital Identity

The mind and the body hang together, and science is constantly improving our knowledge of the connection. We know today that social behavior is linked to genetics. Understanding these connections, and how they work in a social context, is powerful for constructing digital identities and can be valuable for analyzing the body, the mind and the ecosystem surrounding them: society. Information about people's genetic codes may therefore be as rewarding for constructing digital identities as the information from social networks.

Research and instrumentation for mapping man's genetic code (gene sequencing) are developing rapidly at leading research institutes and large commercial companies worldwide. Their main objective is to identify genes associated with hereditary diseases and to develop medication based on genetic treatment. Since the Human Genome Project published its draft sequence in 2001, commercial competition has arisen between companies producing machines that map the human genetic code. The main research project in this field is the Personal Genome Project.

The connection between genes and human traits, and the entry of information-age giants such as Google and leading research centers such as Harvard and Cornell into the field of genetic research, should close the gaps in research knowledge much faster. The large number of participants in these studies, the vast databases holding participants' digital identities and the data mining of people's social behavior on the Internet, together with the use of smart algorithms, are helping science begin to predict social behavior, both pro-social and anti-social, according to the genetic mapping of humanity.


5.4 Behavioral Targeting AI Engine Based on Journalistic Content and Consumer Digital Identity

The behavioral-targeting AI engine model outlines the basic information-flow elements that will automatically analyze journalistic content on all platforms and transmit relevant content and advertisements to consumers according to their digital identities.


The model is a dynamic learning model, constantly updated as it learns the consumer's profile and content preferences. Unknown factors are expressed as probabilities that are constantly updated in the learning process. Journalistic content will be monitored constantly as consumers interact and make choices. The AI engine will also monitor context parameters and the consumer's emotional state during the interaction by analyzing verbal or other reactions. A brief description of the information flow:

Step One: All journalistic content is analyzed by AI smart algorithms and receives automatic annotations (tags).
Step Two: Consumers' digital identities and the annotated content are fed to the Assessment Rule Engine for initial content determination; suitable ads are sent to consumers based on their profiles.
Step Three: Consumers interact with the content and advertisements; this interactivity is monitored constantly and consumer attention is measured.
Step Four: The Learning Engine analyzes consumer feedback and automatically adjusts the probabilities to better describe consumer behavior; new content is sent to consumers.
Step Five: The Learning Engine transmits updated information to a Personal Memory database, where a consumer media profile is created and constantly updated.

Steps Four and Five continue indefinitely, allowing the AI engine to predict accurately consumers' content and product interests and choices in varying contexts (the Learning Process section).
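The five steps can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the engine described in the paper: the keyword tagger stands in for the "AI smart algorithms", the `ConsumerProfile` class stands in for the Personal Memory database, and the exponential-moving-average update is one simple choice of learning rule that the source does not specify. All names and parameter values are hypothetical.

```python
from dataclasses import dataclass, field


def annotate(text, vocabulary):
    """Step One (stand-in): tag content with each vocabulary word it contains."""
    words = set(text.lower().split())
    return sorted(words & set(vocabulary))


@dataclass
class ConsumerProfile:
    """Personal Memory entry: estimated probability of interest per tag."""
    interest: dict = field(default_factory=dict)
    prior: float = 0.5   # assumed starting probability for unseen tags

    def score(self, tags):
        """Step Two: average interest probability over an item's tags."""
        if not tags:
            return self.prior
        return sum(self.interest.get(t, self.prior) for t in tags) / len(tags)

    def update(self, tags, engaged, rate=0.2):
        """Steps Four and Five: move each tag's probability toward the feedback."""
        target = 1.0 if engaged else 0.0
        for t in tags:
            old = self.interest.get(t, self.prior)
            self.interest[t] = old + rate * (target - old)


def select(items, profile, vocabulary):
    """Pick the item with the highest predicted interest."""
    return max(items, key=lambda text: profile.score(annotate(text, vocabulary)))


vocabulary = ["election", "football", "markets"]
items = ["Election results shake markets", "Football final tonight"]
profile = ConsumerProfile()

# Step Three (simulated): this consumer engages only with football stories.
for _ in range(5):
    for text in items:
        tags = annotate(text, vocabulary)
        profile.update(tags, engaged="football" in tags)

chosen = select(items, profile, vocabulary)
```

After a handful of simulated interactions the profile's probability for "football" has risen and those for "election" and "markets" have fallen, so the engine selects the football story. The same loop, run indefinitely, is the feedback cycle of Steps Four and Five.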

6. Digital Identities and Weblining

Filtering journalistic content against consumer profiles could lead to serious social inequality. Marcia Stepanek coined "weblining" to describe this phenomenon: "Call it weblining, an information-age version of that nasty practice of red lining, where lenders and other businesses mark neighborhoods off limits. Cyber space doesn't have geography but that's no impediment to weblining [...] weblining may permanently close doors to you or your business" (Stepanek, 2000). New York University sociologist Marshall Blonsky adds to the meaning of weblining: "If I am weblined and judged to be of minimum value, I will never have the product and services channeled to me or the economic opportunities that flow to others over the net" (ibid).

Digital identity is at the core of weblining. Though Stepanek and Blonsky emphasize the economic aspects of commercial organizations, the phenomenon they describe also applies to the spreading of journalistic content based on profiling. The economic forces, advertisers and the journalism organizations, cannot be expected to show altruism and create mechanisms to protect our right to equal accessibility to content. More seriously, no one can protect us from the effects that the need to target content to consumer profiles will have on the quality of journalistic content.

7. Digital Identities and the Practice of Journalism

From the point of view of journalism practice, the emergence of digital identities suggests that publishers and journalists will be able to simulate and measure what their news stories will do for audiences and the other stakeholders in their storytelling while they are developing the story. They would be able to test-run stories before publication, much as advertisers now do with new product tests. This will introduce interesting opportunities and challenges for journalism.

Simple contextual advertising need not threaten the journalistic principle of separation between content production and selling audience attention to advertisers (who may have stakes in the stories). But as contextual advertising starts to understand content, context and audience better, ads will be placed precisely. Present pay-per-click business models will create an incentive for publishers to focus on stories that match ads. If so, this will threaten journalistic freedom. The classic separation of "Church and State," the metaphor used by publishers to distinguish between news stories and paid advertising, will blur.

For example, consider a situation where readers use their digital identities, combined with a series of filters, to select the news stories they want brought to their attention. Let's say the quality of the filters and digital identities is good enough to estimate both the chance that a story will catch the reader's attention and the chance that it will lead to action by the reader. Now consider a set of contextual advertisers (these can also be digital identities) that will pay for attention and interaction with readers. Consider a journalist who, when writing a story, has access to these digital identities and filters as well as to the contextual advertisers. The journalist can test the story on digital identities representing both audience and advertisers as the story is written, and can adjust the writing to obtain the best result, a combination of what the journalist, the audience and the advertisers want.

Consider, finally, that the journalist's own digital identity will be included in the interaction. The journalist's digital identity is combined with a set of filters for selecting themes that the journalist wishes to cover, connected to readers' and advertisers' digital identities, and exposed to a news-ticker type of flow of events, e.g. all the Twitter feeds, the blogosphere and all the other news feeds on the Internet. It can also be a data flow from stock markets, or from sensors measuring weather or earthquakes. The journalist can then be tipped off about events that will produce a suitable match between his or her own interests and the interests of the audience and advertisers. Producing a successful story is thus equal to solving a dynamic equation involving the journalist, the audience and the business model, e.g. the advertiser.

Producing a journalistic story while guided by the interaction between the digital identities and the filters can be seen as an iterative, heuristic solution of the equation: identifying overlapping interests and optimizing the combined actions into a result that maximizes value for each party. In each interaction, the real-life users behind the digital identities give feedback, reinforcing or modifying the actions of the digital identities and filters to improve the outcome in the next round.
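One way to picture this "iterative, heuristic solution" is as a small optimization loop. The Python sketch below is an illustration under our own assumptions, not a method given in the paper: a story is reduced to emphasis weights over a few topics, each party's digital identity is reduced to a preference weight per topic, and a greedy hill-climbing search shifts emphasis between topics as long as the worst-off party's value improves. The topics, numbers and the maximin objective are all hypothetical.

```python
def value(story, preferences):
    """A party's value for a story: emphasis weighted by that party's preferences."""
    return sum(weight * preferences.get(topic, 0.0) for topic, weight in story.items())


def combined(story, parties):
    """Score a story by its worst-off party, so that no party is ignored."""
    return min(value(story, prefs) for prefs in parties.values())


def normalize(story):
    """Rescale emphasis weights so that they sum to one."""
    total = sum(story.values())
    return {topic: weight / total for topic, weight in story.items()}


def improve(story, parties, step=0.05, rounds=200):
    """Greedy hill climbing: move emphasis between topics while the score rises."""
    best = normalize(story)
    topics = list(best)
    for _ in range(rounds):
        improved = False
        for src in topics:
            for dst in topics:
                if src == dst or best[src] < step:
                    continue
                trial = dict(best)
                trial[src] -= step
                trial[dst] += step
                if combined(trial, parties) > combined(best, parties) + 1e-12:
                    best, improved = trial, True
        if not improved:
            break
    return best


parties = {
    "journalist": {"investigation": 1.0, "local": 0.4, "lifestyle": 0.1},
    "audience": {"investigation": 0.5, "local": 0.9, "lifestyle": 0.6},
    "advertisers": {"investigation": 0.1, "local": 0.5, "lifestyle": 1.0},
}
draft = {"investigation": 1.0, "local": 0.0, "lifestyle": 0.0}
final = improve(draft, parties)
```

Starting from a draft that serves only the journalist's interest, each round nudges the story toward a mix that the audience and advertisers also value. The sketch makes the paper's concern concrete: once such a score is computable while writing, the pull away from the journalist's original emphasis becomes explicit and measurable.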


8. Principles of Journalism and Digital Identities

The interaction between digital identities, as discussed above, may improve the outcome for all parties involved. But it is a hazardous scenario, and it needs to be discussed among the actors who care about journalism and its role in society. Looking at existing journalistic principles, at least the following can be strongly affected by the above scenario:

"Journalism's first loyalty is to the citizens": Journalists can be pressured to show loyalty to citizens' digital identities rather than to the citizens themselves. If each story is coupled directly to the business model, and if the business model builds on selling audience attention and interaction to advertisers, this can be a problem. It will be difficult to maintain loyalty to the audience of citizens if the journalist will earn more money by adapting to the [digital identities of the] advertisers.

"Its practitioners must maintain independence from those they cover": It may be possible to include in the equation behavioral models of those covered in the stories. This will improve the journalist's chances of planning a series of stories, knowing how the outcome of one story opens the way for the next. It will give journalists a tool for projecting the effects the story will have on stakeholders. Those covered in the story may also be advertisers or have strong, shared interests with advertisers. This makes the web of co-dependencies more visible to the journalist. In some cases this can help a journalist to be independent, but in many other cases it will make independence difficult to maintain.

"Its practitioners must be allowed to practice their personal conscience": If the business model and the system of digital identities and filters permit projecting how much profit a story can produce as it is written, or offer predictions of how the story will influence stakeholders in the journalism organization, the probability increases that the journalist's personal conscience will conflict with the interests of businesses or other stakeholders. In short: "If I write the story the way I want, my publisher will know that I chose to earn less money." Or: "If I write the story the way I want, my publisher will know that I chose to increase the risk of us getting into conflict with the advertisers."


These are only quick, simple examples of types of issues that need to be considered while developing systems of digital identities and filters for journalism.

8.1 Principles for Using Digital Identities for Journalism

We suggest the need for principles for using digital identities in journalism. Some such principles may be:
1. People's needs are more important than the needs of digital identities. Digital identities can never be identical to a person's whole being. Some measure of error should always be considered. People are more important than digital identities. Digital identities should adapt to people, not vice versa.
2. Using digital identities in journalism should not compromise journalism's loyalty to the audience or its independence from sources.
3. Using digital identities in journalism should not compromise the journalist's freedom to practice his or her personal conscience.

8.2 Need for Further Discussion Between Stakeholders in Society

A group of computer scientists, AI researchers and roboticists met at the Asilomar Conference Grounds on Monterey Bay in California to debate "whether there should be limits on research that might lead to the loss of human control over computer-based systems that carry a growing share of society's workload ... their concern is that further advances could create profound social disruptions and even have dangerous consequences ... and force humans to learn to live with machines that increasingly copy human behaviors" (Markoff, 2009). The scientists were concerned about job loss or criminals accessing these tools. No reference was made to the possible devastating effects that using AI tools may have on journalistic content. The conference was organized by the Association for the Advancement of Artificial Intelligence (AAAI). Dr. Horvitz of Microsoft, who organized the meeting, said "he believed computer scientists must respond to the notions of superintelligent machines and artificial intelligence run amok ... the panel was seeking ways to guide research so that technology improved society rather than move it toward technological catastrophe" (ibid).

It is time to organize a similar conference with computer scientists, AI experts, academic researchers in the area of multimedia information retrieval, journalism professionals and experts, social communication experts and economists who specialize in media business models, to explore the potential effects of AI algorithms on the journalism profession and its role in a democratic society. Some of the questions to be explored:
1. Will people control or be controlled by their digital identities?
2. How will the definition of journalism be influenced by digital identities?
3. With the Internet, journalism is no longer only broadcasting but also interacting with readerships and facilitating public discussions. What is the role of journalism in society?
4. How will journalistic principles be affected by interaction between digital identities?
5. Which business models are enabled by digital identities? To what extent will journalists be "attention workers," paid for brokering the readership's attention to advertisers, and to what extent will they be "knowledge workers," paid for brokering knowledge?
6. What are suitable principles for journalism in a situation where interaction with and between digital identities guides the production of journalism, the ways it generates value for people, and the ways it creates profits for the journalism industry?
7. What is the match between journalism and journalistic business models?
8. How will journalistic principles and matching business models be updated?
9. How will journalistic principles, and the process for updating them, be implemented in an environment of digital identities?


References

Abramovitch, G. (2007). Myspace has data mining plans, Sept. 24.

Aho Williamson, D. (2005). White Paper on Behavioral Targeting, Wall Street Journal and eMarketer, May 11.

Barnard, K., Duygulu, P., Forsyth, D., et al. (2003). Matching Words and Pictures. Journal of Machine Learning Research, 3, pp. 1107-1135.

Boyd, D.M. (School of Information, UC Berkeley) and Ellison, N.B. (Dept. of Telecommunications and Information Studies, Michigan State University) (2007). Social Network Sites: Definition, History and Scholarship. Journal of Computer-Mediated Communication, 13(1), article 11.

Duygulu, P., Barnard, K., de Freitas, N., and Forsyth, D. (2002). Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, Seventh European Conference on Computer Vision, pp. 97-112.

Flickner, M., Sawhney, H., Niblack, W., et al. (1995). Query by Image and Video Content: The QBIC System, Computer, 28(9), pp. 23-32.

Jeon, J., Lavrenko, V. and Manmatha, R. (2007). Automatic Image Annotation and Retrieval Using Cross-Media Relevance Models. Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.

Kassner, M. (2009). Google Quietly Starts Behavioral Targeting, ZDNetAsia, April 21.

Kilger, M. and Romer, E. (2007). Do Measures of Media Engagement Correlate with Product Purchase Likelihood?, Journal of Advertising Research, 47(3), pp. 313-325.

Lemelshtrich Latar, N. (2004). Personal Web Social DNA and Cybernetic Decision Making, Hubert Burda Center for Innovative Communications, BGU, Feb 2004; ICA conference, New Orleans, 2004.

Lemelshtrich Latar, N. and Nordfors, D. (2009). Digital Identities and Journalism Content, Innovation Journalism, 6(7), Nov. 11, Stanford.

Markoff, J. (2009). Scientists Worry Machines May Outsmart Man, July 26.

McCombs, M.E. and Shaw, D.L. (1972). The Agenda-Setting Function of Mass Media. Public Opinion Quarterly, 36, pp. 176-187.

Mills, E. (2007). AOL buys ads from Tacoda, ZDNetAsia, July 25.

Nordfors, D. (2006). PR and the Innovation Communication System, Innovation Journalism, 3(5). Also published by Strategic Innovators (July-Sept 2007, Volume I, Issue 3).

Nordfors, D. (2009). Innovation Journalism, Attention Work and the Innovation Economy. A Review of the Innovation Journalism Initiative 2003-2009, Innovation Journalism, 6(1). Retrieved Sep 9 2009.

Olsen, S. (2008). 33Across: The Next Generation of Behavioral Ad Targeting, June 23.

Salway, A. and Graham, M. (2003). Extracting Information about Emotions in Films, Proceedings of the Eleventh ACM International Conference on Multimedia, November 2-8, pp. 299-302.

Smeulders, A.W.M., Worring, M., Santini, S., Gupta, A., and Jain, R. (2000). Content-Based Image Retrieval at the End of the Early Years, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12), pp. 1349-1380.

Stepanek, M. (2000). Weblining, BusinessWeek Online, April 3.

Turing, A. (1950). Computing Machinery and Intelligence, Mind, LIX(236), pp. 433-460.

Watson, F. (2009). Behavioral Targeting: Profiling or Projecting User Experience, Search Engine Watch, Mar 13.


About the authors

Noam Lemelshtrich Latar is the Founding Dean of the Sammy Ofer School of Communications at IDC Herzliya (the first private academic institution in Israel) and has served since 2009 as the Chairperson of the Israel Communications Association, which brings together all media researchers in the Israeli universities and colleges. Lemelshtrich Latar received a Ph.D. in communications from MIT in 1974 and an M.Sc. in engineering systems from Stanford in 1971. He was among the founders of the Community Dialog Project at MIT, experimenting with interactive TV programs involving communities through electronic means. From 1975 to 2005 Lemelshtrich Latar pioneered the teaching and research of new media at the Hebrew and Tel Aviv Universities. From 1999 to 2005 he was involved in the Israeli high-tech industry as a venture-capital chairman, helping to establish several communications start-ups in cognitive enhancement, data mining of consumer choices and home networking. In 2005 he joined IDC Herzliya as founding Dean of a new school of communications emphasizing new media. His current research interest is in digital identities and the effect of AI on journalism.

David Nordfors is co-founding Executive Director of the Center for Innovation and Communication at Stanford University. He coined the terms "Innovation Journalism" and "Attention Work" and started the first innovation journalism initiatives, in Sweden and at Stanford. He is a member of the World Economic Forum Global Agenda Council on the Future of Journalism. Nordfors is adjunct professor at IDC Herzliya and visiting professor at the Monterrey Institute of Technology and Higher Education (Tech Monterrey). Dr. Nordfors has a Ph.D. in molecular quantum physics from Uppsala University and did his postdoctoral research in theoretical chemistry at the University of Heidelberg. He was the initial Director of Research Funding of the Knowledge Foundation in Sweden (KK-stiftelsen). He was the first Science Editor of Datateknik, a Swedish IT magazine, from where he initiated and headed the first hearing about the Internet to be held by the Swedish Parliament.

End Notes

The expression "augmented intelligence" is attributed to Engelbart, D.C. (Oct 1962). "Augmenting Human Intellect: A Conceptual Framework", Summary Report AFOSR-3233, Stanford Research Institute, Menlo Park, CA. Related concepts: 1) "IA" or "Intelligence Amplification" by Ashby, W.R. (1956), An Introduction to Cybernetics, Chapman and Hall, London, UK. Reprinted, Methuen and Company, London, UK, 1964. 2) "Man-Computer Symbiosis": Licklider, J.C.R. (1960). "Man-Computer Symbiosis", IRE Transactions on Human Factors in Electronics, vol. HFE-1, 4-11.

Techmeme arranges tech journalism story links into a single page. "Techmeme works by scraping news websites and blogs, and then compiles a list of links to the most popular technology-related news of the day, which is continuously updated. The stories selected are all chosen by an automated process." (11 Jan 2010)

Nordfors, D. (2008). Separating Journalism and the Media, EJC Magazine, 4 Dec 2008, European Journalism Centre.

Wikipedia. (Aug 29 2009). "Global Entertainment and Media Outlook: 2006-2010, a report issued by global accounting firm PricewaterhouseCoopers". Retrieved 2009-04-20.

"The World Bank: World Development Indicators database, 1 July 2009. Gross domestic product


Lemelshtrich Latar, N. & Nordfors, D. (2009). "Digital Identities and Journalism Content", The Innovation Journalism Publication Series, 6(7), Nov. 11. VINNOVA-Stanford Research Center of Innovation Journalism, Wallenberg Hall, Stanford University.

Compact Oxford English Dictionary, published on-line. Retrieved Sep 6 2009.

PEJ and CCJ Principles of Journalism, published 1997. Retrieved 6 Sep 2009.

Madsen, Paul, SAML2: The building blocks of federated identity, Jan 2005.