June 24th

A New Way for the Online Moderation of User-Generated Content
Prof. Guillermo De Haro
Deepak Agarwal

IE Business School

TABLE OF CONTENTS

1. INTRODUCTION
2. BUSINESS OPPORTUNITY
3. MODERATION OF USER GENERATED CONTENT
   3.1 USER MODERATION TECHNIQUES
   3.2 IMPOSED MODERATION TECHNIQUES
4. TECHNOLOGICAL APPROACH
5. CONCLUSIONS
APPENDIX


1. INTRODUCTION

The rise of social computing and online communities has ushered in a new era of content delivery in which information can be easily shared and accessed. A large number of applications have emerged that facilitate collective content generation and knowledge sharing; examples include blogs, online product reviews, wikis such as Wikipedia, and online forums such as slashdot.org. Because of the anonymity of Internet users, however, ensuring information quality and inducing quality contributions remain a challenge.

Information sharing and user-generated content have become ubiquitous online phenomena. Wikipedia, a free online encyclopedia, is built on massive distributed collaboration, allowing visitors to add, remove, edit and change content. In online product reviews, like those on Amazon.com, any user can post a review of any item, even one he or she has not bought on Amazon. As these applications have gained popularity and importance, the quality of their content has become a concern. Wikipedia readers may be presented with content that is misleading or simply incorrect. Product reviews on Amazon can be manipulated by sellers or publishers to boost their products. On Slashdot, commentators may post biased or useless comments; advertisers from hardware companies, for example, may post biased comments to promote their products.

People everywhere are getting together via the Internet in unprecedented ways. Millions create content, inform each other about global issues, and build new communication channels in a connected, always-on society. The rise of user-generated journalism is generally attributed to blogs: users and readers want to be heard, and they want to influence what they are reading. Unfortunately, this revolution in user-generated content means that not all information on the Internet is suitable for all of its users. While many companies maintain high standards of decency and refuse to allow indecent material on their networks, not all companies have the capability to do so. Internet malcontents have turned too many community sites into cyber-graffiti walls filled with offensive comments, and this malcontent threatens to spoil members' experience on these websites.


To deal with this menace, this report introduces a new way to automate the moderation of user-generated content. The system identifies not only bad behavior but also good behavior.

2. BUSINESS OPPORTUNITY

Publishers and blog owners are looking for ways to monitor the content on their websites effectively. Effective monitoring not only maintains the editorial standards of the publishing house but also engages users positively on the website. With the market for user-generated content set to rocket over the next few years, the important question for publishers and advertisers remains: how best to monetize the rapid growth in and demand for UGC while ensuring a brand-safe environment?

The importance of this market compared with the traditional advertising market is clear. As budgets tighten due to global financial uncertainty, brand marketers are looking to maximize the ROI potential of social media ahead of other digital media channels. Rich media, such as user-generated content uploaded to and viewed within social media communities, is going to be a major global trend for brands to adopt and commercialize, and for advertisers to take advantage of in reaching new audiences. Major enterprise publishers will look to harness this potentially highly lucrative revenue opportunity. Every major brand will create and manage its own digital community, and every consumer will be able to share their voice. However, this potential will only be realized if advertisers can be assured that their brand is safe within the user-generated content market.


[Figure 1: a cycle linking "User Generated Content", "Advertisers" and the "User Community"; content moderation improves the end-user experience, the enhanced user experience strengthens the community, and advertisers provide additional revenue by monetizing the UGC.]
Figure 1: The cycle in which effective moderation improves the user experience and generates additional revenue from content monetization.

Indeed, for user-generated content to be of significant value to the advertising community, and for that community to harness its rapid growth, it is critical that the environment is moderated and therefore brand safe. Moving forward, the owners and publishers of social media and social networking sites should be compelled to take responsibility for the content displayed on their sites, especially if effective monetization is a business aim. A lack of moderation may be acceptable in the not-for-profit sector, but as soon as sites look to monetize their inventory through advertising, the question of moral and corporate responsibility is very real.

With the world market for user-generated content predicted to grow rapidly in the coming years, rising from $200m at present to $2.46bn by 2012 (according to ABI Research), the need for publishers to offer advertisers the safeguard of a moderated, brand-safe way of harnessing this growth is clear. Magnified by current economic concerns, the notion of ignoring this potentially critical revenue stream while competitors snatch the opportunity is unthinkable.


3. MODERATION OF USER GENERATED CONTENT

Nearly every UGC project has built-in tools for moderating content that allow the owner, the client, or a trained team of outsourcers to remove offensive content, and perhaps even drive participation. Moderation techniques can be classified into two categories:

1. User moderation – Publishers set up their own editorial standards and communicate them to users through a user agreement. These techniques rely on users to post comments that respect the community's values and follow the terms and conditions of the publisher's website.

2. Imposed moderation – Publishers use additional tools, such as filters and human moderation, to effectively monitor and control the quality of user-generated content.

3.1 USER MODERATION TECHNIQUES

• Craft the guidelines – Publishing websites usually have their own terms and conditions that users must accept before registering. These terms and conditions set the rules and regulations that users are bound to adhere to.

• Enlist the users – In social groups, the majority of members are interested only in a positive experience. Given the opportunity, many community members are more than willing to lend a hand and help protect the safety and quality of a project.

3.2 IMPOSED MODERATION TECHNIQUES

• Make moderation actions visible – When moderation controls are completely hidden, an implicit invitation is given to online trolls to try to abuse the system, which in turn creates extra work for the moderators. If the community knows that inappropriate content will be removed quickly, because they have seen clear signs of exactly that happening, there will be less reason to test the boundaries.


• Automated filters – The first line of defense against malcontent is automating the moderation process through smart filters. Filters can help ensure that certain types of content never even appear to users of the site (a minimal sketch of such a filter follows this list).

• Human moderation – To assure more complete brand protection, a human is going to have to verify most or even all of a site's content. Some sites operate on a pre-moderation method (nothing goes live before being specifically approved), while others operate on a post-moderation method (content goes live immediately but is ultimately reviewed by a moderator, who may accept, reject or edit the content according to client guidelines).
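To make the idea of an automated filter concrete, here is a minimal sketch of a blocklist filter with a simple normalization step. It is not any particular product's implementation: the blocklist words and the substitution map are illustrative assumptions, and a real deployment would load both from the publisher's editorial guidelines.

```python
import re

# Illustrative blocklist; a real filter would load this from the
# publisher's editorial guidelines.
BLOCKLIST = {"idiot", "scam"}

# Map common letter-for-symbol substitutions back to letters so that
# simple obfuscations such as "1d10t" or "sc@m" are still caught.
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "@": "a", "$": "s"})

def normalize(text: str) -> str:
    """Lower-case the text and undo simple character substitutions."""
    return text.lower().translate(SUBSTITUTIONS)

def is_blocked(comment: str) -> bool:
    """Return True if any blocklisted word appears in the normalized comment."""
    words = re.findall(r"[a-z]+", normalize(comment))
    return any(word in BLOCKLIST for word in words)

if __name__ == "__main__":
    for comment in ["Great article!", "This is a sc@m"]:
        print(comment, "->", "rejected" if is_blocked(comment) else "published")
```

Even with normalization, such filters only catch substitutions they have been told about, which is why the next section argues that keyword filtering alone is a weak defense.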

Right now, most publishers rely on a combination of keyword filters and human moderators to maintain their editorial standards. Unfortunately, there are problems with both approaches. Keyword filters are a notoriously poor defense, as users can beat them simply by replacing a letter with a symbol. Human moderators, on the other hand, are expensive and occasionally biased. Given this poor choice of options, many publishers choose to avoid UGC entirely. However, given the need for publishers to stay relevant in an increasingly competitive marketplace, this is no longer an option: UGC is quickly becoming a necessity.

4. TECHNOLOGICAL APPROACH

CoMo (content moderator) specializes in sentiment analysis powered by machine learning methods (see Appendix). From major blog networks to social networks to community review sites, CoMo offers automated comment moderation, user profiling, and a series of reporting tools to enhance and streamline the entire community moderation process. CoMo monitors live comments on online publications, flagging and storing the most abusive comments that make it through filtering systems.

The task of identifying abusiveness is challenging for a number of reasons. The semantics of abusiveness are subtle and complex, and this is compounded by the fact that abusive users often obfuscate their comments in order to beat standard keyword filters. To address these issues, the machine learning algorithm breaks the identification task into several sub-tasks: abusiveness is classified into sub-categories such as "Discriminatory", "Inflammatory" and "Violent Threats" that are easier for a classifier to learn. The same approach can be used to identify quality contributions, with sub-categories such as "Congenial", "Insightful" and "Informative" (a sketch of this per-category approach follows).
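The following is a minimal sketch of the "one sub-task per sub-category" idea, not CoMo's actual implementation. The sub-category names come from the report; the toy training comments and the model choice (TF-IDF features with logistic regression via scikit-learn) are illustrative assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled comments: 1 = comment belongs to the sub-category, 0 = it does not.
TRAINING_DATA = {
    "Inflammatory": (
        ["you are all morons", "worst community ever",
         "thanks for the guide", "nice write-up"],
        [1, 1, 0, 0],
    ),
    "Insightful": (
        ["the benchmark ignores cold-start latency", "great point about caching",
         "lol", "first!!!"],
        [1, 1, 0, 0],
    ),
}

# Train one independent binary classifier per sub-category, which is the
# sense in which each sub-category is an easier sub-task.
classifiers = {}
for category, (comments, labels) in TRAINING_DATA.items():
    model = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model.fit(comments, labels)
    classifiers[category] = model

# Score a new comment against every sub-category.
new_comment = ["this thread is full of morons"]
for category, model in classifiers.items():
    probability = model.predict_proba(new_comment)[0][1]
    print(f"{category}: {probability:.2f}")
```

In production, a publisher would train each sub-category model on its own moderation history and act on whichever categories exceed its chosen thresholds.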


"Discriminatory", "Inflammatory", "Violent Threats", etc that are easier for a classified. The same approach can be used to identify quality contributions as well with sub-categories such as "Congenial", "Insightful", and "Informative". CoMo has following technological features: • • • CoMo needs training according to client's editorial standards using historical data. CoMo is updated on a regular basis using feedback from the client's own users community. CoMo moderates content in real time and tends to eliminate backlogged and pending content on even the busiest sites. 5. CONCLUSIONS User-generated content has exploded in recent times. UGC has not only created challenges of maintaining the editorial standard for the publishers but also generated opportunity for the publishers by monetizing the UGC for advertisers. Successful monetization requires content moderation to protect not only the publisher’s brand but also effectively control the menace of malcontent. In fact, the brand-protection has always served as a challenge for the content moderation system. Finding users with a positive intent can help to reduce costs as companies may find that users with a positive reputation do not need to be moderated as intensively as those with no history or a less than stellar reputation. Users will get a seriously reduced rating if he or she abuses the rules of a website by uploading offensive or abusive material onto a community which could result in a user's online reputation being tarnished. Automated filter and human moderation are the ways to moderate the content but they suffer from several limitations. Automated system such as one discuss here will not only safeguard the online brand but also provide a more positive and safer online experience for consumers and improve online safety while rewarding responsible users. It not only helps in deterring cyber bullying and online abuse, but also enables companies identify users who take a positive, active role in their communities.


APPENDIX

1. Bayesian Filtering

Bayesian filtering is a statistical filtering technique. It uses a naive Bayes classifier to identify unwanted content, and Bayesian spam filtering has become a popular mechanism for distinguishing illegitimate from legitimate content. Particular words have a higher probability of occurring in malcontent than in legitimate content. For instance, most users will frequently encounter the word "sucks" in unwanted content but will seldom see it in informative content. The filter does not know these probabilities in advance and must first be trained to build them up. To train the filter, a person must manually indicate whether a piece of user-generated content (UGC) is abusive or not. For every word in the training UGC, the filter adjusts the probability of that word appearing in unwanted versus legitimate UGC in its database. A trained Bayesian filter will typically have learned a very high abuse probability for a word such as "sucks", but a very low probability for words seen only in legitimate content, such as the names of friends and family members.

After training, the word probabilities (also known as likelihood functions) are used to compute the probability that a piece of UGC with a particular set of words belongs to the good or the bad category. Each word in the UGC contributes to this probability, and the contributions are combined using Bayes' theorem into a posterior probability. The content's abuse probability is computed over all words in the UGC, and if the total exceeds a certain threshold (say 95%), the filter marks the UGC as malcontent and deletes it. The initial training can be refined as wrong judgments from the software are identified (false positives or false negatives), allowing the software to adapt dynamically to the ever-evolving nature of abusive content.
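To make the mechanics concrete, here is a minimal, self-contained sketch of the kind of Bayesian filter described above. It is not CoMo's implementation: the toy training comments, the tokenizer, and the add-one smoothing are illustrative assumptions; the manual labelling, per-word probabilities, and the 95% deletion threshold follow the description above.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split a comment into lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesFilter:
    def __init__(self, threshold=0.95):
        self.threshold = threshold          # posterior above this -> delete
        self.abusive_counts = Counter()     # word frequencies in abusive UGC
        self.legitimate_counts = Counter()  # word frequencies in legitimate UGC
        self.abusive_docs = 0
        self.legitimate_docs = 0

    def train(self, text, abusive):
        """Update per-word counts from a manually labelled piece of UGC."""
        counts = self.abusive_counts if abusive else self.legitimate_counts
        counts.update(tokenize(text))
        if abusive:
            self.abusive_docs += 1
        else:
            self.legitimate_docs += 1

    def abuse_probability(self, text):
        """Posterior probability that the text is abusive, via Bayes' theorem.

        P(abusive | words) is proportional to P(abusive) * product of
        P(word | abusive); computed in log space with add-one smoothing
        so unseen words do not zero out the product.
        """
        prior_abusive = self.abusive_docs / (self.abusive_docs + self.legitimate_docs)
        log_abusive = math.log(prior_abusive)
        log_legit = math.log(1 - prior_abusive)
        abusive_total = sum(self.abusive_counts.values())
        legit_total = sum(self.legitimate_counts.values())
        vocab = len(set(self.abusive_counts) | set(self.legitimate_counts))
        for word in tokenize(text):
            log_abusive += math.log((self.abusive_counts[word] + 1) / (abusive_total + vocab))
            log_legit += math.log((self.legitimate_counts[word] + 1) / (legit_total + vocab))
        # Normalize the two joint probabilities into a posterior.
        return 1 / (1 + math.exp(log_legit - log_abusive))

    def moderate(self, text):
        """Delete the content if the posterior exceeds the threshold."""
        return "deleted" if self.abuse_probability(text) > self.threshold else "published"

# Toy example: train on a few manually labelled comments, then moderate new ones.
nb_filter = NaiveBayesFilter()
nb_filter.train("this product sucks and so do you", abusive=True)
nb_filter.train("you are a moron, this sucks", abusive=True)
nb_filter.train("helpful review, thanks for the detail", abusive=False)
nb_filter.train("great comparison with last year's model", abusive=False)
print(nb_filter.moderate("this sucks, you morons"))      # deleted
print(nb_filter.moderate("thanks, very helpful review"))  # published
```

When the filter makes a wrong judgment, the corrected label is simply fed back through train(), which is how the description above says the system adapts over time.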

