Optional Quantitative Synopsis: I Tube, You Tube, Everybody Tubes

Ian Beatty CS 82 9/24/09

This article summarizes an in-depth effort to understand the rules that govern user-generated content (UGC) video sites, most notably YouTube. Specifically, the researchers looked into the distribution of popularity among videos and the popularity life cycle of individual videos, with several objectives in mind. First, they wanted to explain the distribution of video views, with the goal of understanding how popularity distributions differ between UGC videos and more conventional studio productions from Video on Demand (VoD) sites like Netflix. Second, they wanted to use their data to help design more efficient methods of delivering bandwidth-heavy video content to end users. Finally, they made initial efforts to analyze the prevalence of illegal and duplicate content, both of which are obstacles to the smooth running of UGC video sites. For me, this study confirmed many intuitions about YouTube, and also provided useful suggestions that will nevertheless be difficult to implement.

To avoid the prohibitive amount of time it would take to conduct this sort of study manually, the team of researchers used a data-harvesting program that crawled YouTube and Daum UCC, YouTube's Korean analog. The program concentrated on videos in a small number of categories and harvested information such as popularity, ratings, and length. By crawling the same videos several times at spaced intervals, the researchers were able to create a sort of time-lapse image of YouTube video popularity.

In comparing UGC videos with VoD videos, several factors were considered. These included the faster rate of production, lower production value, and shorter length of UGC videos. Furthermore, UGC videos are more likely to be found through links or elsewhere on the internet, while VoD videos are more often advertised or spread by more conventional means. Either way, videos which merit recommendation or acclaim should tend to be seen many more times, and so the researchers posited that video popularity should roughly follow a power-law distribution. This means that a tiny number of videos should have the most views, with far more videos at every further step down in popularity, so that videos with few views become overwhelmingly common. For VoD content, the researchers suggested that the long tail of the power-law graph should be somewhat truncated, because high production standards would eliminate many of the niche videos that make up the tail. Because YouTube videos are so much easier to make, this truncation of the tail should not be present. Contrary to their expectations, however, the researchers found that the truncation of the tail was even more pronounced for YouTube, meaning that the drop-off between popular videos and videos with negligible numbers of views was unexpectedly precipitous. Because of this, the researchers looked into ways of lessening this steep truncation, which would allow more niche videos to thrive and give advertisers more possibilities. They placed most of the blame for the truncation on poor search technology, which returns the most popular videos first and shuts other videos out of deserved exposure. Similarly, suggestion engines skew towards more popular videos. The researchers believe that, once these problems are solved, YouTube may be able to unlock more profit potential through these niche videos.

Next, the researchers looked into more efficient ways to deliver this video content. Currently YouTube delivers videos from its own servers, which results in near-prohibitive bandwidth costs. Two solutions were suggested, the first being local caching of the videos most likely to be viewed, which would be a combination of the all-time most popular videos and the most popular recent videos.
The second was peer-to-peer hosting of recently viewed videos, which would also greatly reduce YouTube's bandwidth costs.
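
To get a feel for why caching only the most popular videos could save so much bandwidth, here is a minimal sketch in Python. It assumes that requests follow a pure Zipf (power-law) popularity model, which is a simplification: it ignores the truncated tail the researchers observed as well as the "most popular recent videos" part of their caching proposal. The function name, the catalog size, and the exponent are my own illustrative assumptions, not figures from the article.

```python
def zipf_cache_hit_ratio(num_videos: int, cache_size: int, exponent: float = 1.0) -> float:
    """Fraction of requests served by a cache holding the top-ranked videos.

    Assumes the video at popularity rank r receives a share of requests
    proportional to r ** -exponent (a Zipf / power-law model).
    """
    if not 0 <= cache_size <= num_videos:
        raise ValueError("cache_size must be between 0 and num_videos")
    # Relative request weight for each rank, most popular first.
    weights = [rank ** -exponent for rank in range(1, num_videos + 1)]
    # Requests that hit the cache, divided by all requests.
    return sum(weights[:cache_size]) / sum(weights)


if __name__ == "__main__":
    catalog = 100_000  # hypothetical catalog size, chosen only for illustration
    for fraction in (0.01, 0.10, 0.50):
        cached = int(catalog * fraction)
        ratio = zipf_cache_hit_ratio(catalog, cached)
        print(f"caching the top {fraction:.0%} of videos: hit ratio {ratio:.2f}")
```

Under this model the hit ratio grows much faster than the cache size, which is the intuition behind the caching proposal. With a flatter distribution (an exponent near 0), the advantage disappears and the hit ratio simply equals the fraction of videos cached.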

Finally, the researchers used their data to study the prevalence of aliased or illegal content on YouTube. They found that, in general, most similar or identical videos did less well than the originally published video, although they did skew the popularity ranking of the topic as a whole downward. Any more substantive research into aliased videos was left for future work. As for illegal content, only 5% of videos removed from YouTube were removed because of copyright violation, suggesting that such violations are less prevalent than had been expected. Illegal or aliased videos are problems for YouTube, but much smaller problems than its inefficient content delivery and the truncated tail of the video popularity curve.

This article raised several issues which I had never given much thought to before, especially the popularity of videos. If asked to draw a graph of video popularity, I would probably have drawn some sort of power-law line without knowing what it meant, and I certainly would not have expected the tail of the graph to be as truncated as the researchers found. For this if nothing else, it was a valuable article. I was less struck by their analysis of more efficient ways to deliver content, because they seem to ignore the difficulties of implementing these methods, both of which would probably require reluctant users to install software. While these solutions make sense for YouTube, it would have to offer some corresponding bonus, such as better quality or exclusive videos, to users who download its software. Finally, the analysis of aliased and illegal videos surprised me, even though I have encountered both types of videos before. I had given no thought to the effects of aliased videos on popularity rankings, nor to their prevalence. I took issue with the method used to measure the frequency of illegal videos, which reported them as a proportion of videos removed but not as a proportion of all videos uploaded. More work is needed on this topic, but this team of researchers is off to a promising beginning.
