Content Analysis: How to Digg an election

By Gabe McGuinness

Introduction:
The internet is the guiding force in the development of the new media. Recent technologies are beginning to revolutionize every aspect of the news cycle. As the 2008 presidential campaigns ramp up, it is clear that rich online media such as video, blogs, podcasts, social networks, and even massively multiplayer online role-playing games (MMORPGs) will have a dramatic impact on the election’s outcome. Given the new media’s recent ascension and its ever-expanding reach, its role in shaping the public’s political opinions seems certain to keep growing. Traditional means of measuring the media’s impact on public opinion (circulation, demographics, polling, etc.) have proven less reliable in the era of new media. Accordingly, new tools are beginning to emerge. Using Yahoo! Pipes™, an advanced RSS-feed manipulation tool, I will analyze content appearing on the leading social news website Digg.com. Digg has over a million registered users and is the leader among a new wave of social news websites that all relinquish (in varying degrees) editorial control of content to their user base. Users are permitted to submit stories from any source (new media or old), which other users then vote on to increase or decrease their visibility on the site (digging up, or burying), with the most popular stories eventually reaching the front page. I cannot think of a better way to observe the impact of new media on the democratic election process than analyzing only democratically selected stories. Because not all stories will be from new media sources, I will have the added advantage of being able to gauge the old media’s presence and relationship to new media within the social news construct.

The general lens through which I’ll be analyzing my data may be more contextual than most content analyses, yet I intend to focus specifically on how candidates fare within the greater context of social news. Early on, I realized that given the non-traditional nature of my research, traditional means and methods would not always be applicable. Rather than coding stories all relating to a specific topic, my data collection required me to develop a coding system capable of analyzing data on a wide variety of subjects and from a wide variety of sources. The fragmented nature of new media described by Doris A. Graber in her Mass Media and American Politics (7th edition) is echoed in the variety of stories produced by my research. In chapter eight of Graber’s text, “Elections in the Internet Age” (p.218-245), the large majority of the chapter is devoted to cable and broadcast television coverage of elections. Very little (about 2-3 pages) of the chapter on the “Internet Age” is actually devoted to the internet. I was surprised to find such a lack of information about new technology, specifically the internet, in a book bearing a 2006 copyright date. I can’t in good conscience, however, blame her for this oversight. Indeed, she disclaims her discussion of the internet’s impact on page 220:

“…Internet messages of all types are a new and constantly changing territory where much analysis remains to be done.”

Graber does go into slightly more detail in chapter twelve, where she specifically discusses the Internet as a trend in media policy (p.362-365). It is fascinating how quickly the things she discusses change. “There is, as yet, no widely available solution to the problem of finding one’s way through the Internet’s lush jungles of information where search engines like Google and Yahoo provide guidance, but often perspectives skewed to business interests. Moreover, the stock of information that requires searching doubles every few months.” (p.363) Although Graber alludes to RSS-feed technology’s ability to automatically deliver relevant information in her discussion of blogs (p.364), her 7th edition of Mass Media and American Politics (2006) bears no discussion of the phenomenon of social news, or of the Web 2.0 services that are poised to alter drastically, and have indeed begun to alter, the relationship between old and new media. The impact of YouTube.com alone can be seen in its $1.65 billion acquisition by Google, and in its subsequent legal battles and strategic alliances with many mainstream media outlets and parent companies.

Given that this previous research is, as Graber points out, still in its infancy, I feel it has little to contribute in guiding my research. I feel that my base of knowledge, and my personal interest in staying familiar with the cutting-edge developments in new media, provide a good foundation for my research. Indeed, I feel that my membership in the demographic traditionally associated with early adoption of new technology (white males 18-24) provides me with a unique perspective from which to approach the impact of these new technologies on political candidacy. Knowing that regular Digg.com users are likely to be young males such as myself, I expect the types of stories made popular to reflect the more liberal stance normally attributed to our demographic.

Method/Definitions:
Due to the inherently technical nature of this analysis, some definitions are necessary. I’ll be referring often to “new” media and “old” media. For the purposes of this analysis, I define “old media” as newspapers, newsmagazines, terrestrial and satellite radio, and broadcast and cable television, as well as their respective websites. I define “new media” as blogs (weblogs), vlogs (video blogs), podcasts, social networks (MySpace, Facebook), email, forum posts, RSS feeds, chat rooms, instant messages, SMS text messages, and online games (Second Life). As a note, many old media websites have begun to bridge into what I consider “new media,” employing staff bloggers and featuring user-generated content and social elements. Such endeavors are laudable but, because they are still governed by old media ownership and management, I have classified them as old media.

One term used several times already is RSS, which stands for Really Simple Syndication. It is a standardized set of technologies that has evolved to allow the sharing and proliferation of content across different websites. Very simply, its feeds are streams of data listing the titles and descriptions of documents, each linking to the document it describes. Another term I’ll use throughout my discussion is “Web 2.0,” described by Wikipedia.com as “a perceived second-generation of Web-based services—such as social networking sites, wikis, communication tools, and folksonomies—that emphasize online collaboration and sharing among users.” The term has become a bit of a buzzword, but it essentially refers to a recent, noticeable shift to service-oriented websites, many of which act like programs that would traditionally run on a computer’s desktop, and almost all of which involve a social or collaborative element.

In the spirit of innovation with which much of the new online media presents itself, I’ve decided to use online tools and techniques to guide my research and data collection wherever possible. I’ve constructed a program using Yahoo!’s new Pipes™ customizable web-based RSS-feed manipulation service that takes data from four different Digg.com categories and filters the results for popularity and for the names of the 2008 candidates. The result is a custom RSS feed that displays only the most popular stories featuring one or more of the candidates in the title and/or the description of the story. The custom feed draws popular stories from the main page of Digg.com, as well as from Digg categories specific to the 2008 election, politics, and political opinion. Only stories with more than 25 “diggs” make it through the filter. Stories not bearing one or more of the candidates’ names in the title and/or description are filtered out. I use Google’s web-based “Reader” app to subscribe to the custom feed, which allows me to save all stories the feed produces for later analysis.

For the purposes of this analysis, the media outlet I’ll be analyzing is Digg.com. It is important to note, however, that none of the stories analyzed are published by Digg.com, which simply acts as a means of connecting users with interesting content elsewhere on the web. As a result, the news stories I’ve aggregated for analysis are from a variety of media outlets including blogs, YouTube, and old media websites. I began aggregating data on March 13th around 12:00 p.m. and stopped on April 16th, also around 12:00 p.m. Initially, my customized feed also included stories from other social news sites such as Netscape.com, Reddit.com, and Newsvine.com. I let the feed continue to aggregate data automatically for over a month. Upon completion, I realized that I’d aggregated over 500 stories, the vast majority of which had come from Newsvine.com. The feeds from Newsvine were not being restricted by the popularity filter I had in place, and the several hundred stories I received as a result were basically a direct feed of Associated Press stories that had passed my candidate-name filter, the majority of which had not reached “popularity” by community voting. I decided to keep only stories obtained from Digg.com, because they consistently featured a high number of votes, ensuring that a respectable sample of people had viewed them.

In total, I received 41 qualified stories, of which I omitted 3 because the candidates they concerned had yet to announce their candidacy, leaving me 38 stories to code and analyze. The unit of analysis for this project is the paragraph. Overall, coding by paragraph was effective, with a few caveats. It was hard to code transcripts of videos and interviews. I did not code actual videos that were embedded in the stories (or were the stories themselves); I chose to analyze their content only if it was transcribed within the article (as it was in several cases). In the event that there was no transcription, I coded the paragraphs detailing the video, or its captions. When transcripts of interviews were involved, I treated the first paragraph of the question and the first paragraph of the answer as one coding unit.

I decided to use the classic four content categories for coding political election content, with very little modification. They are The Horse Race, The Campaign Trail, Personality Traits, and Policy Issues. As a rule, I allowed contextual judgment to place difficult units into a proper category; the few units to which this applied were easy to place when taking into account the source, surrounding units, and overall thrust of the story. The rules for units coded as part of “The Horse Race” required that they deal explicitly with issues of who’s ahead and who’s behind, polling numbers, and comparative economic data. An example from a story on WashingtonPost.com:

“Clinton, of New York, continues to lead Obama and other rivals in the Democratic contest, according to the latest Washington Post-ABC News poll. But her once-sizeable margin over the freshman senator from Illinois was sliced in half during the past month largely because of Obama’s growing support among black voters.”

The rules for units coded for “The Campaign Trail” required that they include problems, controversies, strategies, and political maneuvering associated with the running of a candidate’s political campaign. For example, when John McCain fumbled a question on H.I.V. prevention in Africa: “What followed was a long series of awkward pauses, glances up to the ceiling and the image of one of Mr. McCain’s aides, standing off to the back, urgently motioning his press secretary to come to Mr. McCain’s side.” (thecaucus.blogs.nytimes.com)

The rules for units coded for “Personality Traits” required that units express information regarding who a particular candidate is as a person, including their religious, moral, and ethical beliefs, as well as their likes and dislikes. One such article, entitled “Ten Things You Didn’t Know about Barack Obama,” from the U.S. News & World Report website, was quite popular and included gems such as: “10. His heroes are Martin Luther King Jr., Mohandas Gandhi, Pablo Picasso, and John Coltrane.”

The rules regarding units coded for “Policy Issues” were that they be directly indicative of a candidate’s position on a particular issue or issues, including their voting record and their personal views and statements on a political issue. For instance, the SmallGovTimes.com piece on Ron Paul’s announcement of candidacy details his position on many issues: “Paul stands as one of the last remaining believers in strict enforcement of the Constitution and a limited federal government in Washington D.C. Paul ran unsuccessfully for the White House in 1988 under the Libertarian ticket, but now caucuses with the Republican Party. His political platform includes low taxes, individual liberties, and a principled belief in the right to life.”

I found the process of coding for content to be much easier than that of coding for tone. When coding for content, it didn’t matter that the specific topics of the stories being coded were all different. When it came time to code for tone, I had some issues to resolve. The only things that all of the stories had in common were that they featured at least one of the 2008 presidential candidates and that they were popular on Digg between March 13th and April 16th. How could I code a unit that was equally positive for Barack Obama and negative for Hillary Clinton? Because not all stories were about a particular topic or candidate, I had to develop a system of coding for tone that would effectively quantify each candidate’s treatment by the Digg community. To do so, I made coding rules that allowed each unit (paragraph) to be coded positive, negative, or neutral for whichever candidate was mentioned in that paragraph. Rules for coding tone:

1. Each unit can have only one code per candidate mentioned: positive, negative, or neutral.
2. If no candidate is mentioned in a particular unit, the code goes to the candidate or candidates mentioned in the Digg.com title or summary of the story.
3. Each story’s Digg.com title and summary are to be coded separately from the actual story, with one code for the title and one for the description. If more than one candidate is mentioned in the title and summary, each candidate can be coded as per rule 1.
4. If a unit’s code is not easily discernable from the first 3 rules, a judgment call based on contextual analysis may be permitted.
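
The two mechanical steps of this method, the feed filter and tone rules 1 and 2, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Yahoo! Pipes configuration: the Story record, the function names, the surname-only matching, and the default of "neutral" are all hypothetical. Only the 25-digg threshold, the candidate list, and the rules themselves come from the text. Surname matching is deliberately crude (it would also match Bill Clinton or any other Paul), and rules 3 and 4 are left to the human coder.

```python
import re
from dataclasses import dataclass

# Surnames of the nine candidates coded in this analysis.
CANDIDATES = ["Richardson", "Kucinich", "Edwards", "Obama", "Clinton",
              "Giuliani", "McCain", "Romney", "Paul"]
MIN_DIGGS = 25  # stories need more than 25 diggs to pass the filter


@dataclass
class Story:  # hypothetical record; the field names are assumptions
    title: str
    description: str
    diggs: int


def candidates_in(text):
    """Return the candidates whose surname appears as a whole word."""
    return [name for name in CANDIDATES
            if re.search(rf"\b{name}\b", text, flags=re.IGNORECASE)]


def passes_filter(story):
    """Popularity filter plus candidate-name filter on title/description."""
    named = candidates_in(f"{story.title} {story.description}")
    return story.diggs > MIN_DIGGS and bool(named)


def code_unit(unit_text, story, judgments):
    """Apply tone rules 1 and 2 to one unit (paragraph).

    judgments maps candidate -> "positive", "negative", or "neutral" and
    stands in for the human coder. Candidates without a judgment default
    to "neutral", which is an assumption of this sketch.
    """
    mentioned = candidates_in(unit_text)
    if not mentioned:
        # Rule 2: fall back to candidates in the Digg title or summary.
        mentioned = candidates_in(f"{story.title} {story.description}")
    # Rule 1: exactly one code per candidate mentioned.
    return {name: judgments.get(name, "neutral") for name in mentioned}


# The title is taken from the paper; the digg count is made up.
story = Story("New Poll: Blacks Shift To Barack Obama, McCain Falls", "", 900)
unit = "Giuliani holds a 2 to 1 advantage over McCain among Republicans."
codes = code_unit(unit, story, {"Giuliani": "positive", "McCain": "negative"})
print(passes_filter(story), codes)
```

Keeping the coder's judgment as an input makes the point that only the bookkeeping is automated here; deciding whether a paragraph is positive or negative for a candidate remains a human call.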

Results:
I was a bit surprised, and not a little overwhelmed, by the sheer volume of data I was able to obtain for this analysis. Making sense of it all was no small task. The coding process proved both challenging initially and invaluable ultimately. I anticipated a large variety of stories, from at least as large a variety of sources, and was not disappointed in that regard. I was pleased to see that the majority of stories were sourced from blogs in some fashion. Whether liberal, conservative, personal, or professional, the sheer volume of blogs vs. old media was encouraging to a new media evangelist such as me. Nearly two thirds of the stories (24 of 38, or 63.2%) were sourced from new media outlets in the form of blogs, many with videos embedded from YouTube. Informally, I did notice that on issues of fact, bloggers often linked, quoted, or referred to old media sources, most likely to lend authenticity to their stories and gain the reader’s trust. The content categories I chose proved to be an effective means of analyzing the data, and I did not have a very hard time placing units into their categories, with a minimal number of units leaving me on the fence. I attribute this to simple coding rules that allowed me to make judgments based on context when necessary. I was a bit surprised by the relatively few units coded for “Personality Traits” (5.14%). I expected that, given the wide-open Republican field and the general lack of knowledge about all but the most well-established candidates (Clinton, McCain), there would be more information regarding candidates as people.

The largest category coded was “Policy Issues” (38.11%), followed closely by “Campaign Issues” (33.24%), the category defined in the method section as “The Campaign Trail.” The data suggests that Digg users are far more interested in policy or campaign issues than in the personal lives of candidates. That is not to imply that Diggers are less interested in scandal or intrigue than the rest of us. Many of the stories that dealt primarily with campaign issues centered on problems, controversies, and gaffes made by candidates. For instance, several stories relating to Senator McCain’s “Baghdad Stroll” and his comments on the safety of some Baghdad neighborhoods were among the top coded for “Campaign Issues” because they were framed as a campaign blunder on the part of McCain. Units concerning the “Horse Race” made up nearly a quarter of coded units (23.51%) and were for the most part from old media outlets. I attribute this to the fact that old media outlets sponsor their own polls and have first access to polling data.

I found coding each unit for tone to be more difficult than coding for content. There are obviously fewer coding options, but it is a more subjective classification. Compounding the difficulties was the fact that my coding rules for tone allowed me to code each unit multiple times, should the unit involve multiple candidates. While there were not a lot of these paragraphs, there were enough to necessitate a rule. One article I found particularly difficult to code was from The Washington Post’s website; the Digg headline read, “New Poll: Blacks Shift To Barack Obama, McCain Falls.” The article’s content was coded primarily for “Horse Race,” yet most units featured both positive and negative statements for different candidates. The unit below was coded positive for Giuliani and negative for McCain:

“In the Republican race, former New York mayor Rudolph W. Giuliani, who recently made clear his intentions to seek the presidency, has expanded his lead over Sen. John McCain of Arizona. Giuliani holds a 2 to 1 advantage over McCain among Republicans, according to the poll, more than tripling his margin of a month ago.”

This type of code was by far the exception rather than the rule. Most units were easy to code, with a large majority of them (62.38%) falling into neutral territory, as was expected. Several big stories made the cycle of news and had a large impact on the balance of positive vs. negative codes. These stories got a lot of coverage and were promoted on Digg multiple times in different capacities: McCain’s aforementioned Baghdad issue, bearing a decidedly negative tone and frame; Giuliani’s uneasy relationship with the firefighters’ union, also negative; and bigoted comments made by Ann Coulter about John Edwards.

It was interesting to note that across all the articles relating to Ann Coulter’s bigoted remarks about John Edwards, there was a much more neutral net effect than for the other major issues that played negatively. I attribute this to the fact that most of the stories reporting the comments were links to blogs that provided little or no commentary on the video clip they were displaying, and that Republican candidates were quick to distance themselves from Coulter and condemn the remarks. One of the stories ended up being coded negatively for John Edwards, because the negative remarks were transcribed multiple times. The “Coulter Case,” if you will, goes to show that despite an apparent liberal bias in Digg’s readership, a well-handled controversy can end up being a positive. Still, negative units outweighed positive ones 24.27% to 13.35%. Coding for positive/negative/neutral tone by candidate allowed me to assess the popularity of individual candidates and political parties within the Digg community. Upon doing so, it became obvious that my initial suspicion of a liberal bias was confirmed, and then some. I was actually surprised by how liberal the coding results indicated the stories were. While most individual stories were not excessively biased in either a liberal or a conservative direction (save for a few), in total the division and preferences are clear.

As the above figure illustrates, there were nearly twice as many positive codes (18.03%) for Democrats as there were negative (9.87%), with the majority (72.10%) being neutral. While the positive/negative ratio is not entirely surprising, the sheer volume of neutral codes was interesting. As it happens, many of the neutral Democratic codes deal with “Horse Race” type issues, which were not positive or negative for any candidate. The Republican coded units were just as surprising as those of their Democratic counterparts.

Of all units coded Republican, 43.02% were coded negative. Even more shocking, only 7.26% were coded positive; about half (49.72%) were neutral. It seems that not only do Diggers tend to promote positive stories about Democrats; they also seem to actively promote negative material on Republicans. It is clear that if an unbiased look at politics is your goal, Digg should not be the first place you turn. Having the benefit of automatic story selection for this analysis allowed me to analyze other sets of data related to the coding. To further illustrate this division between liberal and conservative issues on Digg, I tabulated the total number of diggs (votes) received by the positively, negatively, and neutrally coded stories for each party. As you can see, negative stories about Republicans received nearly 1,400 more diggs (13,182) than positive stories about Democrats (11,802). It is also interesting to note that the stories that were expressly positive for Democrats received more diggs than the negative and neutral stories combined, despite the fact that Democratic units coded much more neutral than positive or negative. The same effect is seen on the Republican side of the coin, where negative stories drew more diggs than positive and neutral stories combined.

While it is unlikely that these figures would change significantly over time, it should be noted that they represent a snapshot of the digg count at one point in time. Because a story can still receive diggs after its popularity has peaked, the more recent stories I’ve analyzed may not yet have reached their full potential.
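
The digg tallies discussed above can be re-derived from the figures in the "Number of Diggs By Tone/Party" table at the end of this paper. The short Python sketch below is offered only as a way to check the arithmetic; the variable and function names are hypothetical, and the numbers are copied from that table rather than newly collected.

```python
# (tone, party) -> (total diggs, number of stories), copied from the
# "Number of Diggs By Tone/Party" table in this paper.
DIGGS = {
    ("positive", "D"): (11802, 12),
    ("negative", "D"): (5088, 4),
    ("neutral", "D"): (4826, 5),
    ("positive", "R"): (2453, 2),
    ("negative", "R"): (13182, 12),
    ("neutral", "R"): (2618, 3),
}


def average_diggs(table):
    """Average diggs per story for each tone/party cell."""
    return {key: total / stories for key, (total, stories) in table.items()}


def story_count(table, party):
    """Number of stories coded for one party across all tones."""
    return sum(stories for (_, p), (_, stories) in table.items() if p == party)


averages = average_diggs(DIGGS)
# Gap between negative Republican and positive Democratic stories.
gap = DIGGS[("negative", "R")][0] - DIGGS[("positive", "D")][0]

for (tone, party), avg in averages.items():
    print(f"{tone:<9}{party}  {avg:8.1f} diggs per story")
print("gap:", gap, "D stories:", story_count(DIGGS, "D"),
      "R stories:", story_count(DIGGS, "R"))
```

Looking at averages per story alongside the totals is a useful check, because the two largest totals also come from the two largest groups of stories (12 each).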

Perhaps the most useful metric for analyzing the data collected is how each of the candidates fared at the whim of Digg’s masses. The most popular candidate was Barack Obama (31.25% positive), although his negative coding (a respectable 5.21%) was higher than that of two other candidates, Bill Richardson and Ron Paul, who both had no negative codes, primarily because they were each mentioned relatively few times. I guess there is something to be said for the strong, silent types…

While Hillary Clinton narrowly edged out Paul as the third most positively coded candidate (behind Richardson), her negative ratings were the highest of all the Democrats. I attribute this to the fact that many of the units mentioning her were in a negative comparison to Obama.

Rounding out the Democratic field of candidates are John Edwards and Dennis Kucinich, who both, along with Ron Paul, enjoyed an outrageously large neutral percentage. This is indicative of a lack of both stories and strong feelings about these candidates.

Interestingly enough, Edwards is the candidate who has put forth the greatest effort to embrace new media, announcing his candidacy on the video blog Rocketboom, and then on YouTube the following morning. He was also the candidate who seemed to be part of the most bizarre stories, including the vandalism of his virtual campaign headquarters in the massively multiplayer online role-playing game Second Life, the evacuation of his actual campaign headquarters upon receipt of an envelope full of white powder, and the below-the-belt blow by conservative pundit Ann Coulter calling him a “faggot.” This hodgepodge of stories earned him a 2:1 positive-to-negative ratio (2 units to 1).

The Republican field of candidates looks about how you would expect. The candidate with the highest positive rating is also the one least likely to earn the nomination. Mitt Romney fared almost as well as Bill Richardson on positive codes (15.38%), yet carried a 46.15% negative code rating. I attribute this result, too, to a lack of stories mentioning Romney.

The remaining two Republican candidates only truly seemed to be competing over which could have the most negative stories about him promoted on Digg. Ultimately I’ll have to crown Giuliani the winner in that respect, as it seems reports couldn’t help but mention repeatedly how his liberal views are hurting him, and how big a hurdle his ties to 9/11 are for his campaign. He came in with a whopping 58.82% negative and only 7.35% positive, thus earning him the title of most polarizing candidate, with only 33.82% of codes neutral.

Despite Giuliani’s strong showing at faring poorly in the eyes of Diggers, John McCain was able to save face with the lowest positive rating among major candidates (3.39%); Kucinich barely edged him out for the absolute title by 0.16 percentage points (3.23%). McCain’s various blunders have already been mentioned, and they contributed primarily to the fact that he received only 2 units coded positive for him.
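
The per-candidate percentages quoted in this section follow directly from the unit counts in the candidate table at the end of this paper. As a minimal sketch (the names COUNTS and tone_percentages are hypothetical, and the counts are copied from that table), the calculation looks like this:

```python
# candidate: (positive, negative, neutral) unit counts, copied from the
# candidate table in this paper.
COUNTS = {
    "Richardson": (3, 0, 16),
    "Kucinich": (1, 3, 27),
    "Edwards": (2, 1, 28),
    "Obama": (30, 5, 61),
    "Clinton": (6, 14, 36),
    "Giuliani": (5, 40, 23),
    "McCain": (2, 31, 26),
    "Romney": (2, 6, 5),
    "Paul": (4, 0, 35),
}


def tone_percentages(positive, negative, neutral):
    """Share of a candidate's units in each tone, as percentages."""
    total = positive + negative + neutral
    return tuple(round(100 * n / total, 2)
                 for n in (positive, negative, neutral))


shares = {name: tone_percentages(*units) for name, units in COUNTS.items()}
# Column sums give the overall positive/negative/neutral unit totals.
totals = tuple(sum(column) for column in zip(*COUNTS.values()))

for name, (pos, neg, neu) in shares.items():
    print(f"{name:<11}{pos:>7.2f}%{neg:>7.2f}%{neu:>7.2f}%")
print("Totals", totals, tone_percentages(*totals))
```

Because each percentage is taken over a candidate's own units, candidates with few units (Romney has 13) swing much more on a single code than candidates with many (Obama has 96), which is worth keeping in mind when comparing the rankings above.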

Summary & Discussion
While my initial suspicions about the liberal slant of Digg users were confirmed, I was surprised at how one-sided the results proved to be. Perhaps over a longer time period, or across different news cycles, the results would have been slightly more balanced. It does seem, informally, that Republicans have been getting a lot of negative press lately, and perhaps that is echoed and amplified in my findings. While the data I’ve gathered provides a good exploration of how social news can be used as a means of information gathering and analysis, much more work remains to be done. The readership of Digg is growing at a strong rate, having recently passed the one million user mark. While early adopters have proven to be younger and more liberal, it will be interesting to see how those demographics shift as the idea catches on. Already we’ve begun to see spin-offs of Digg that, while not quite ready to be included in this report, are capable of providing the same sort of user-driven editorial control as Digg to perhaps different demographics. It will be interesting to watch the progress of social news as a medium in the new media. While the methods I’ve used for this analysis have yet to be proven against the test of time and repetition, I hope I’ve provided a base for continuing research to build upon when studying new media. I encourage those that will follow to embrace the newest tools to study the newest media, and to further their progression. With an investment in understanding all that Web 2.0 media has to offer, political parties, as well as corporations and organizations, will be better able to manage their image in the eyes of the cutting-edge public, a valued demographic.

Bibliography
1. Graber, Doris A. Mass Media & American Politics. 7th ed. Washington, DC: CQ Press, 2006.

2. "RSS." Wikipedia, The Free Encyclopedia. 18 Apr 2007, 05:23 UTC. Wikimedia Foundation, Inc. 19 Apr 2007 <http://en.wikipedia.org/w/index.php?title=RSS&oldid=123736511>.

3. "Web 2.0." Wikipedia, The Free Encyclopedia. 18 Apr 2007, 16:31 UTC. Wikimedia Foundation, Inc. 19 Apr 2007 <http://en.wikipedia.org/w/index.php?title=Web_2.0&oldid=123842365>.

Stories Coded: 38

Candidate     Positive Units  Negative Units  Neutral Units  % positive  % negative  % neutral
Richardson                 3               0             16      15.79%       0.00%     84.21%
Kucinich                   1               3             27       3.23%       9.68%     87.10%
Edwards                    2               1             28       6.45%       3.23%     90.32%
Obama                     30               5             61      31.25%       5.21%     63.54%
Clinton                    6              14             36      10.71%      25.00%     64.29%
Giuliani                   5              40             23       7.35%      58.82%     33.82%
McCain                     2              31             26       3.39%      52.54%     44.07%
Romney                     2               6              5      15.38%      46.15%     38.46%
Paul                       4               0             35      10.26%       0.00%     89.74%
Totals                    55             100            257      13.35%      24.27%     62.38%

Coding Tables

Content Categories  Units  Percentage
Horse Race             87      23.51%
Campaign Issues       123      33.24%
Personal Traits        19       5.14%
Policy Issues         141      38.11%

Stories      #      %
Old Media   14  36.8%
New Media   24  63.2%

Units Tone: By party

Party          Positive Units  Negative Units  Neutral Units  Total Units
Democrats                  42              23            168          233
Republicans                13              77             89          179
Democrats %            18.03%           9.87%         72.10%
Republicans %           7.26%          43.02%         49.72%

Number of Diggs By Tone/Party:

Story type/party  Number of diggs  Number of stories  Avg diggs/story
Positive/D                  11802                 12            983.5
Negative/D                   5088                  4           1272
Neutral/D                    4826                  5            965.2
Positive/R                   2453                  2           1226.5
Negative/R                  13182                 12           1098.5
Neutral/R                    2618                  3            872.67

Democratic Stories: 21
Republican Stories: 17