Twitter Archiving Using Twapper Keeper: Technical And Policy Challenges

by Brian Kelly (UKOLN), Martin Hawksey (JISC RSC Scotland N&E), John O'Brien (Twapper Keeper), Marieke Guy (UKOLN) and Matthew Rowe (University of Sheffield). iPres 2010, Vienna, 19-24 September 2010

About this Poster
Twitter is widely used in a range of contexts, from social communication to supporting teaching, learning and research. This growth in use has led to recognition of the need to ensure that tweets can be accessed and reused by a variety of third-party applications. This poster looks at development work on the Twapper Keeper Twitter archiving service to support the use of Twitter in education and research.

Requests, Approaches and Challenges
The project adopted an open approach to development to gain buy-in from the user community and promote its use in other web services. As well as approaching individual users of the service to solicit their views, semi-structured questions were published on blogs and an open comment system was used to collect responses. Suggested developments included: the ability to group collections of archives, to delete tweets from Twapper Keeper's archives and to opt out of being archived; provision of APIs to the Twapper Keeper service; and access to archives in multiple formats (e.g. RSS, Atom and JSON). The main challenges were technical issues (due to the evolving Twitter API and ecosystem), policy issues (e.g. open-sourcing components; open content for documentation through use of Creative Commons licences for the project blog, technical documentation, FAQs, etc.; and service-level issues regarding deletion of tweets and ownership) and sustainability issues (the need for more disk space and for migration to a more stable platform).

Need for a Twitter Archiving Service
The Twitter search API only provides access to recently posted tweets, which has led to the development of a number of Twitter archiving services. The JISC (which funds innovative use of digital technologies in UK Further and Higher Education (FHE)) decided that, rather than commissioning development of a new service, a more cost-effective approach would be to support development of an existing service to ensure that the needs of the UK's FHE sector were addressed. The Twapper Keeper service, which creates an archive based on a hashtag, keyword or person, was selected.

Captioning Videos using Twitter
The increasing use of Twitter to support events has resulted in the development of Twitter captioning services such as iTitle. Tweets posted during a live event are extracted from Twapper Keeper and converted into a compatible caption file format, which can then be replayed alongside audio or video clips. This allows users to replay conference sessions augmented with the original backchannel communication. At UKOLN's Institutional Web Management Workshop (IWMW) the footage of plenary talks was combined with the #iwmw10 event archive using iTitle. This example illustrates the benefits that can be gained by providing APIs to a service which can then be exploited by other applications.
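The conversion from archived tweets to a caption file can be sketched roughly as follows. This is a minimal illustration rather than iTitle's actual implementation: it assumes SubRip (SRT) as the caption format, and the tweet fields (`user`, `text`, `created_at`), the function names and the five-second display time are all hypothetical choices made for the example.

```python
from datetime import datetime, timedelta


def format_srt_time(offset: timedelta) -> str:
    """Format an offset from the start of the recording as HH:MM:SS,mmm."""
    total_ms = offset // timedelta(milliseconds=1)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def tweets_to_srt(tweets, video_start, display_seconds=5):
    """Turn archived tweets into SRT caption text.

    tweets: iterable of dicts with hypothetical keys 'user', 'text'
            and 'created_at' (a datetime).
    video_start: datetime at which the recording began.
    """
    # Keep only tweets posted after the recording started, in time order.
    in_range = sorted(
        (t for t in tweets if t["created_at"] >= video_start),
        key=lambda t: t["created_at"],
    )
    cues = []
    for index, tweet in enumerate(in_range, start=1):
        start = tweet["created_at"] - video_start
        end = start + timedelta(seconds=display_seconds)
        cues.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"@{tweet['user']}: {tweet['text']}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(cues)
```

Each tweet becomes one caption cue whose start time is the tweet's offset from the beginning of the recording, which is what lets the backchannel be replayed in step with the video.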

Summarizr: Use of Twapper Keeper APIs
The Summarizr service was developed independently of Twapper Keeper at Eduserv. It makes use of the Twapper Keeper APIs and provides summaries and graphs of Twitter usage based on the archived data. This service vindicates the decision to encourage take-up of the APIs by others. As well as providing statistics on the total tweets and users for a hashtag, the service displays graphs showing the top Twitterers, @reply recipients, related hashtags and URLs tweeted.
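The kind of summary described above can be approximated with a few counters. The sketch below is not Summarizr's code: it assumes the tweets have already been retrieved as dicts with hypothetical `user` and `text` keys, and the regular expressions are simplified approximations of hashtag, @reply and URL syntax.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
REPLY_AT_START = re.compile(r"^@(\w+)")
URL = re.compile(r"https?://\S+")


def summarise(tweets, archive_tag, top_n=10):
    """Summarise an archive of tweets collected for one hashtag.

    tweets: iterable of dicts with hypothetical keys 'user' and 'text'.
    archive_tag: the hashtag the archive was built on, without the '#'.
    """
    tweets = list(tweets)
    twitterers = Counter(t["user"] for t in tweets)
    reply_recipients = Counter()
    related_hashtags = Counter()
    urls = Counter()
    for tweet in tweets:
        text = tweet["text"]
        # Treat a tweet that begins with @name as a reply to that user.
        reply = REPLY_AT_START.match(text)
        if reply:
            reply_recipients[reply.group(1).lower()] += 1
        # Count every hashtag other than the one the archive is based on.
        for tag in HASHTAG.findall(text):
            if tag.lower() != archive_tag.lower():
                related_hashtags[tag.lower()] += 1
        urls.update(URL.findall(text))
    return {
        "total_tweets": len(tweets),
        "total_users": len(twitterers),
        "top_twitterers": twitterers.most_common(top_n),
        "top_reply_recipients": reply_recipients.most_common(top_n),
        "related_hashtags": related_hashtags.most_common(top_n),
        "top_urls": urls.most_common(top_n),
    }
```

The returned lists of (item, count) pairs are in descending order of frequency, so they could be passed directly to a charting library to draw the graphs.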

Twapper Keeper Statistics
As of July 2010 the Twapper Keeper archive contains 1,243 user archives, 1,263 keyword archives and 7,683 hashtag archives. A total of 321,351,085 tweets are stored. The rate at which tweets are ingested ranges from 50 to 3,000 per minute. Since Twitter itself processes about 65 million tweets per day (roughly 45,000 per minute), the Twapper Keeper service at its peak ingest rate is processing about 6-7% of the total public traffic. Summaries of geo-located Twitter data are also available.
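The traffic share quoted above can be checked with a short calculation. This sketch assumes the ingest figures of 50 and 3,000 are per-minute rates and that Twitter's 65 million tweets per day are spread evenly across the day; the constant and function names are illustrative only.

```python
TWITTER_TWEETS_PER_DAY = 65_000_000
MINUTES_PER_DAY = 24 * 60


def share_of_public_traffic(ingest_per_minute):
    """Fraction of Twitter's public traffic represented by an ingest rate."""
    twitter_per_minute = TWITTER_TWEETS_PER_DAY / MINUTES_PER_DAY
    return ingest_per_minute / twitter_per_minute


# Total number of archives reported: user + keyword + hashtag.
total_archives = 1_243 + 1_263 + 7_683

low_share = share_of_public_traffic(50)
peak_share = share_of_public_traffic(3_000)

print(f"Archives: {total_archives:,}")
print(f"Share at 50 tweets/minute: {low_share:.2%}")
print(f"Share at 3,000 tweets/minute: {peak_share:.2%}")
```

Under these assumptions the 6-7% figure corresponds to the upper end of the ingest range; at the lower end the share is closer to a tenth of one per cent.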

