6. The idea of applying Bayesian algorithms to feed filtering is not new: a Perl implementation was described in 2004 [4]; there exist commercial web start-ups implementing the idea, e.g. FilteredRSS, Feedscrub and Feedzero [5] (though the first of these appears to be defunct, the second is in invite-only beta testing, and the website for the third occasionally doesn't work). There are also a handful of open source projects implementing Bayesian filtering for RSS feeds, e.g. AmphetaRate, Feedisto, and sux0r [6]. The commercial offerings don't seem to fit well with the JISC information environment: we would like to see a service that from the outset can be used remotely via an API, whereas they will want to drive traffic to their site; we would want to allow users to access the data used to filter their feeds (i.e. the information on which terms characterised the items they were interested in) so that the same information could potentially be used by other feed filters, whereas they are likely to guard this information as part of their commercial interest. For these reasons we shall use one of the open source implementations, probably sux0r, since an initial evaluation indicates that it provides the functionality we would need, is under active development, and is written in a language our developer understands.
2 Workplan
2.1 Aims
7. 1, To test the potential of Bayesian filtering of RSS and ATOM feeds for providing a personalised alerting service; and 2, should the filtering be shown to work, to raise awareness of the potential of this approach among the JISC community (developers, service managers, policy makers).
2.2 Work package 1: Technical development
8. Objective: to develop a demonstrator service that can be used by an individual to aggregate selected RSS and ATOM feeds and which, when provided with sufficient information concerning the user's interests, will use a naïve Bayesian filtering algorithm to indicate which new items from the feeds being aggregated are likely to be of interest to the user.
9. Deliverable: open source software and a demonstrator service for aggregating and filtering feeds, with an open API, and the ability for users to import and export information about the feeds being aggregated (i.e. OPML files) and the information inferred about their interests (i.e. the information used for the filtering, perhaps as an APML file). This software and service will be available to any user who wishes to try it.
10. Details. The demonstrator service will be built, as far as is practicable, out of existing open source software modules, for example the Bayesian filtering routine used by sux0r, and the RSS aggregator and the user interfaces from sux0r and ticTOCs. All software will be developed as open source software, i.e. using open source applications such as Apache, MySQL and PHP, with code hosted on SourceForge or Google Code, and available through an open source licence. The API is intended to allow users to interact remotely with the filtering mechanism, i.e. by indicating which items are and are not relevant to their interests. A typical use for the API would be a widget, on a site such as iGoogle or Netvibes, that displays the items the system has suggested as being of interest and through which the user can indicate any items which actually weren't of interest. Santiago Chumbe will be responsible for executing this work package.
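As a rough illustration of the kind of filtering this work package describes, the following Python sketch shows a minimal two-class naïve Bayes filter that is trained by marking items as relevant or not, and then scores new items. It is not the sux0r routine or the proposed API: the class name, method names, labels and example titles are all hypothetical, and add-one smoothing is an assumed design choice.

```python
import math
import re
from collections import Counter


class NaiveBayesFeedFilter:
    """Toy naive Bayes filter over feed item text (hypothetical sketch)."""

    LABELS = ("interesting", "not_interesting")

    def __init__(self):
        # Per-label term frequencies and number of training items.
        self.term_counts = {label: Counter() for label in self.LABELS}
        self.item_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        # Lower-case the text and split it into alphanumeric terms.
        return re.findall(r"[a-z0-9]+", text.lower())

    def train(self, text, label):
        """Record user feedback: label is 'interesting' or 'not_interesting'."""
        self.term_counts[label].update(self.tokenize(text))
        self.item_counts[label] += 1

    def _log_score(self, tokens, label, vocabulary_size):
        # Log prior with add-one smoothing over the two labels.
        total_items = sum(self.item_counts.values())
        score = math.log((self.item_counts[label] + 1) / (total_items + 2))
        # Log likelihood of each term, again with add-one smoothing.
        total_terms = sum(self.term_counts[label].values())
        for token in tokens:
            count = self.term_counts[label][token]
            score += math.log((count + 1) / (total_terms + vocabulary_size))
        return score

    def probability_interesting(self, text):
        """Return the estimated probability that an item is of interest."""
        tokens = self.tokenize(text)
        vocabulary = set()
        for label in self.LABELS:
            vocabulary.update(self.term_counts[label])
        vocabulary_size = len(vocabulary) + 1  # +1 slot for unseen terms
        log_yes = self._log_score(tokens, "interesting", vocabulary_size)
        log_no = self._log_score(tokens, "not_interesting", vocabulary_size)
        # Normalise in log space to avoid underflow on long items.
        top = max(log_yes, log_no)
        p_yes = math.exp(log_yes - top)
        p_no = math.exp(log_no - top)
        return p_yes / (p_yes + p_no)


# Example: train on a few item titles, then score a new one.
feed_filter = NaiveBayesFeedFilter()
feed_filter.train("Bayesian inference for text classification", "interesting")
feed_filter.train("Machine learning methods for recommender systems", "interesting")
feed_filter.train("Crop yields in medieval agriculture", "not_interesting")
feed_filter.train("Soil chemistry and crop rotation", "not_interesting")
print(feed_filter.probability_interesting("Bayesian classification of text"))
```

In a sketch like this, the per-label term counts are exactly the data a user might want to export and reuse in another filter, and a remote widget would only need to call something equivalent to `train` and `probability_interesting`.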
2.3 Work package 2: Trialling
11. Objective: to test the ability of the recommender service to identify new journal papers of interest to researchers based on a knowledge of the papers which they have recently read.
12. Deliverable: documented trials of the recommender service with a group of researchers.
13. Details. We will guide a group of approximately 20 researchers through the use of the system, training the Bayesian filter with information about their interests. RSS feeds for the tables of contents of journals
[4] See Simon Cozens (2004) "Bayesian Analysis for RSS Reading", Doctor Dobb's. URL: http://www.ddj.com/web-development/184416095
[5] http://www.filteredrss.com/ , http://www.feedscrub.com/ and http://www.feedzero.com
[6] http://sourceforge.net/projects/amphetarate/ , http://feedisto.berlios.de/ and http://sourceforge.net/projects/sux0r/