
ePaper - the Personalized Mobile Newspaper

Bracha Shapira, Peretz Shoval, Joachim Meyer, Noam Tractinsky, Dudu Mimran
Deutsche Telekom Laboratories at Ben-Gurion University
P.O.B. 653, Beer-Sheva, Israel
bshapira@bgu.ac.il, shoval@bgu.ac.il, Joachim@bgu.ac.il, noamt@bgu.ac.il,
dudu@strategicboard.com

ABSTRACT
This paper provides an overview of the ePaper project. The project
aims to provide an end-to-end solution for the future mobile
personalized newspaper. The ePaper aggregates content (i.e., news
items) from various news providers, and delivers personalized
newspapers on dedicated mobile, electronic newspaper-like devices. The
ePaper can provide to each subscribed user a personalized newspaper,
according to the user's preferences, as well as a "standard edition"
of a selected newspaper. The layout of the newspaper is adapted to the
device's specifications and the user's preferences. The ePaper is
expected to change the reading experience of newspapers and magazines,
coupling an innovative paper-like display with novel personalization
algorithms, an intuitive interface and new methods for adapting
content to the device.

Author Keywords
Electronic newspaper, collaborative filtering, ontology-content-based
filtering, personalization.

ACM Classification Keywords
H.5.2. Information interfaces and presentation, H.3.3. Information
Storage and Retrieval; information filtering, H.3.1. Content Analysis
and Indexing.

1. INTRODUCTION
The publishing world is undergoing a digital revolution. After decades
in the laboratory, electronic paper technologies seem poised for
commercialization starting as early as 2006. Electronic paper based on
e-ink technology offers a visual impression close to print on paper,
being very thin, readable, and consuming power only when updating the
screen. The result is a reading experience that is similar to paper:
high contrast, high resolution, viewable in direct sunlight and at a
nearly 180-degree angle.

The ePaper project, performed at the Deutsche Telekom Laboratories at
Ben-Gurion University, aims at providing an end-to-end solution for
the future newspaper, targeting the above-mentioned electronic paper
devices. The ePaper is not meant to be another application on a PDA or
mobile phone. Rather, it is envisioned as a substitute for the
newspaper that runs on a medium-format portable digital device, with
several notable advantages over a paper newspaper, such as up-to-date
information, aggregation of news items from many news providers, easy
browsing and navigation, and a personalized edition that best fits
each user's preferences. The ePaper system is a client-server
application that provides an end-to-end solution for newspaper
reading. On the server side, the ePaper includes aggregation and
classification of news from multiple sources, flexible delivery
services, personalization and content adaptation. On the client side,
readers enjoy an intuitive interface enabling easy navigation and
browsing, and advanced content adaptation capabilities, enabling the
reader to switch and configure layouts.

The rest of this paper is structured as follows: Section 2 describes
some related projects; Section 3 presents the ePaper general
architecture; Section 4 elaborates on the novel personalization and
content adaptation algorithms; Section 5 concludes with the status of
the project and future issues.

2. RELATED WORK
The digital revolution of the publishing industry, and specifically
the digitization of newspapers, has gained a lot of research attention
and is the subject of many recent studies and projects. We briefly
mention here some of the main recent projects.
The Electronic Newspaper Initiative (ELIN) [1, 2] is aimed at
producing an advanced multimedia electronic newspaper. It provides
personalization of news and allows interactive features such as
multimedia news on demand. One of the main goals of ELIN was to
develop an authoring tool for journalists and news editors who use the
MPEG-4 and MPEG-7 standards. The scope of the ePaper system does not
include authoring tools. However, ePaper is flexible in terms of the
news item format. The ePaper design consists of interpreters that can
be added for every known format that a publisher wishes to use,
without forcing publishers to adopt a specific standard (like the MPEG
framework). ELIN is aimed mainly at users at home, who use PC-based
devices. In the ELIN context, mobile devices are mainly used to send
news-related SMS and MMS. The ePaper is targeted at dedicated e-ink
mobile devices, considering their limited capabilities.

Personalization of contents in the ELIN project is based mainly on
rather basic, collaborative, memory-based algorithms that are known to
have scalability disadvantages. The ELIN authors consider the
enhancement and improvement of the filtering and personalization
algorithms an issue for their future developments. The ePaper
personalization engine integrates ontology-driven content-based and
novel collaborative filtering algorithms to provide high quality
personalization of content.

The MINDS project (Mobile Information and News Data Services) for 3G
(http://www.minds-project.net) is aimed at optimizing processes in the
value chains of mobile services. The group developed innovative mobile
media services and defined European metadata standards for news. The
MINDS project concentrates on promoting the mobile channel for news
delivery, including business issues, metadata issues, alerting issues
and technological issues. Our ePaper project aims to deliver a whole
newspaper product/service, rather than specific stand-alone news
services or alerts about these services.

DigiNews [5] is a European research and development project in the
electronic media domain. It aims at finding new ways of distributing
and consuming the future electronic newspaper. More specifically, it
aims at combining useful features of printed newspapers, such as
simplicity of use, high accessibility and high mobility, with
important features of electronic media, such as the ability to update
news continually and the options of multimedia news. The possibilities
of personalizing the delivered newspaper's contents are also examined.
The DigiNews project deals with personalization too, but with regard
to adaptation of the user interface to users' preferences, while
content personalization is done only as part of the augmented uses,
and therefore only on a limited scale. Also, there does not appear to
be any information in the DigiNews public articles about the existence
of a search engine for users. Our ePaper project, in contrast, pays
much attention to users' ability to browse and search for relevant
news.

The CoMet project [11] was carried out as a successor of SmartPush,
which built a personalized delivery system for economic news items.
Four kinds of services have been defined in CoMet:
- Personalizing: filtering incoming information according to users'
  content-based profiles.
- Content reuse: finding information needed for re-publishing of
  content, e.g. in a second publishing channel.
- Information augmentation: finding new information that can serve as
  background for the news to be published.
- Story chain management: managing the relations between different
  stories in the news.

CoMet deals with matching news-item metadata and user-profile
metadata. CoMet did not concentrate on newspaper delivery to mobile
devices in general, or to e-paper devices in particular, though it
mentions that delivery to mobile devices may benefit from its results.
In contrast, our ePaper project concentrates, as far as output devices
are concerned, on delivery to mobile devices, especially e-paper-like
devices.

3. GENERAL ARCHITECTURE OF EPAPER SYSTEM
Figure 1 presents an overview of the ePaper architecture.

[Figure 1 diagram labels: Content Providers; ePaper System - Server
(Aggregator, Content Manager, Personalization, Content Delivery
Services, System Management Tools); ePaper Client]
Figure 1. ePaper architecture

The ePaper system was implemented based on a client-server
architecture. The server side consists of four layers. The first, the
Content layer, includes the Aggregator and the Content Manager. The
Aggregator interacts with content (news) providers and collects news
items into local storage. The Content Manager processes the content
received and prepares it for delivery to users. The system maintains a
hierarchical news ontology based on the NewsML subject codes defined
by IPTC (www.iptc.org). The Content Manager classifies each news item
to relevant ontology concepts. The Personalization layer consists of a
novel personalization engine which prepares ranked lists of news items
to be delivered to users. The personalization engine combines an
ontology-driven content-based filtering algorithm with a time-aware
collaborative filtering algorithm. The Content Delivery Services layer
orchestrates the processes of the system. It interacts with the
Personalization layer, submits requests for personalized news and
sends the ranked news items it receives to the client. It also
receives feedback from the client (tracking
user's behavior data) and sends this data to the Personalization
layer, which updates the user's profile to reflect the user's recent
reading preferences. The System Management Tools layer provides
standard system tools such as logging and reporting, as well as
special tools for the ePaper application, including the Ontology
Editor that enables maintenance of the ontology. Also included is a
registration system where a user can register and define the ePaper
services to which he will subscribe. The user specifies the content
providers from whom to receive content and provides demographic and
billing information. The user can also opt to define his areas of
interest explicitly, choosing concepts from the news ontology. The
Client subsystem interacts with the content delivery for receiving
data. The data includes information on the profiles registered to the
device, and a ranked list of news items that suits the user's
preferences. The client is in charge of rendering the content,
adapting it to the preferred layout and presenting it to the user. To
manage the variety and constraints of different mobile devices, the
system supports dynamic content adaptation mechanisms based on the
device that the user owns, the user's preferences and the local
customizations made by each user. Thus, the presentation of content is
loosely coupled with the content preparation process, which may make
it easy to scale the number and variety of devices supporting this
service. In the following subsections we provide more detail about the
main components, namely the content management, the client system, and
the content delivery. The personalization layer is detailed in
Section 4.

3.1 Content Management
Figure 2 presents the Content Management layer of the ePaper system.
It receives pre-processed content from the Aggregator and stores
processed content to the repository layer. The major responsibility of
the Content Management layer is to classify (map) each news item to
ontology concepts.

[Figure 2 diagram labels: Content Manager; Interpreter Manager; NewsML
Interpreter; Interpreter 2; Classifier; Similarity Computational
Component (similarity to items in other sources, temporal similarity
identifier); Media Manager; Ontology manager; News Editor; repository]
Figure 2. Content Management layer

3.1.1 Content Manager
The Content Manager orchestrates the processes of the content
management layer. It receives raw content items from the Aggregator
and sends them to the Interpreter Manager. The Content Manager also
receives interpreted and classified data back from the Classifier and
sends it to the other functional units. After the content item passes
all the functional services, it is ready to be sent to the repository
and used by the Personalization layer.

3.1.2 Interpreter Manager
The ePaper system is able to handle news items coming from multiple
sources and in multiple formats. The Interpreter Manager is
responsible for identifying and activating the appropriate interpreter
for each content item, i.e., the interpreter that is able to
"understand" the specific content item's format and extract the
relevant fields. The input for the Interpreter Manager is a content
item received from the Content Manager. The result of the Interpreter
Manager's activity is the execution of the proper interpreter.

3.1.3 NewsML Interpreter
The NewsML Interpreter is an example of an interpreter implemented in
the ePaper system. It receives content items in NewsML format,
extracts relevant metadata fields (e.g. the newspaper, language,
authors, etc.), and passes the parsed content to the Classifier.
The ePaper may handle any other standard format by developing a
dedicated interpreter for each standard.

3.1.4 Classifier
The ePaper system uses an ontology, which is a small and limited
hierarchy of the NewsCodes concepts. The Classifier component is
responsible for determining the ontology concepts which will represent
each news item, i.e., for defining the content-based profile of each
item. For this purpose, the Classifier component utilizes a
hierarchical multi-label classification algorithm.

The hierarchical multi-label classification algorithm implemented in
ePaper uses the flat multi-class classification provided by the
LingPipe open source software
[http://www.alias-i.com/lingpipe/index.html]. The LingPipe
classification method is based on statistical language modeling
techniques and uses Bayesian decision theory.

We apply a top-down, level-based approach to hierarchical
classification. According to this approach, separate classification
models are constructed at each level of the category tree. There is a
separate classification model for each concept of the ontology at
every level of the hierarchy. Hence, the number of generated models is
identical to the number of concepts in the hierarchy.

The classification process is performed top-down, level by level.
First, the content is classified into one or more high-level
categories. Then, it is further classified into one or more child
concepts of the categories assigned at the previous stage. Then, if
one or more second-level concepts were assigned to the content, it is
further classified into their child concepts. The classification
process stops when classification to the more detailed concept is not
confident enough. The confidence thresholds are configuration
parameters whose values were determined by empirical runs.
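
The top-down, level-based descent described above can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the ePaper implementation: the ontology fragment, the `keyword_score` function (a toy stand-in for the trained per-concept models) and the single threshold value are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical fragment of a news ontology (parent -> children).
ONTOLOGY: Dict[str, List[str]] = {
    "root": ["sport", "politics"],
    "sport": ["soccer", "tennis"],
    "soccer": ["world_cup"],
    "politics": ["elections"],
}


def classify_top_down(
    text: str,
    score: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed value; the paper tunes thresholds empirically
) -> Dict[str, float]:
    """Descend the hierarchy level by level. A concept is kept when its
    confidence reaches the threshold, and only kept concepts are expanded."""
    assigned: Dict[str, float] = {}
    frontier = ["root"]
    while frontier:
        next_frontier: List[str] = []
        for parent in frontier:
            for child in ONTOLOGY.get(parent, []):
                confidence = score(text, child)
                if confidence >= threshold:
                    assigned[child] = confidence
                    next_frontier.append(child)
        frontier = next_frontier
    return assigned


def keyword_score(text: str, concept: str) -> float:
    """Toy stand-in for a per-concept model: fraction of keywords present."""
    keywords = {
        "sport": {"match", "team", "goal"},
        "soccer": {"goal", "striker"},
        "world_cup": {"fifa"},
        "tennis": {"serve", "set"},
        "politics": {"vote", "minister"},
        "elections": {"vote", "ballot"},
    }[concept]
    words = set(text.lower().split())
    return len(words & keywords) / len(keywords)


if __name__ == "__main__":
    print(classify_top_down("The striker scored a goal for the team", keyword_score))
```

For the sample sentence the descent assigns `sport` and then `soccer`, and stops there because `world_cup` falls below the threshold. The `politics` branch is never expanded.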
Once the results of the classifications at each level are obtained,
the final classification is determined according to the received
concepts' weights and configuration parameters. The most specific
concept is assigned if its score is above the pre-defined multi-label
threshold parameter; otherwise, the concept with the highest score is
assigned to the content.

3.1.5 Similarity Computation
This component computes the similarity between a new incoming content
item and other "active" items existing in the repository. If the new
item is deemed "very similar" to an existing item, two different
situations are distinguished: a) the new item is very similar to an
existing item that came from another source; b) the new item is very
similar to an existing item that came from the same source. The
objective is to prevent sending to a user a news item that is very
similar to an item that the user already read, unless the user opted
to obtain such "redundant" news. But if the new item came from the
same source, it is assumed that it contains more recent/updated
information and it will therefore be delivered to the user. To
identify similarities between items, we use the vector-based
classifier [8].

3.1.6 Media Manager
The Media Manager is responsible for managing the processing of all
media that arrives at the ePaper system, including conversion to the
ePaper format and generation of a new item instance.

3.2 Client Subsystem
The client subsystem comprises every functional unit planned to
support the mobile application activity on the device, as well as the
mobile application itself. This subsystem includes the following
functionality areas:
- Local servers receiving breaking news, alerts and software upgrades
- Newspaper runtime environment, the platform on which the designed
  mobile applications are supposed to run
- Infrastructure services such as an offline proxy for maintaining a
  connection-less environment, remote communication services and a
  local XML persistence layer
- Client application services, including favorites management,
  multimedia viewing, user settings management and remote ontology
  browsing services

Figure 3 presents the client subsystem architecture.

[Figure 3 diagram labels: Client System; Application (Layout/API)
Upgrade Server; Push Based Content Delivery Server; Breaking News /
Alerts Server; XML Based Newspaper Application; Newspaper Runtime API;
Ontology Directory Services; Content Delivery Services; Favorites
Management; Archives Management; Local Settings; Multimedia Viewer;
XML based local storage (Content, Archives, Favorites, Profile,
Settings, History); Offline Proxy ("read later", clicks cache);
CommServices (Web Services, IFS); Operating System, Communication
Services, UI Services, Portable Embedded Virtual Machine]
Figure 3. Client subsystem architecture

3.3 Content Delivery
The Content Delivery subsystem intervenes before content is delivered
to the client side and mediates between the client side and the
Personalization subsystem. In order to send the relevant content to a
specific user, the Content Delivery interacts with the Personalization
subsystem and requests the personalized ranked items for specific
categories, or retrieves requested items without personalization.

A client may have several kinds of requests to the Content Delivery
subsystem: a request for news items, a request for ranking of items, a
request for a "standard edition" of a newspaper, and a request for the
user's profiles according to the device. A request for news items
returns to the client a set of the requested items without their
ranking. A request for ranking returns to the client a set of items to
be presented in the client, based on the number of items requested and
the requested categories to which those items belong. Another process
in the Content Delivery module is the clicks setting process. It
receives from the client the ID of the user and the clicked item, and
updates the user's profiles accordingly.

4. PERSONALIZATION AND CONTENT ADAPTATION
In this section, some of the innovative ideas of the ePaper project
are described, namely the personalization and the content adaptation
algorithms.

4.1 Personalization
The personalization engine of the ePaper system should consider the
special characteristics of a mobile newspaper environment:
- Item relevancy over time: the relevancy of different types of news
  items decreases differently over time
- Items are presented to a user using a hierarchical navigation
  scheme: the engine should provide ranked lists of relevant items
  within any level of the concept hierarchy
- New news items are continuously incoming to the system and stay
  active for a short period of time. The cold start (new item) and
  sparsity problems should be well addressed

To address these challenges, we developed a hybrid filtering method
which combines ontology-content-based filtering with time-aware
collaborative filtering. The decrease in an item's relevancy is
addressed by a time-aware collaborative process. The use of the
ontology enables representing the items and the users with concepts
from the same vocabulary, and measuring the similarity between item
and user profiles considering the hierarchical distance between
concepts in the two profiles. The combination of the collaborative and
content-based filtering techniques makes it possible to overcome the
problems of "cold start" and sparsity, as it uses the content-based
filter for new items, which still have no reading history, and
dynamically increases the weight of the collaborative filter as an
item accumulates more "clicks" (i.e. it is read by more users).

A few ontology-based profiling models exist in which user profiles, as
well as item profiles, are represented with ontology concepts [7, 11].
However, in those studies, the computation of similarity between the
user and the item does not consider the concept-level hierarchy; the
ontology hierarchy is used naively, only for profile update via
feedback, e.g., fractional interest in a higher-level concept is
inferred when a specific topic is added. In the ePaper, the
content-based filter considers the distance between concepts in the
item's profile and the user's profile, according to their location in
the hierarchical ontology. Exact details on the content-based
filtering method can be found in [9].

The collaborative filter of ePaper includes a dynamic time-decay
factor, which is determined according to the age of the item. The
intuition is that news items lose relevancy over time. We plan to
learn and use different decay factors for different concepts (e.g.,
politics-related news might lose its relevancy faster than
technology-related news). Some collaborative systems use a decay
factor, but usually to decrease the user's interest level in a concept
rather than to reduce the item's weight, and they do not consider
different decay factors [10].

Here is a brief overview of the main steps of the filtering process:

Step 1: Get a request to provide a ranked list of relevant news items
for a user.

Step 2: Use the content-based filter to rank the "active" items in the
repository for a user. In essence, the relevancy estimation function
measures the similarity (distance) between each concept in the item's
profile and the respective concepts in the user's profile, considering
not only the co-occurring concepts but also occurrences of neighboring
(parent and child) concepts, according to the hierarchical ontology.
Based on the number of co-occurring concepts and similar concepts, a
similarity score is computed for each item.

Step 3: Use the collaborative filter to rank the "active" items in the
repository for a user. We adjusted the k-NN algorithm to consider a
decay factor of an item's relevancy over time, with different decay
factors for different ontological concepts. We compute the following:
- Find the user neighborhood: compute the user similarity score (USS)
  for the user with all the other users
- Compute a time-discount weight for each click on the item to be
  ranked
- Compute the weighted average of all the clicks on the item,
  considering how similar the "clicking user" is (USS) and the time he
  clicked the item (the time discount)

The result so far is two lists of ranked items, one according to the
content-based filter and the other according to the collaborative
filter.

Step 4: Use a weighted combination scheme considering the "maturity"
of each item, i.e. how many ratings (clicks) it has. The more ratings,
the more weight is given to the collaborative filter. Hence, a new
item is ranked based on the content-based filter only; as time passes
and an item gets more clicks, the weight of the collaborative filter
increases.

4.2 Content Adaptation
The content adaptation challenge of the ePaper project deals with the
question of how to adjust news content collected in the ePaper
database for presentation to the individual reader. It assumes that
readers differ in their preferences regarding the density and style of
information presentation, as well as in their interests. It also aims
to develop an automated system that can generate content adaptations
and screen layouts without the intervention of a human editor.

To address this challenge we first conducted empirical research to
study the various aspects of the problem. The empirical part consists
of three interrelated lines of research. One series of experiments
dealt with the arrangement of pages in a newspaper, focusing mainly on
the comparison of serial and hierarchic navigation. The research
demonstrates the advantages of each type of structure and develops a
model to determine the relative benefits of each.

The second line of research looked at users' preferences regarding the
layout of news sites as a function of information density and
structure of the layout.

The third line of research consists of a series of experiments that
aimed to generate a function for predicting the apparent importance of
an item on a page as a function of visual properties of the item. The
experiments showed that
importance perception is a very rapid process (even after a
0.5 second exposure to a page, people generate stable
assessments of the relative importance of items). We can
now predict with a high degree of accuracy the perceived
importance of a certain item according to its dimensions
relative to other items and its location on screen.
We developed an algorithm for the automatic generation of
screen layouts. The purpose of the algorithm, and its main
innovation compared to previous work on automated layout
generation, is to create layouts, based on a very limited set
of parameters, for a wide range of devices that differ in
display resolution, screen size, and screen dimensions.
Existing approaches to this goal originate in different
methodologies, such as 'stock cutting' problems
(Elmaghraby, Abdelhafiz & Hassan, 2000) [3] and the 'floorplan
area' of circuits in VLSI manufacturing (Knog, Hong &
Qiao, 1997) [6]. In most cases these approaches aim to generate
a layout that minimizes the area consumption for a predefined
number of items. In addition, these items usually have a set
of positioning constraints among themselves (e.g. order,
or adjacent items). The goal of our algorithm is to populate an
entire given area with an undefined number of items that have
no display constraints between them, but rather individual
display constraints that depend only on the user.

Figure 4a. Teenager layout
The proposed algorithm uses an iterative division method to
create smaller and smaller areas of the screen. The system
stops the division process when the generated areas reach
the minimal area size. Following the division process, areas
with one dimension smaller than the minimal area are
merged according to different policies. The layout generated
by the division and merging processes is then populated by
items according to their importance. The output of the
algorithm is an XML-based description of the layout that is
then used by the client system as the basis for generating
actual layouts. The user can switch between layouts, and the
user's layout selection is saved to his profile.
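
The division and population steps can be sketched as follows. This is a minimal illustration under assumptions, not the ePaper algorithm itself: the split rule (halve the remaining area across its longer side and keep dividing only the second half), the `Rect` type and all function names are hypothetical. The merging policies and the XML output are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self) -> int:
        return self.w * self.h


def split(rect: Rect) -> Tuple[Rect, Rect]:
    """Halve a rectangle across its longer side (assumed split rule)."""
    if rect.w >= rect.h:
        half = rect.w // 2
        return (Rect(rect.x, rect.y, half, rect.h),
                Rect(rect.x + half, rect.y, rect.w - half, rect.h))
    half = rect.h // 2
    return (Rect(rect.x, rect.y, rect.w, half),
            Rect(rect.x, rect.y + half, rect.w, rect.h - half))


def divide(screen: Rect, min_area: int) -> List[Rect]:
    """Iteratively carve the screen into smaller and smaller regions,
    stopping when another split would produce a region below min_area."""
    if min_area < 1:
        raise ValueError("min_area must be at least 1")
    regions: List[Rect] = []
    remaining = screen
    while True:
        first, second = split(remaining)
        if min(first.area, second.area) < min_area:
            regions.append(remaining)
            return regions
        regions.append(first)
        remaining = second


def populate(regions: List[Rect],
             items: List[Tuple[str, float]]) -> List[Tuple[str, Rect]]:
    """Give the most important items the largest regions."""
    by_area = sorted(regions, key=lambda r: r.area, reverse=True)
    by_importance = sorted(items, key=lambda it: it[1], reverse=True)
    return [(title, region)
            for (title, _), region in zip(by_importance, by_area)]


if __name__ == "__main__":
    regions = divide(Rect(0, 0, 800, 600), min_area=60_000)
    items = [("Weather", 0.2), ("Budget vote", 0.9), ("Cup final", 0.7)]
    for title, region in populate(regions, items):
        print(title, region)
```

With an 800 by 600 screen and a minimum area of 60,000 pixels, the sketch yields four regions covering the whole screen, and the item with the highest importance receives the largest one. Because the regions depend only on the screen dimensions and the minimum area, the same item list can be laid out for devices of different sizes.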
Figures 4a and 4b present different layouts generated for
the ePaper system.

Figure 4b. Business layout
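
Returning to the filtering process of Section 4.1, Steps 2 to 4 can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the ePaper implementation: the partial score of 0.5 for parent/child concepts, the exponential half-life discount, the `maturity_k` constant and all names. The exact content-based method is given in [9].

```python
from typing import Dict, List, Tuple

# Hypothetical child -> parent links in the news ontology.
PARENT: Dict[str, str] = {"soccer": "sport", "tennis": "sport"}


def concept_match(a: str, b: str, partial: float = 0.5) -> float:
    """1.0 for identical concepts, `partial` for a parent/child pair, else 0."""
    if a == b:
        return 1.0
    if PARENT.get(a) == b or PARENT.get(b) == a:
        return partial
    return 0.0


def content_score(item_concepts: List[str],
                  user_profile: Dict[str, float]) -> float:
    """Step 2 (simplified): average, over the item's concepts, of the best
    weighted match found in the user's profile."""
    if not item_concepts or not user_profile:
        return 0.0
    total = 0.0
    for concept in item_concepts:
        total += max(weight * concept_match(concept, c)
                     for c, weight in user_profile.items())
    return total / len(item_concepts)


def collaborative_score(clicks: List[Tuple[float, float]],
                        half_life_hours: float) -> float:
    """Step 3 (simplified): `clicks` holds (user_similarity, age_in_hours)
    for each click by a neighbour. Returns the mean of similarity times an
    exponential time discount; the half-life could differ per concept."""
    if not clicks:
        return 0.0
    discounted = [sim * 0.5 ** (age / half_life_hours) for sim, age in clicks]
    return sum(discounted) / len(clicks)


def hybrid_score(content: float, collaborative: float,
                 n_clicks: int, maturity_k: float = 5.0) -> float:
    """Step 4: blend both filters; the collaborative weight grows with the
    number of clicks, so a new item relies on the content filter alone."""
    alpha = n_clicks / (n_clicks + maturity_k)
    return (1.0 - alpha) * content + alpha * collaborative


if __name__ == "__main__":
    c = content_score(["soccer"], {"sport": 0.8})
    k = collaborative_score([(1.0, 0.0), (0.5, 24.0)], half_life_hours=24.0)
    print(hybrid_score(c, 0.0, n_clicks=0), hybrid_score(c, k, n_clicks=5))
```

An item with no clicks gets a collaborative weight of zero, which matches the cold-start behavior described in Step 4. As clicks accumulate, the blend shifts toward the collaborative score.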

5. SUMMARY AND FUTURE ISSUES
The full version of the ePaper prototype system is now
undergoing usability tests, aimed at examining the users'
reactions to the service and at testing the navigation and
browsing capabilities.
Concurrently, we are evaluating the personalization
and content adaptation algorithms, and conducting research
related to the intuitive interface.
For the personalization algorithms we examine the effect of
various parameters on the filtering performance, e.g.:
- optimal scores of partially similar concepts, according to their
  hierarchical distance, and their marginal contribution
- optimal number of concepts to consider in the user's profile and
  the items' profile
- schemes to analyze the user feedback (e.g. clicks, time of reading,
  ranking of clicked item)
- optimal decaying factor (the impact of time on the collaborative
  and combined filters)

We are running controlled experiments with users to evaluate the
relevancy of news items, compared to the system's ranking of those
items based on the filtering methods. We are also running simulations
that manipulate various parameters, and calculating standard and novel
filtering measures, e.g., MAE, precision, recall, and PIP
(Hyung, 2008) [4].

We are currently conducting laboratory experiments about the design of
the ePaper to understand:
- How the aesthetic design of online news sites affects users'
  emotions and attitudes towards the product
- The effects of typical vs. novel designs of the ePaper (relative to
  other news sources) on users' preferences for those designs

The results of these experiments will provide further guidelines
regarding the design of the ePaper and similar products.

ACKNOWLEDGMENTS
The ePaper project is sponsored by Deutsche Telekom Co. and is
performed at Deutsche Telekom Laboratories at Ben-Gurion University.

REFERENCES
1. Casademont, J., Perdrix, F., Einhoff, M., Dummer, G., and Boyer, A.
   (2005). ELIN: A Framework to deliver media content in an efficient
   way based in MPEG standards. IEEE International Conference on Web
   Services (ICWS'05), pp. 841-842.
2. Dummer, G., Casademont, J., Einhoff, M., Boyer, A., and Perdrix, F.
   (2005). ELIN: A MPEG based news delivery framework. In: Cunningham,
   P.: Innovation and the Knowledge Economy. Part 2: Issues,
   Applications, Case Studies. Amsterdam: IOS Press, 2005,
   pp. 959-966.
3. Elmaghraby, A. S., Abdelhafiz, E., and Hassan, M. F. (2000). An
   intelligent approach to stock cutting optimization. University of
   Louisville Multimedia Research Lab, Louisville, KY.
4. Hyung Jun A. (2008). A new similarity measure for collaborative
   filtering to alleviate the new user cold-starting problem.
   Information Sciences, Volume 178, Issue 1, pp. 37-51.
5. Ihlström, C., Sabelström Möller, K., and Maria Åkesson, M. (2005).
   Diginews - The challenge of production in e-paper publishing -
   from new consumption to new workflows. Presented at TAGA 2005.
6. Knog, T., Hong, X., and Qiao, C. (1997). VEAP: Global Optimization
   based Efficient Algorithm for VLSI Placement. Asia and South
   Pacific Design Automation Conference (ASP-DAC'97), Chiba, Japan,
   pp. 277-280.
7. Middleton, S.E., Alani, H., Shadbolt, N.R., and Roure, D.C.D.
   (2002). Exploiting synergy between ontologies and recommender
   systems. In: The Eleventh International World Wide Web Conference
   (WWW2002).
8. Salton, G., Wong, A., and Yang, C. S. (1975). A Vector Space Model
   for Automatic Indexing. Communications of the ACM, Volume 18,
   Issue 11, pp. 613-620.
9. Shoval, P., Maidel, V., and Shapira, B. (2008). An ontology content
   based filtering method. Int'l Journal of Information Theories and
   Applications, pp. 51-63.
10. Tong-Queue L., Young P. (2006). A Time-Based Recommender System
    Using Implicit Feedback. CSREA IEEE 2006, pp. 309-315.
11. Yli-Koivisto, J., and Puustjarvi, J. (2001). Using Ontologies in
    CoMet. Proceedings of the ONTO-2001 Workshop on Ontologies,
    Vienna, Austria, September 18, 2001, pp. 1-15.
