1 Introduction
information about the documents' average rewards (i.e., exploration) can refine B's estimate of the documents' rewards and, in turn, increase the user's long-term satisfaction. Clearly, neither a purely exploring nor a purely exploiting algorithm works well, and a good tradeoff is needed. One classical solution to the multi-armed bandit problem is the ε-greedy strategy [12]. With probability 1−ε, this algorithm chooses the best documents according to its current knowledge; with probability ε, it chooses uniformly among the other documents. The parameter ε essentially controls the tradeoff between exploitation and exploration (the exr/exp tradeoff). One drawback of this algorithm is that it is difficult to decide the optimal value of ε in advance. Instead, we introduce an algorithm named contextual-ε-greedy that adaptively balances the exr/exp tradeoff according to the user's situation: it extends the ε-greedy strategy by selecting suitable user situations for either exploration or exploitation.
The rest of the paper is organized as follows. Section 2 introduces the key notions used throughout this paper. Section 3 reviews related work. Section 4 presents our MCRS model and describes the algorithms involved in the proposed approach. The experimental evaluation is described in Section 5. The last section concludes the paper and points out possible directions for future work.
2 Key Notions
In this section, we briefly sketch the key notions that will be of use in this paper.
The User's Model: The user's model is structured as a case base composed of a set of situations with their corresponding user preferences, denoted U = {(S_i, UP_i)}, where S_i is a user's situation and UP_i its corresponding user's preferences.
The User's Preferences: The user's preferences are deduced from the user's navigation activities, for example the number of clicks on the visited documents or the time spent on a document. Let UP be the preferences submitted by a specific user to the system in a given situation. Each document in UP is represented as a single vector d = (c1, ..., cn), where ci (i = 1, ..., n) is the value of a component characterizing the preferences for d. We consider the following components: the total number of clicks on d, the total time spent reading d, and the number of times d was recommended.
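As a concrete illustration, the case base and the preference vectors could be represented as follows (a minimal Python sketch; the component names clicks, time_spent and times_recommended are our own labels for the three components listed above):

    from dataclasses import dataclass

    @dataclass
    class DocumentPreference:
        """Preference vector d = (c1, ..., cn) for one document."""
        doc_id: str
        clicks: int              # total number of clicks on d
        time_spent: float        # total time spent reading d (seconds)
        times_recommended: int   # number of times d was recommended

    # The user's model U = {(Si, UPi)}: each situation maps to the
    # preferences observed in that situation.
    user_model = {
        ("Restaurant", "Work_day", "Financial_client"): [
            DocumentPreference("doc_42", clicks=3, time_spent=95.0,
                               times_recommended=5),
        ],
    }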
Context: A user's context C is a multi-ontology representation where each ontology corresponds to one context dimension: C = (OLocation, OTime, OSocial). Each dimension models and manages one type of context information. We focus on these three dimensions since they cover all the information needed. These ontologies are described in [1].
Situation: A situation is an instantiation of the user’s context. We consider a situation
as a triple S = (OLocation.xi, OTime.xj, OSocial.xk) where xi, xj and xk are ontology concepts
or instances. Suppose the following data are sensed from the user’s mobile phone: the
GPS shows the latitude and longitude of a point "48.89, 2.23"; the local time is
"Oct_3_12:10_2012" and the calendar states "meeting with Paul Gerard". The
corresponding situation is S = ("48.89,2.23", "Oct_3_12:10_2012", "Paul_Gerard"). To build a more abstract situation, we interpret the user's behavior from this low-level
multimodal sensor data using ontology-based reasoning. For example, from S we obtain the following situation: Meeting = (Restaurant, Work_day, Financial_client).
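A simplified sketch of this abstraction step in Python (the lookup tables are hypothetical and stand in for the ontology reasoning of [1]):

    # Hypothetical lookup tables standing in for the ontology reasoning of [1].
    LOCATION_CONCEPTS = {"48.89,2.23": "Restaurant"}
    TIME_CONCEPTS = {"Oct_3_12:10_2012": "Work_day"}
    SOCIAL_CONCEPTS = {"Paul_Gerard": "Financial_client"}

    def abstract_situation(gps, local_time, contact):
        """Map low-level sensed data to an abstracted situation triple."""
        return (LOCATION_CONCEPTS.get(gps, gps),
                TIME_CONCEPTS.get(local_time, local_time),
                SOCIAL_CONCEPTS.get(contact, contact))

    # ("Restaurant", "Work_day", "Financial_client")
    print(abstract_situation("48.89,2.23", "Oct_3_12:10_2012", "Paul_Gerard"))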
Among the set of captured situations, some of them are characterized as High-Level
Critical Situations.
High-Level Critical Situations (HLCS): An HLCS is a class of situations in which the user needs the best information the system can recommend, for instance during a professional meeting. In such a situation, the system must perform exploitation exclusively rather than exploration-oriented learning. In the opposite case, where the user is, for instance, using his/her information system at home or on vacation with friends, the system can perform some exploration by recommending information that does not necessarily match his/her known interests. The HLCS are predefined by the domain expert. In our case, we conduct the study with professional mobile users, as described in detail in Section 5. Examples of HLCS include S1 = (restaurant, midday, client) and S2 = (company, morning, manager).
3 Related Work
In the following, we review recent recommendation techniques that tackle the problem of making the exr/exp tradeoff dynamic (bandit algorithms). Existing works that consider the user's situation in recommendation are not covered in this section; refer to [1] for further information.
Very frequently used in reinforcement learning to study the exr/exp tradeoff, the multi-armed bandit problem was originally described by Robbins [11]. The ε-greedy strategy is one of the most widely used solutions to the bandit problem and was first described in [10]. It chooses a random document with frequency ε, and otherwise chooses the document with the highest estimated mean reward, the estimation being based on the rewards observed so far. ε must lie in the interval [0, 1] and its choice is left to the user. The first variant of the ε-greedy strategy is what [6, 10] refer to as the ε-beginning strategy, which performs all the exploration at once at the beginning. For a given number I of iterations, documents are randomly pulled during the first εI iterations; during the remaining (1−ε)I iterations, the document with the highest estimated mean is pulled. Another variant of the ε-greedy strategy is what [10] calls ε-decreasing. In this strategy, the document with the highest estimated mean is always pulled, except that a random document is pulled instead with frequency εi, where εi = ε0/i, ε0 ∈ ]0, 1], and i is the index of the current round. Besides ε-decreasing, four other strategies are presented in [3]. They are not described here because the experiments reported in [3] show that ε-decreasing is always at least as good as the other strategies. Compared to the standard multi-armed bandit problem with a fixed set of possible actions, in MCRS old documents may expire and new documents may frequently emerge. It may therefore be undesirable to perform all the exploration at the beginning, as in [6], or to decrease the exploration effort monotonically, as in the ε-decreasing strategy of [10].
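For concreteness, both variants can be sketched as follows (a minimal Python sketch; estimated_means is assumed to hold the empirical mean reward of each document):

    import random

    def epsilon_greedy(estimated_means, epsilon):
        """epsilon-greedy: explore uniformly with probability epsilon,
        otherwise exploit the document with the highest estimated mean."""
        if random.random() < epsilon:
            return random.randrange(len(estimated_means))       # explore
        return max(range(len(estimated_means)),
                   key=lambda d: estimated_means[d])            # exploit

    def epsilon_decreasing(estimated_means, epsilon0, i):
        """epsilon-decreasing: exploration frequency eps_i = eps0 / i
        at round i >= 1, so exploration fades monotonically."""
        return epsilon_greedy(estimated_means, min(1.0, epsilon0 / i))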
As far as we know, no existing work addresses the exr/exp tradeoff in MCRS. However, a few research works study the contextual bandit
problem in recommender systems, where the user's behavior is considered as the context of the bandit problem. In [13], the authors extend the ε-greedy strategy by dynamically updating the exploration value ε. At each iteration, they run a sampling procedure to select a new ε from a finite set of candidates. The probabilities associated with the candidates are initialized uniformly and updated with the Exponentiated Gradient (EG) method [7]. This updating rule increases the probability of a candidate ε whenever it leads to a user's click, and the technique gives better results than both ε-beginning and ε-decreasing.
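A much-simplified Python sketch of this ε-selection mechanism (EXP3-style multiplicative weights; the candidate set, the learning rate eta, and the exact update rule of [13] are only approximated here):

    import math
    import random

    candidate_eps = [0.01, 0.05, 0.1, 0.2]   # illustrative finite candidate set
    weights = [1.0] * len(candidate_eps)     # uniform initialization
    eta = 0.1                                # learning rate (illustrative)

    def sample_epsilon():
        """Draw a candidate epsilon with probability proportional to its weight."""
        k = random.choices(range(len(candidate_eps)), weights=weights, k=1)[0]
        return k, candidate_eps[k]

    def eg_update(k, click):
        """Exponentiated-gradient step: raise the weight of the chosen
        candidate in proportion to its importance-weighted reward."""
        prob_k = weights[k] / sum(weights)
        weights[k] *= math.exp(eta * click / prob_k)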
In [9], the authors model recommendation as a contextual bandit problem. They propose an approach in which a learning algorithm sequentially selects documents to serve to users based on their behavior; to maximize the total number of user clicks, this work proposes the LinUCB algorithm, which is computationally efficient.
As shown above, none of the mentioned works tackles both the dynamicity of the exr/exp tradeoff and the consideration of the user's situation in the exr/exp strategy. This is precisely what we intend to do with our approach. Our intuition is that taking the criticality of the situation into account when managing the exr/exp tradeoff improves the results of the MCRS: the strategy explores more when the current user's situation is not critical and exploits more in the opposite case.
4 MCRS Model
Task 1: Let PS denote the set of past situations in the user's model. Given the current situation S^t, the system retrieves the most similar past situation S^p:

    S^p = argmax_{S^c ∈ PS} sim(S^t, S^c)                              (1)

where the similarity between two situations is computed as a weighted sum over the context dimensions:

    sim(S^t, S^c) = Σ_j α_j · sim_j(x_j^t, x_j^c)                      (2)
where sim_j is the similarity metric related to dimension j between two concepts x_j^t and x_j^c, and α_j is the weight associated with dimension j (during the experimental phase, α_j is set to 1 for all dimensions). This similarity depends on how closely x_j^t and x_j^c are related in the corresponding ontology. We use the same similarity measure as [15], defined by:
    sim_j(x_j^t, x_j^c) = 2 · depth(LCS) / (depth(x_j^t) + depth(x_j^c))    (3)
where LCS is the Least Common Subsumer of x_j^t and x_j^c, and depth is the number of nodes on the path from a node to the root of the ontology.
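Putting equations (1)-(3) together, the retrieval step can be sketched as follows (a minimal Python sketch; the depth and lcs helpers are assumed to be supplied by the ontology layer of [1]):

    def sim_j(depth, lcs, x_t, x_c):
        """Equation (3): Wu and Palmer concept similarity [15]."""
        return 2.0 * depth(lcs(x_t, x_c)) / (depth(x_t) + depth(x_c))

    def sim(s_t, s_c, alphas, depth, lcs):
        """Equation (2): weighted sum of per-dimension similarities."""
        return sum(a * sim_j(depth, lcs, x_t, x_c)
                   for a, x_t, x_c in zip(alphas, s_t, s_c))

    def retrieve(s_t, past_situations, alphas, depth, lcs):
        """Equation (1): return the past situation S^p most similar to S^t."""
        return max(past_situations,
                   key=lambda s_c: sim(s_t, s_c, alphas, depth, lcs))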
Task 2: Let D be the document collection and D^p ⊆ D the set of documents recommended in situation S^p. After retrieving S^p, the system observes the user's behavior when reading each document d^p ∈ D^p. Based on the observed rewards, the algorithm chooses the document d^p with the greatest reward r^p.
Task 3: After receiving the user's reward, the algorithm improves its document-selection strategy with the new observation: in situation S^t, document d^p obtains reward r^t.
When a document is presented to the user and the user selects it with a click, a reward of 1 is received; otherwise, the reward is 0. The reward of a document is precisely its Click-Through Rate (CTR), i.e., the average number of clicks the document receives per recommendation.
The recommendation algorithm selects the document d_i as follows:

    d_i = { argmax_{d ∈ UC} getCTR(d)    if q > ε
          { Random(UC)                   otherwise                     (4)

where q is a random value uniformly drawn from [0, 1] and UC is the set of candidate documents.
The exploration value ε is set according to the criticality of the current situation:

    ε = { 1 − sim(S^t, S^m)/B    if sim(S^t, S^m) < B
        { 0                      otherwise                             (5)

where S^m is the HLCS most similar to the current situation S^t and B is a similarity threshold (its optimal value is determined experimentally in Section 5).
To summarize, the system performs no exploration when the current user's situation is critical; otherwise, it performs some exploration. In the latter case, the degree of exploration decreases as the similarity between S^t and S^m increases.
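Equations (4) and (5) combine into the following selection rule (a minimal Python sketch; get_ctr and sim_t_m, the similarity between S^t and the nearest HLCS S^m, are assumed inputs):

    import random

    def contextual_epsilon(sim_t_m, B):
        """Equation (5): epsilon shrinks as S^t approaches the nearest
        HLCS S^m; no exploration once sim(S^t, S^m) reaches threshold B."""
        return 1.0 - sim_t_m / B if sim_t_m < B else 0.0

    def select_document(candidates, get_ctr, sim_t_m, B):
        """Equation (4): exploit the highest-CTR document with
        probability 1 - epsilon, otherwise recommend at random."""
        q = random.random()
        if q > contextual_epsilon(sim_t_m, B):
            return max(candidates, key=get_ctr)   # exploit
        return random.choice(candidates)          # explore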
5 Experimental Evaluation
In order to empirically evaluate the performance of our approach, and in the absence of a standard evaluation framework, we propose an evaluation framework based on a set of diary study entries. The main objectives of the experimental evaluation are (1) to find the optimal value of the threshold B described in Section 4.2 and (2) to evaluate the performance of the proposed algorithm (contextual-ε-greedy). In the following, we describe our experimental datasets, then present and discuss the obtained results.
We have conducted a diary study in collaboration with the French software company Nomalys¹. This company provides a history application which records the time, current location, and social and navigation information of its users while they use the application. The diary study lasted 18 months and generated 178,369 diary situation entries, where each entry represents the capture of contextual time, location and social information. For each entry, the captured data are replaced with more abstract information using the time, spatial and social ontologies [1]. From the diary study, we obtained a total of 2,759,283 entries concerning the user's navigation, with an average of 15.47 entries per situation.
In order to set the similarity threshold value, we use a manual classification as a baseline and compare it with the results obtained by our technique. We take a random sample of 10% of the situation entries and manually group similar situations; we then compare the constructed groups with the results obtained by our similarity algorithm under different threshold values.
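This tuning procedure can be sketched as follows (a hypothetical Python helper; scored_pairs and same_group stand in for our similarity scores and the manual grouping):

    def best_threshold(scored_pairs, same_group, thresholds):
        """Sweep candidate values of B and keep the one whose induced
        grouping best matches the manual classification.
        scored_pairs: iterable of (situation_a, situation_b, similarity);
        same_group(a, b): True when the manual baseline groups a with b."""
        def precision(B):
            predicted = [(a, b) for a, b, s in scored_pairs if s >= B]
            if not predicted:
                return 0.0
            return sum(same_group(a, b) for a, b in predicted) / len(predicted)
        return max(thresholds, key=precision)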
Fig. 1 shows the effect of varying the threshold situation-similarity parameter B in the interval [0, 3] on the overall precision. The results show that the best performance is obtained at B = 2.4, with a precision of 0.849. Consequently, we use the optimal threshold value B = 2.4 for testing our MCRS.
¹ Nomalys is a company that provides a graphical application on smartphones allowing users to access their company's data.
6 Conclusion
In this paper, we study the problem of exploitation and exploration in mobile context-aware recommender systems and propose a novel approach that adaptively balances exr/exp with respect to the user's situation. In order to evaluate the performance of the proposed algorithm, we compare it with other standard exr/exp strategies. The experimental results demonstrate that our algorithm achieves a better average CTR in various configurations. In the future, we plan to evaluate the scalability of the algorithm on-board a mobile device and to investigate other public benchmarks.
References
1. Bouneffouf, D., Bouzeghoub, A., Gançarski, A.L.: Following the User’s Interests in
Mobile Context-Aware Recommender Systems. In: AINA Workshops, Fukuoka, Japan,
pp. 657–662 (2012)
2. Adomavicius, G., Mobasher, B., Ricci, F., Tuzhilin, A.: Context-Aware Recommender Systems. AI Magazine 32(3), 67–80 (2011)
3. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning 47(2–3), 235–256 (2002)
4. Baltrunas, L., Ludwig, B., Peer, S., Ricci, F.: Context Relevance Assessment and
Exploitation in Mobile Recommender Systems. Personal and Ubiquitous Computing, 1–20
(2011)
5. Bellotti, V., Begole, B., Chi, E.H., Ducheneaut, N., Fang, J., Isaacs, E.: Activity-Based Serendipitous Recommendations with the Magitti Mobile Leisure Guide. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 1157–1166. ACM, New York (2008)
6. Even-Dar, E., Mannor, S., Mansour, Y.: PAC Bounds for Multi-armed Bandit and Markov
Decision Processes. In: Kivinen, J., Sloan, R.H. (eds.) COLT 2002. LNCS (LNAI),
vol. 2375, p. 255. Springer, Heidelberg (2002)
7. Kivinen, J., Warmuth, M.K.: Exponentiated Gradient versus Gradient Descent for Linear Predictors. Information and Computation 132(1), 1–63 (1997)
8. Langford, J., Zhang, T.: The Epoch-greedy Algorithm for Contextual Multi-armed
Bandits. In: Advances in Neural Information Processing Systems (2008)
9. Li, L., Chu, W., Langford, J., Schapire, R.E.: A Contextual-Bandit Approach to Personalized News Article Recommendation. In: Proceedings of the 19th International Conference on World Wide Web, pp. 661–670. ACM, New York (2010)
10. Mannor, S., Tsitsiklis, J.N.: The Sample Complexity of Exploration in the Multi-Armed
Bandit Problem. In: Computational Learning Theory, pp. 255–270 (2003)
11. Robbins, H.: Some Aspects of the Sequential Design of Experiments. Bulletin of the American Mathematical Society 58, 527–535 (1952)
12. Watkins, C.: Learning from Delayed Rewards. Ph.D. thesis. Cambridge University (1989)
13. Li, W., Wang, X., Zhang, R., Cui, Y., Mao, J., Jin, R.: Exploitation and Exploration in a Performance Based Contextual Advertising System. In: Proceedings of the International Conference on Knowledge Discovery and Data Mining, pp. 27–36. ACM, New York (2010)
14. Sohn, T., Li, K.A., Griswold, W.G., Hollan, J.D.: A Diary Study of Mobile Information Needs. In: Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems, Florence, Italy, pp. 433–442. ACM (2008)
15. Wu, Z., Palmer, M.: Verb Semantics and Lexical Selection. In: Proceedings of the 32nd
Meeting of the Association for Computational Linguistics, pp. 133–138 (1994)