http://www.sjie.org
Scientific Journal of Information Engineering
April 2014, Volume 4, Issue 2, PP.38-43
Mining of User Correlationship in a Mobile
Reading and Social System
Yadong Fang 1, Lei Zhang 2, Jian Ye 2,3#

1. Shandong Inspur Software Industry Co Ltd, Jinan Shandong 250011, China
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
3. Beijing Key Laboratory of Mobile Computing and Pervasive Device, Beijing 100190, China
# Email: jye@ict.ac.cn
Abstract
Mobile learning not only enables a user to get what he wants to learn, but also provides an approach to improve the communication
among users. Therefore, user correlation plays a very important role in finding resources with high relevance and potential friends
of users. In this paper, a novel user correlation mining algorithm (UCMA) is proposed to analyze users' reading history and
interaction records. To obtain the overall user correlation, the algorithm introduces two features: the knowledge structure of each
user and the strength of the relationship between two users. At the end of this paper, the algorithm is evaluated with data collected
from the prototype system. The experimental results show that the proposed algorithm is feasible and effective for calculating
user correlation.
Keywords: User Correlation; Mobile Learning; Knowledge Structure; Strength of Relationship
1 INTRODUCTION
With the development of ubiquitous computing and Internet technologies, mobile learning has become an important
application for users. We can easily get access to substantial electronic resources via iPad, Amazon Kindle or other
smart devices. However, current learning technologies mainly focus on sharing learning resources and
activities in a closed structure, and ignore building dynamic relationships between users and learning resources.
Current research has made several attempts to solve this problem. However, the correlation between users
has not been explored in recent work, although it is very important to both users and resource providers. In this paper,
a novel user correlation mining algorithm (UCMA) is proposed to analyze users' reading history and interaction
records and to explore the correlation between users.
The rest of this paper is organized as follows. Section 2 reviews related works on knowledge structure and user
correlation and further points out the differences between ours and others. After clarifying some definitions, Section
3 outlines the architecture of UCMA. Section 4 describes user correlation exploration in detail. In Section 5, we
evaluate the performance of UCMA using data collected by 80 volunteers over a period of 3 months. Finally,
conclusions and future work insights are given in Section 6.
2 RELATED WORK
Recent work mainly focuses on the visualization of knowledge structure. [1] proposes a knowledge network model to
make the representation of researchers' knowledge more intuitive. The model includes knowledge points, knowledge
stocks and the relationships between knowledge points. However, it simply uses the number of words in the intersection to
represent the relationship between knowledge points, which is not accurate enough. [2] presents the CmapTools
software as an example of how concept maps, a knowledge visualization tool, can be combined with recent
technology to integrate knowledge and information visualizations. [3] proposes a concept
mapping tool named VCE that creates dynamic concept maps for users and facilitates users' visual interaction with
concepts and documents. Its central idea is to make implicit knowledge structures explicit.
User correlation is widely used in collaborative recommender systems. Given a user-item matrix, [4] uses Pearson
correlation or vector cosine-based similarity to calculate the similarity between two users. A user-based
collaborative filtering algorithm can then be used for recommendations. [5] proposes using an asymmetric similarity
measure to identify a neighborhood whose traits are strongly similar to those of an active user's behavior. Thus the
possibility of generating irrelevant recommendations can be reduced.
There are two major differences between our work and these techniques. One is that we use the transaction paths of related
knowledge points (KPs) and the residence time at each KP to calculate the knowledge structure similarity. The other is that we get the
overall user correlation through a comprehensive consideration of users' knowledge structure and strength of
relationship.
3 ALGORITHM ARCHITECTURE
We first define some terms related to knowledge structure and use them to model knowledge structure and
explore user correlation. The architecture of our algorithm is briefly described later in this section.
3.1 Preliminary
In this subsection, we clarify some terms: knowledge point, related knowledge point, knowledge point
correlation, transaction path of related knowledge points, and residence time at a knowledge point.
Knowledge Point (KP): A KP is usually a key figure, an important event or a piece of terminology. It is predefined by experts
or extracted from learning resources using text mining. For each KP, we use text and pictures to describe its detailed
information, and we also define related audio and video for it.
Related Knowledge Point: For each KP, we define its related knowledge points. Related KPs are also predefined or
extracted from learning resources using text mining.
Knowledge Point Correlation (KPC): Suppose $k_i$ and $k_j$ are related KPs. The KPC between them is asymmetric: the KPC of $k_i$ to $k_j$ is not the same as that of $k_j$ to $k_i$, because they do not share the same related KPs. When calculating the KPC of $k_i$ to $k_j$, we consider not only the number of visits from $k_i$ to $k_j$ but also the whole knowledge network.
Transaction Path of Related Knowledge Point: If you are interested in a KP, you can access its detailed
information and listen to related audio or watch related videos to deepen your understanding of it. More
importantly, you can go on to visit its related KPs; the sequence of KPs visited in this way forms a transaction path.
Residence Time at Knowledge Point: The residence time at a KP is not just the time you spend browsing its detailed information; we also take the correlation between KPs into consideration. For example, the residence time at $k_i$ is the time you spend browsing $k_i$ plus the browsing time of its related KP $k_j$ multiplied by a coefficient. The coefficient is the KPC of $k_i$ to $k_j$ mentioned above.
3.2 Architecture of UCMA
Traditional standards related to learning resources are limited to the sharing of materialized learning resources. They
do not consider the people connected by such learning resources. Thus, an important open issue is to build a
cognitive network computing model based on user interaction and the learning process, and to realize the
sharing of a dynamic social cognition network. In our work, we get user correlation through a comprehensive
consideration of users' knowledge structure and strength of relationship. The architecture consists of three processes:
knowledge structure similarity calculation, interaction correlation calculation and user correlation measurement.

FIGURE 1: KNOWLEDGE STRUCTURE SIMILARITY CALCULATION
Learners browse KPs and form their own knowledge structure. As shown in Figure 1, the calculation of knowledge
structure similarity (KSS) consists of four phases: browsing history representation, KP correlation calculation,
residence time calculation and finally KSS calculation. If a user becomes interested in a certain learning resource, he can
use the communication platform we provide to exchange ideas with others. We can get users' interaction
correlation from their interaction history. As shown in Figure 2, user interaction correlation (UIC) includes two
parts: statement correlation (SC) and private chat correlation (PCC). SC is drawn from the statements a user makes in
public chat rooms, while PCC is drawn from the number of times he has private chats with others. Finally, we combine KSS,
SC and PCC, assigning a weight to each factor, to get the overall user correlation.

FIGURE 2: USER CORRELATION EXPLORATION
4 USER CORRELATION EXPLORATION
The process of user correlation exploration is carried out in three steps: KSS calculation, UIC calculation and
correlation measurement. Each step is described in detail in this section.
4.1 KSS Calculation
We select any two users, $U_A$ and $U_B$, and calculate the KSS between them. The set of books $U_A$ has read is $Book_A = \{b_{A1}, b_{A2}, \ldots, b_{An_A}\}$, where $n_A$ is the number of books $U_A$ has read. The set of books $U_B$ has read is $Book_B = \{b_{B1}, b_{B2}, \ldots, b_{Bn_B}\}$, where $n_B$ is the number of books $U_B$ has read. The set of books both $U_A$ and $U_B$ have read is $Book_{com} = \{b_1, b_2, \ldots, b_{n_{com}}\}$, where $n_{com}$ is the number of books both $U_A$ and $U_B$ have read. For each book in $Book_{com}$, we calculate the KSS. For instance, the KSS of book $b_k$ between $U_A$ and $U_B$ is $KSS_{b_k}(U_A, U_B)$. Then the KSS of $U_A$ and $U_B$ is

$$KSS(U_A, U_B) = \frac{\sum_{k=1}^{n_{com}} KSS_{b_k}(U_A, U_B)}{n_A + n_B - n_{com}} \qquad (1)$$
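As a minimal sketch of equation (1), the per-book scores can be aggregated as follows. The function name, the dictionary input and the numbers are illustrative assumptions, and the denominator is assumed to be the number of distinct books read by either user, $n_A + n_B - n_{com}$, by analogy with equation (6).

```python
def overall_kss(per_book_kss, n_a, n_b):
    """Aggregate per-book KSS values in the manner of equation (1).

    per_book_kss maps each commonly read book to KSS_bk(U_A, U_B);
    n_a and n_b are the numbers of books read by U_A and U_B.
    Assumption: the denominator is n_A + n_B - n_com.
    """
    n_com = len(per_book_kss)
    distinct_books = n_a + n_b - n_com
    if distinct_books <= 0:
        return 0.0
    return sum(per_book_kss.values()) / distinct_books


# Hypothetical data: two common books, U_A read 3 books, U_B read 4.
example = overall_kss({"b1": 0.5, "b2": 0.25}, n_a=3, n_b=4)
```

With these made-up inputs the two users share 2 of 5 distinct books, so the summed per-book similarity 0.75 is divided by 5.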
To get the overall KSS of $U_A$ and $U_B$, we apply the following three steps.
Step 1: KP correlation calculation. We obtain all the KPs of book $b_k$. Treating each KP as a node in a graph, we draw an edge between two KPs if they are related, so that the KPs together form a knowledge network graph. We use the random walk with restart (RWR) algorithm [6] to calculate the correlation between two KPs. RWR starts from one node in the graph and walks randomly along the edges. At any node, the algorithm either chooses an adjacent edge with a certain probability and moves to the next node, or returns to the starting point. For an aperiodic, irreducible graph, the probability of reaching any node converges to a stationary distribution after a finite number of steps, and further iterations do not change the probability distribution. The probability of reaching each node in the graph can then be regarded as its degree of relevance to the starting point. The RWR model can be represented as

$$c^{(t+1)} = (1 - a)\, S\, c^{(t)} + a\, q \qquad (2)$$
In the above equation, the matrix $c^{(t+1)}$ is the probability distribution over the graph at step $t+1$. The matrix $q$ is the initial state; it is a diagonal matrix with 1s on the main diagonal. $S$ is the transition probability matrix: $S_{i,j}$ is the probability that a walker currently at node $i$ moves to node $j$ in the next step. $S_{i,j}$ is given by equation (3).

$$S_{i,j} = \frac{Freq(i \to j)}{Freq(j)} \qquad (3)$$
$Freq(i \to j)$ is the number of times users reach KP $j$ by visiting KP $i$ first. $Freq(i)$ is the number of times users access KP $i$ ($Freq(j)$ is defined likewise). If KP $i$ and KP $j$ are not related KPs, $S_{i,j}$ is zero. $a$ is the restart probability. For an aperiodic, irreducible graph, equation (2) converges after sufficiently many iterations. The correlation between KP $i$ and KP $j$ is then represented as

$$Cor(i, j) = c^{+}(i, j) \qquad (4)$$

where $c^{+}(i, j)$ is the probability of reaching KP $j$ from KP $i$ once the stable distribution is reached. In our work, we find that $c^{(t)}$ in equation (2) has already converged when $t = 10$, so we set $t$ to 10.
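The iteration in equation (2) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the transition matrix `S` is supplied by the caller and is assumed to be column-stochastic (each column sums to 1), the restart probability `a = 0.2` and the 3-node network are made up, and only the choice of 10 iterations and the identity matrix for $q$ come from the text. How an entry of the result maps to $Cor(i, j)$ depends on the orientation chosen for $S$.

```python
def rwr(S, a, steps=10):
    """Iterate c <- (1 - a) * S * c + a * q, with q the identity matrix.

    S is a square transition matrix given as a list of rows.
    Assumption: S is column-stochastic, so every column of c remains
    a probability distribution.
    """
    n = len(S)
    q = [[1.0 if r == col else 0.0 for col in range(n)] for r in range(n)]
    c = [row[:] for row in q]  # start from the initial state q
    for _ in range(steps):
        c = [
            [
                (1 - a) * sum(S[r][k] * c[k][col] for k in range(n))
                + a * q[r][col]
                for col in range(n)
            ]
            for r in range(n)
        ]
    return c


# Hypothetical 3-node knowledge network; each column of S sums to 1.
S = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
c = rwr(S, a=0.2)
column_sums = [sum(c[r][col] for r in range(3)) for col in range(3)]
```

In this symmetric toy network every column of `c` still sums to 1 after the iterations, and the diagonal entries are the largest because the walk keeps restarting at its own starting node.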
Step 2: Residence time calculation. We calculate the KP residence times of $U_A$ and $U_B$ according to their transaction paths when they read book $b_k$. For instance, when $U_A$ accesses KP $i$, his residence time at KP $i$ is represented as

$$t_A^{i*} = t_A^{i} + \sum_{j} Cor(i, j)\, t_A^{j} \qquad (5)$$
In the above equation, $j$ is a related KP of KP $i$, and $t_A^{i}$ is the time user $U_A$ spends browsing KP $i$. Similarly, $t_A^{j}$ is the time user $U_A$ spends reading KP $j$. Take the following graph as an example: $U_A$ first browses KP $i$, then KP $y$, then returns to $i$, goes on to browse $k$ and then $k$'s related KP $x$.

[Diagram: KP $i$ is linked to KP $y$ and KP $k$; KP $k$ is linked to KP $x$.]

So $U_A$'s residence time at KP $i$ is $t_A^{i*} = t_A^{i} + Cor(i, y)\, t_A^{y} + Cor(i, k)\, t_A^{k}$, while $t_A^{k*} = t_A^{k} + Cor(k, x)\, t_A^{x}$. If KP $i$ appears in another transaction path, we add the two values of $t_A^{i*}$ to get the final $t_A^{i*}$.
In equation (5), we use $t_A^{j}$ rather than $t_A^{j*}$ for two reasons. First, it is very likely that $i$ is also a related KP of $j$, or a related KP of one of $j$'s related KPs, so $U_A$ could browse $i$ again in the same transaction path. Considering only two adjacent KPs instead of such a long transaction path makes the calculation of $t_A^{i*}$ much easier. Second, the correlation of related KPs is a value between 0 and 1, sometimes much smaller than 1, so as the transaction path gets longer, later related KPs have less influence on earlier ones.
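The residence time rule of equation (5) can be illustrated with the small path from the example (from $i$ to $y$ and $k$, and from $k$ to $x$). The browsing times and correlation values below are invented for illustration, and the function and variable names are assumptions rather than part of the prototype system.

```python
def residence_time(kp, browse_time, related, cor):
    """Own browsing time at kp plus the correlation-weighted browsing
    time of the related KPs visited from kp, as in equation (5)."""
    return browse_time[kp] + sum(
        cor[(kp, j)] * browse_time[j] for j in related.get(kp, [])
    )


# Hypothetical browsing times (seconds) and KP correlations.
browse_time = {"i": 10.0, "y": 4.0, "k": 6.0, "x": 2.0}
related = {"i": ["y", "k"], "k": ["x"]}
cor = {("i", "y"): 0.5, ("i", "k"): 0.25, ("k", "x"): 0.5}

t_i = residence_time("i", browse_time, related, cor)  # 10 + 0.5*4 + 0.25*6
t_k = residence_time("k", browse_time, related, cor)  # 6 + 0.5*2
```

Note that the raw browsing times of the neighbours are used, not their own adjusted residence times, which keeps the computation local to adjacent KPs.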
Step 3: Single book KSS calculation. Suppose both $U_A$ and $U_B$ have read book $b_k$. The set of KPs $U_A$ has read in book $b_k$ is $K_A = \{k_{A1}, k_{A2}, \ldots, k_{Am_A}\}$ and the set of KPs $U_B$ has read in book $b_k$ is $K_B = \{k_{B1}, k_{B2}, \ldots, k_{Bm_B}\}$, where $m_A$ is the number of KPs $U_A$ has read and $m_B$ is the number of KPs $U_B$ has read. The set of KPs both $U_A$ and $U_B$ have read is $K_{com} = \{k_1, k_2, \ldots, k_{m_{com}}\}$, where $m_{com}$ is the number of KPs $U_A$ and $U_B$ have both read. Then $KSS_{b_k}(U_A, U_B)$ can be represented as

$$KSS_{b_k}(U_A, U_B) = \frac{\sum_{i=1}^{m_{com}} \min\left(\frac{t_A^{i*}}{t_B^{i*}}, \frac{t_B^{i*}}{t_A^{i*}}\right)}{m_A + m_B - m_{com}} \qquad (6)$$
In the above equation, $m_A + m_B - m_{com}$ is the total number of distinct KPs $U_A$ and $U_B$ have read. $t_A^{i*}$ and $t_B^{i*}$ represent the residence times of $U_A$ and $U_B$, respectively, at KP $k_i$ ($k_i \in K_{com}$). $\min\left(\frac{t_A^{i*}}{t_B^{i*}}, \frac{t_B^{i*}}{t_A^{i*}}\right)$ is the smaller of $t_A^{i*}$ and $t_B^{i*}$ divided by the larger one, which expresses the similarity of the residence times $U_A$ and $U_B$ spend on KP $k_i$; it equals 1 when the two times are equal. $\sum_{i=1}^{m_{com}} \min\left(\frac{t_A^{i*}}{t_B^{i*}}, \frac{t_B^{i*}}{t_A^{i*}}\right)$ represents the overall similarity of the residence times $U_A$ and $U_B$ spend on the $m_{com}$ KPs. After we get $KSS_{b_k}(U_A, U_B)$, we can use equation (1) to get the overall KSS of $U_A$ and $U_B$.
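A minimal sketch of the single-book similarity in equation (6) follows. The dictionaries of residence times are hypothetical, the function name is an assumption, and residence times are assumed to be strictly positive so that the ratios are defined.

```python
def kss_book(t_a, t_b):
    """Single-book KSS in the manner of equation (6).

    t_a and t_b map each KP a user has read to that user's
    residence time (assumed > 0) at the KP.
    """
    common = t_a.keys() & t_b.keys()
    distinct_kps = len(t_a) + len(t_b) - len(common)
    if distinct_kps == 0:
        return 0.0
    # min of the two ratios = smaller time divided by larger time
    ratio_sum = sum(min(t_a[k] / t_b[k], t_b[k] / t_a[k]) for k in common)
    return ratio_sum / distinct_kps


# Hypothetical residence times for one book.
t_a = {"k1": 10.0, "k2": 4.0, "k3": 8.0}
t_b = {"k1": 5.0, "k2": 4.0, "k4": 3.0}
score = kss_book(t_a, t_b)
```

Here the two users share `k1` (ratio 0.5) and `k2` (ratio 1.0) out of 4 distinct KPs, so the score is 1.5 / 4.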
4.2 UIC Calculation
UIC includes two parts: SC and PCC. Suppose the set of chat rooms (CRs) $U_A$ has joined is $CR_A = \{cr_{A1}, cr_{A2}, \ldots, cr_{Al_A}\}$ and the number of statements $U_A$ makes in each of them is $s_{Ai}$ ($i = 1, 2, \ldots, l_A$), where $l_A$ is the number of CRs $U_A$ has joined. The set of CRs $U_B$ has joined is $CR_B = \{cr_{B1}, cr_{B2}, \ldots, cr_{Bl_B}\}$ and the number of statements $U_B$ makes in each of them is $s_{Bi}$ ($i = 1, 2, \ldots, l_B$), where $l_B$ is the number of CRs $U_B$ has joined. The set of CRs both $U_A$ and $U_B$ have joined is $CR_{com} = \{cr_1, cr_2, \ldots, cr_{l_{com}}\}$, where $l_{com}$ is the number of CRs both $U_A$ and $U_B$ have joined. Then the SC of $U_A$ and $U_B$ can be formulated as

$$SC(U_A, U_B) = \frac{\sigma(l_{com})}{l_A\, l_B} \sum_{i=1}^{l_{com}} \min(s_{Ai}, s_{Bi}) \qquad (7)$$

$\sigma(l_{com})$ is a coefficient that depends on $l_{com}$ and grows as $l_{com}$ increases. For instance, in our experiment, we find that the proposed measure achieves high performance when $\sigma(l_{com}) = l_{com}^2$. Dividing the correlation by the factor $l_A l_B$ is motivated by the problem of unbalanced user data.
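Equation (7) can be sketched as below. The chat room identifiers and statement counts are invented, the names are illustrative assumptions, and the default coefficient is the quadratic choice $l_{com}^2$ reported for the experiment.

```python
def statement_correlation(s_a, s_b, coeff=lambda l_com: l_com ** 2):
    """Statement correlation in the manner of equation (7).

    s_a and s_b map each chat room a user has joined to the number
    of statements the user made there. coeff is the coefficient
    applied to the number of common chat rooms.
    """
    if not s_a or not s_b:
        return 0.0
    common = s_a.keys() & s_b.keys()
    overlap = sum(min(s_a[cr], s_b[cr]) for cr in common)
    return coeff(len(common)) / (len(s_a) * len(s_b)) * overlap


# Hypothetical statement counts per chat room.
s_a = {"cr1": 5, "cr2": 2, "cr3": 7}
s_b = {"cr1": 3, "cr2": 4}
sc = statement_correlation(s_a, s_b)  # 2**2 / (3 * 2) * (3 + 2)
```

Passing a different `coeff` function makes it easy to compare other coefficient choices on the same data.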
We use the number of times $U_A$ and $U_B$ have had a private chat to represent $PCC(U_A, U_B)$. As $U_A$ has joined many CRs with other users, we use the statement correlation ratio (SCR) to represent the influence of $U_B$ on $U_A$ compared with other users. Suppose the set of users who have ever joined the same CR as $U_A$ is $ISCR_A = \{ISCR_1, ISCR_2, \ldots\}$; then

$$SCR(U_A, U_B) = \frac{SC(U_A, U_B)}{\sum_{X \in ISCR_A} SC(U_A, X)} \qquad (8)$$
Likewise, as $U_A$ has had many private chats with other users, we use the private chat correlation ratio (PCCR) to represent the influence of $U_B$ on $U_A$ compared with other users. Suppose the set of users who have ever had a private chat with $U_A$ is $HPC_A = \{HPC_1, HPC_2, \ldots\}$; then

$$PCCR(U_A, U_B) = \frac{PCC(U_A, U_B)}{\sum_{Y \in HPC_A} PCC(U_A, Y)} \qquad (9)$$
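Equations (8) and (9) share the same shape, so one helper can illustrate both. This sketch assumes each denominator is the sum of $U_A$'s correlation with every user in the corresponding set; the user names and values are hypothetical.

```python
def correlation_ratio(pairwise, target):
    """Share of `target` in U_A's total correlation with other users.

    pairwise maps each other user X to SC(U_A, X) for equation (8),
    or each other user Y to PCC(U_A, Y) for equation (9).
    Assumption: the denominator sums over all users in the map.
    """
    total = sum(pairwise.values())
    if total == 0:
        return 0.0
    return pairwise.get(target, 0.0) / total


# Hypothetical correlations of U_A with other users.
sc_of_a = {"U_B": 3.0, "U_C": 1.0}   # statement correlations
pcc_of_a = {"U_B": 2, "U_D": 6}      # private chat counts
scr = correlation_ratio(sc_of_a, "U_B")
pccr = correlation_ratio(pcc_of_a, "U_B")
```

Because each value is normalized by $U_A$'s own totals, the ratio is generally not symmetric between $U_A$ and $U_B$.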
4.3 Correlation Measurement
When calculating user correlation, we take into account the above three factors: KSS, SCR and PCCR. To combine the three factors, it is necessary to determine how important each one is, so a weight is needed for each factor. Equation (10) shows the complete formula of user correlation, where $\lambda_1$, $\lambda_2$ and $\lambda_3$ denote the weights.

$$Correlation(U_A, U_B) = \lambda_1\, KSS(U_A, U_B) + \lambda_2\, SCR(U_A, U_B) + \lambda_3\, PCCR(U_A, U_B) \qquad (10)$$
In our work, we conduct an online survey to get the weights. 100 users rate how important they think each factor is on a five-point scale. If a user thinks a factor is of very little importance, he gives it a score of 1; if he thinks the factor is very important, he gives it a score of 5. Table 1 shows the result. From the result, we get $\lambda_1 = 4.29$, $\lambda_2 = 2.14$ and $\lambda_3 = 3$.
TABLE 1: RESULT OF ONLINE SURVEY TO OBTAIN THE WEIGHTS
Factor    Sum    Avg
KSS       429    4.29
SCR       214    2.14
PCCR      300    3
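The weighted combination of equation (10) with the survey averages from Table 1 can be sketched as follows. The input scores are made up for illustration, and the function and dictionary names are assumptions.

```python
# Survey averages from Table 1, used as the three weights.
WEIGHTS = {"kss": 4.29, "scr": 2.14, "pccr": 3.0}


def user_correlation(kss, scr, pccr, weights=WEIGHTS):
    """Weighted sum of the three factors, as in equation (10)."""
    return (
        weights["kss"] * kss
        + weights["scr"] * scr
        + weights["pccr"] * pccr
    )


# Hypothetical factor values for one pair of users.
score = user_correlation(kss=0.2, scr=0.5, pccr=0.25)
```

Since the weights are raw survey averages rather than normalized values, the resulting score is suited to ranking candidate users against each other rather than to being read as a probability.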
5 EXPERIMENT
We develop our prototype system on the Android platform. We randomly select 8 users from our system; these 8 users form a user group UG = {$U_1$, $U_2$, ..., $U_8$}. Then we recruit 80 volunteers to conduct the experiment. For each user in UG, we use our approach to calculate the correlation between him and the other users in the system. Then the N users with the largest correlation are selected to form a new group UG = {$U_1$, $U_2$, ..., $U_N$}. Here, N equals 7. So
there are eight groups in all.
We compare two methods. The first considers only KSS; we call it the KSS method. The second considers both KSS and UIC; we call it the KSS-UIC method. We compute the normalized discounted cumulative gain (nDCG) for each method.
The result is presented in Figure 3. Note that for a perfect ranking algorithm, nDCG is 1. The mean nDCG of the
KSS-UIC method is 0.94, which is very close to 1, so the KSS-UIC method performs very well.
Moreover, the nDCG of the KSS-UIC method is larger on average than that of the KSS method. In other words,
introducing UIC brings a real advantage.
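For readers unfamiliar with nDCG, the sketch below shows one common way to compute it. The paper does not state which DCG variant was used or how relevance was judged, so the linear gain with a $\log_2$ discount and the relevance grades here are assumptions for illustration only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and log2 discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of ranked users, best-first positions.
perfect = ndcg([3, 2, 1, 0])   # already in ideal order
swapped = ndcg([2, 3, 1, 0])   # top two positions swapped
```

A ranking that is already in ideal order scores exactly 1, and any misordering, such as swapping the top two users, pushes the score below 1.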

FIGURE 3: COMPARISON OF NDCG BETWEEN KSS-UIC AND KSS METHOD
6 CONCLUSIONS
In this paper, by considering user knowledge structure and strength of relationship, we propose UCMA to mine the
correlation between users. We use the KPs a user has read and the residence time at each KP to represent his knowledge
structure. By assigning different weights to KSS, SCR and PCCR, we get the overall user correlation.
In the future, we intend to extend our work in two directions. First, we aim to improve the performance of UCMA by
taking into account new features, such as residence time at each page or chapter and dynamic feedback of the
weights of KSS, SCR and PCCR. Second, we would like to develop new applications, such as personalized friend
recommendation and learning resources recommendation.
7 ACKNOWLEDGEMENT
This work is supported by the National Natural Science Foundation of China (61070109) and Opening Project of
Beijing Key Laboratory of Mobile Computing and Pervasive Device.
REFERENCES
[1] Jianbin Sun, Pengzhu Zhang. Visualization of Researcher's Knowledge Structure Based on Knowledge Network[A]. International
Congress of Inborn Errors of Metabolism[C]. San Diego, California: SIMD Press, 2009, 2067-2071
[2] Alberto J. Canas, Roger Carff, Greg Hill, et al. Concept Maps: Integrating Knowledge and Information Visualization[J].
Knowledge and Information Visualization, 2005, 3426: 205-219
[3] Xia Lin, Yen Bui, Dongming Zhang. Visualization of Knowledge Structures[A]. IEEE International Conference on Information
Visualization[C]. Sacramento, California: IEEE Press, 2007, 476-484
[4] Xiaoyuan Su, Taghi M. Khoshgoftaar. A Survey of Collaborative Filtering Techniques[J]. Advances in Artificial Intelligence,
2009: Article ID 421425
[5] Marta Millan, Maria Trujillo, Edward Ortiz. A Collaborative Recommender System Based on Asymmetric User Similarity[A].
Proceedings of the 8th International Conference on Intelligent Data Engineering and Automated Learning[C]. Birmingham, UK:
Springer, 2007(4881), 663-672
[6] Hanghang Tong, Christos Faloutsos, Jia-Yu Pan. Fast Random Walk with Restart and Its Applications[A]. International
Conference on Data Mining[C]. Las Vegas, USA: IEEE Press, 2009, 613-622