
Enhancing Social Network Analysis with a Concept-Based Text Mining
Approach to Discover Key Members on a Virtual Community of Practice

Hector Alvarez¹, Sebastian A. Ríos¹, Felipe Aguilera², Eduardo Merlo¹,
and Luis Guerrero²
¹ Department of Industrial Engineering, University of Chile
{halvarez,emerlo}@ing.uchile.cl, srios@dii.uchile.cl
² Department of Computer Science, University of Chile
{faguiler,luguerre}@dcc.uchile.cl

Abstract. In order to have a successful VCoP, two important tasks must
be performed: on the one hand, the community must provide useful
information to every member through a good organization of contents and
topics; on the other hand, the behavior of members must be understood
(i.e., which are the key members or experts, which sub-communities
exist, etc.). Social Network Analysis (SNA) is a powerful tool to
understand community members; however, our thesis is that the state of
the art in SNA is not sufficient to obtain useful knowledge from a VCoP.
Moreover, we think that traditional SNA may lead to wrong results. We
propose to combine traditional SNA with data mining techniques in order
to produce results closer to reality and gather useful knowledge for
VCoP enhancement. In this work, we focus on discovering key members of a
VCoP by combining SNA with concept-based text mining. We successfully
tested our approach on a real VCoP with more than 2500 members, and we
validated our results by asking the community administrators.

1 Introduction

Virtual Communities of Practice (VCoPs) have experienced explosive growth
on the Internet in recent years. The value of a VCoP is to connect people
who want to share or learn about a specific topic by interacting on an
ongoing basis[9]. The Web facilitates interaction between community
members without requiring face-to-face contact, providing a ubiquitous
environment and an atemporal virtual communication space. Forums, wikis,
and other similar tools are commonly used to support this virtual
interaction.
For VCoPs it is very important to generate, store and keep the knowledge
resulting from members' interaction. The success of a VCoP depends on a
governance mechanism[7] and on the participation of key members
(so-called leaders[2] or core members[7]). Likewise, every VCoP member's
goal is to learn specific knowledge from the community; therefore, the
contents of posts, which specific topic is interesting for a member,
etc. must be considered.

R. Setchi et al. (Eds.): KES 2010, Part II, LNAI 6277, pp. 591-600, 2010.
© Springer-Verlag Berlin Heidelberg 2010

There are many definitions of what a key member is: the most
participative member[7], the member who answers other members'
questions[5], or the member who encourages other members to
participate[2]. However, none of these definitions takes into
consideration the content or the meaning of the interactions (posts,
replies, etc.). Interaction is usually measured by user A replying to a
post of user B; whether A answers the question successfully or not is not
taken into account. Those approaches consider only the sequence of posts,
even if the posts talk about a topic totally different from that of the
thread which contains them. Therefore, we hypothesize that key members
defined this way lead to wrong results and invalidate the results
obtained from the application of SNA techniques (core discovery
techniques, expert discovery techniques, discovery of sub-communities,
etc.).
In our approach, a key member is defined as a member who participates
(asking or answering) according to a specific purpose of the VCoP. The
more aligned a member's participation is with the VCoP's purposes, the
greater the degree of importance that member has. Thus, key members must
be obtained by combining text mining techniques with SNA techniques in a
single process.
To measure members' participation it is very common to apply SNA. The
result of this kind of process is a graphical representation which helps
to find the community core, sub-communities, network clusters, peripheral
members, etc. Key members belong to the community core; therefore, we
should apply core discovery algorithms, such as HITS, to find them.
For this reason we use the Concept-Based Mining approach developed by
Ríos et al.[8], which measures the purpose accomplishment of the entire
community. We focus on the discovery of key members using the HITS
algorithm. To do so, we model the community using two representations:
reply to the owner of a certain post, and reply to a specific post. Then,
we build a graph in the classic manner and show how both topologies
change when using our proposal. Finally, we extract the top 10 key
members from administrators' surveys and compare them with HITS applied
to those topologies.
We show the benefits of using concept-based mining for SNA representation
in a real VCoP with more than 2500 members and 8 years of data. We
discovered interesting and promising results.

2 Related Work

There are different kinds of communities. Kim et al.[4] organize Social
Web Communities by describing the kinds of users, uses and needed
features for every kind of community. Important missions for a Community
of Knowledge are sharing user-created content and retaining users,
especially key members. In this context, a VCoP is also a Community of
Knowledge; therefore, it should accomplish such missions, but it must
also ensure that community members fulfill their goals (or purpose) when
using the VCoP.

In order to enhance communities, there are many approaches to the
analysis and understanding of a community; these are explained in the
following sections.

2.1 Social Network Analysis

When an administrator wants to make changes to the community structure or
contents, it is necessary to know how the community works. However, this
implies understanding human relationships: how members relate to each
other. SNA helps to understand such relations through a graph
representation based on users (nodes) and relations (arcs). There are
several algorithms to extract experts (or key members), classify users
according to their relevance within the community, and discover and
describe the resulting sub-communities, among others. Besides, depending
on the kind of community, the meaning of the analysis may vary.
Yulupula et al.[10] made a social network based on e-mails sent between
members of an enterprise, in order to predict the organizational
structure using SNA. They report mismatches produced by data quality and
possibly because they built the network only from a sent-mail approach,
losing useful information such as the mail body. In this case, they only
wanted to know how the community interacted.
In another case, Pfeil et al.[6] analyse a community of older people. The
main idea is to study how the emotional content of the communication
exchanged alters the community structure. They found different network
configurations for every support-level category, verifying their
hypothesis. The study was made in a community of 47 users with 1.5 years
of data (400 messages). That amount of data would not be enough for other
kinds of analysis, such as studying the relationship between the content
and the network structure, finding key members of the community, or
studying the evolution of the community itself through time.

2.2 Classification Methods

Another way to answer the question "what do you want to enhance?" is to
classify some community features. This makes community improvement
decisions easier. Even if you do not know what to do, the classification
helps to find the improvements that the community needs.
Sometimes, classification is used as an exploratory data tool to
understand the community. However, the information obtained by these
tools can also provide greater benefits to the community. Hong et al.[3]
use Q-A discussion boards to classify posts into question posts or reply
posts and then find the proper reply to a question. This classification
would improve the community in three ways: enhancing search quality,
bringing suggestions when users ask a similar question, and finding
experts according to the replies provided. Recognising experts would help
to solve questions faster and better in the community.
Amatriain et al.[1] improve a recommendation system using the opinions of
the community's experts. They compare user and expert preferences to
determine which experts are most similar to each user, and then prepare a
recommendation list based on the opinions of the classified experts. The
advantage of this work is that experts are identified, making it a good
example of expert-oriented enhancements. Unfortunately, this approach
assumes that we already know who the experts are.
In both cases we are concerned about who the experts are and how they are
obtained. For that reason, this work focuses on finding a new approach to
discover key members (experts or core members). The better they are
identified, the more relevant the results obtained by other techniques
will be. Thus, it will be possible to obtain better enhancements for the
whole community.

2.3 Concept-Based Text Mining to Enhance VCoP

A VCoP talks about specific main topics, and one of its concerns is that
the community does not deviate from them. Also, VCoPs have purposes (or
goals) that administrators want to accomplish. Therefore, it is important
for a VCoP to supervise its goals through time and analyse their
evolution. Enhancements then surface from this analysis of the temporal
evolution of purposes[8].
Ríos et al.[8] define the goals of a VCoP website in collaboration with
community members. When goals are well established and defined, it is
possible to apply concept-based text mining to evaluate the
accomplishment of the community's goals. Concept-based text mining uses
fuzzy logic theory to assign a goal score to every forum in the
community. This goal score shows how aligned the text inside a community
forum is with a specific goal.
Having these scores, the VCoP's administrators evaluate goal
accomplishment in order to make the proper enhancements. For example, if
two forums have similar scores for a specific goal, an enhancement could
be to merge both. On the other hand, if a forum has very high scores in
two specific goals, it is possible to split the forum into two
independent forums, each closer to one goal.
In that work, the objective was to evaluate the goal accomplishment of a
VCoP's forums, but this does not help to evaluate how users contribute to
purpose accomplishment, which makes it difficult to find the VCoP's key
members. However, the use of concept-based mining will improve the search
accuracy.

3 Enhancing SNA with Concept-Based Text Mining

The main question of the present work is how to enhance key member
discovery. This question has no simple answer. The first step is to
obtain a graphical representation of the community's inner social
structure. The second step is to apply a core member algorithm (like
HITS) to this representation. As a result of the algorithm's application,
we obtain a ranking of all community members, with the expert (core or
key) members at the top of the ranking.
We distinguish a key member from an expert member, since a key member has
several characteristics that define him/her. Firstly, a key member may or
may not be an expert in a field. Secondly, he/she may increase the
interaction in the community because he/she asks interesting questions,
which produce answers from the experts in the field. This means that the
questions are very specific to a field; therefore, only experts are able
to answer them. In other words, a key member is a person totally aligned
with the VCoP's goals and topics, thus producing contents which are very
relevant to satisfy other members' interests. The only way to measure a
key member as we define him/her is to use a hybrid approach of SNA
combined with semantic-based text mining. Likewise, the mining process
must always include the different purposes of the community. This is why
we chose the concept-based text mining approach[8].

3.1 Network Configuration


As mentioned before, to build the social network we take into account
members' interaction. In general, members' activity is followed according
to members' participation. Participation occurs when a member posts in
the community, which is also the case when modeling a VCoP.
Because the activity of a VCoP is described according to members'
participation, the network is configured this way: the nodes are the VCoP
members, and the arcs represent interaction between members. How to link
the members and how to measure their interactions to complete the network
is our main concern. There are forums where you know who is replying to
whom, but in other forums this is not that clear. We work with the second
kind.
To face this problem, we describe two VCoP network representations,
according to which member is being replied to:

1. Creator-oriented Network: when a member creates a topic, every reply
   is related to him/her.
2. Last Reply-oriented Network: every reply in a topic is a response to
   the last post.

We can see a typical forum structure in Fig. 1; Fig. 2 then shows how the
forum is converted into a graph. In Fig. 2, arcs represent members'
replies and nodes represent the users who made the posts. In our first
approach, the weight of an arc is a counter of how many times a member
replies to another. The problem is that we are not considering whether
the members' replies accord with the community purpose (for either of
these configurations). We have to filter noisy posts. This is done by
applying concept-based text mining to the posts' texts. Of course, the
networks will be different from those obtained before because, in order
to draw an arc, we now compute the similarity between the concepts in a
post and in its reply. If a post and a certain reply are sufficiently
close, then we say they are similar; therefore, the post and the reply
are relevant to a specific topic (concept).
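
The two representations, with reply counts as arc weights, can be
sketched as follows. This is only an illustrative sketch: the thread is
hypothetical, the arc direction (from the replier to the member being
replied to) is an assumption, and so is the choice to ignore self-replies;
the function names are not taken from the original work.

```python
from collections import Counter


def creator_oriented(thread):
    """Link every reply in a thread to the member who created the topic."""
    creator = thread[0]
    arcs = Counter()
    for replier in thread[1:]:
        if replier != creator:  # assumption: self-replies are ignored
            arcs[(replier, creator)] += 1
    return arcs


def last_reply_oriented(thread):
    """Link every reply to the author of the post immediately before it."""
    arcs = Counter()
    for previous, replier in zip(thread, thread[1:]):
        if replier != previous:  # assumption: self-replies are ignored
            arcs[(replier, previous)] += 1
    return arcs


# Hypothetical thread: authors of the posts, in posting order.
thread = ["U1", "U2", "U3", "U1", "U2"]
creator_net = creator_oriented(thread)
last_reply_net = last_reply_oriented(thread)
print(dict(creator_net))
print(dict(last_reply_net))
```

Each key is a (replier, replied-to) pair and each value is the number of
replies counted for that arc, so the same thread produces a different
graph under each representation.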

3.2 Concept-Based Text Mining for Network Filtering


Previous work[8] provides a method to evaluate the accomplishment of
community goals; we now use this approach to classify members' posts
according to the VCoP's goals. These goals are defined as a set of terms,
which are composed of a set of keywords or statements in natural
language. To obtain a goal accomplishment score, we use fuzzy logic to
evaluate how much a goal is contained in a single post. We then have a
post vector whose components are the goal accomplishment scores of the
post.

[Fig. 1. A typical forum structure (Forum A with Posts 1 to 6); the users
who posted (U1 to U4) are shown in circles]

[Fig. 2. Two different network models (Creator-oriented Network and Last
Reply-oriented Network) representing the forum from Fig. 1]

The idea is to compare two members' posts through the distance between
their post vectors: if it is over a certain threshold, there is
interaction between them. We believe that this helps us to avoid, filter
or erase irrelevant interactions. For example, in a VCoP with k goals,
let $p_j^{u'}$ be post j of user u' that is a reply to post i of user u
($p_i^u$). The distance between them is calculated with Eq. (1).

$$ d(p_i^u, p_j^{u'}) = \frac{\sum_k g_{ik}\, g_{jk}}{\sqrt{\sum_k g_{ik}^2}\,\sqrt{\sum_k g_{jk}^2}} \qquad (1) $$

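Where $g_{ik}$ is the score of goal k in post i. The distance exists only
if $p_j^{u'}$ is a reply to $p_i^u$. After that, we calculate the weight
of arc (u, u'), $w_{uu'}$, with Eq. (2), where the sum runs over the
pairs of posts whose distance reaches the threshold.

$$ w_{uu'} = \sum_{\substack{i,j \\ d(p_i^u,\, p_j^{u'}) \ge \text{threshold}}} d(p_i^u, p_j^{u'}) \qquad (2) $$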
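
A minimal sketch of how Eq. (1) and Eq. (2) could be computed is given
below. It assumes that Eq. (1) is the normalized dot product of two goal
score vectors and that Eq. (2) sums only the values at or above the
threshold (0.5 in the experiment). The function names, the input layout
and the example scores are hypothetical, and the arc direction (from the
replier to the author of the original post) is also an assumption.

```python
import math
from collections import defaultdict


def goal_similarity(g_i, g_j):
    """Eq. (1), read as the normalized dot product of two goal vectors."""
    dot = sum(a * b for a, b in zip(g_i, g_j))
    norm = math.sqrt(sum(a * a for a in g_i)) * math.sqrt(sum(b * b for b in g_j))
    return dot / norm if norm else 0.0


def arc_weights(replies, threshold=0.5):
    """Eq. (2): add up the values of Eq. (1) that reach the threshold.

    replies: iterable of (replier, reply_scores, author, post_scores).
    """
    weights = defaultdict(float)
    for replier, reply_scores, author, post_scores in replies:
        d = goal_similarity(post_scores, reply_scores)
        if d >= threshold:
            weights[(replier, author)] += d
    return dict(weights)


# Hypothetical goal scores for a community with two goals.
replies = [
    ("U2", [1.0, 0.0], "U1", [1.0, 0.0]),  # same goal profile: value 1.0
    ("U3", [0.0, 1.0], "U1", [1.0, 0.0]),  # unrelated reply: value 0.0, filtered
    ("U2", [3.0, 4.0], "U1", [1.0, 0.0]),  # partially aligned: value 0.6
]
weights = arc_weights(replies)
print(weights)
```

In this toy input the reply from U3 shares no goal with the post it
answers, so it adds no arc, while the two replies from U2 accumulate on a
single arc.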

We used this weight in both configurations previously described
(Creator-oriented and Last Reply-oriented). Afterwards, we applied HITS
to find the key members in the different network configurations. These
key members are validated with the VCoP's administrators.
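
For readers unfamiliar with HITS, the sketch below shows a plain
power-iteration version of the algorithm on a weighted directed graph. It
is not the implementation used in the experiment: the toy arc weights are
hypothetical, the number of iterations is arbitrary, and ranking members
by authority score (rather than hub score) is an assumption.

```python
import math


def _normalize(scores):
    """Scale the scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(weights, iterations=50):
    """Weighted HITS by power iteration; weights maps (source, target) to w."""
    nodes = {node for arc in weights for node in arc}
    hubs = {node: 1.0 for node in nodes}
    authorities = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority update: sum of the hub scores of members pointing here.
        authorities = {node: 0.0 for node in nodes}
        for (source, target), w in weights.items():
            authorities[target] += w * hubs[source]
        authorities = _normalize(authorities)
        # Hub update: sum of the authority scores of members pointed to.
        hubs = {node: 0.0 for node in nodes}
        for (source, target), w in weights.items():
            hubs[source] += w * authorities[target]
        hubs = _normalize(hubs)
    return hubs, authorities


# Hypothetical arcs: (replier, replied-to) -> weight.
weights = {("U2", "U1"): 2.0, ("U3", "U1"): 1.0, ("U1", "U3"): 1.0}
hubs, authorities = hits(weights)
ranking = sorted(authorities, key=lambda node: (-authorities[node], node))
print(ranking[0])
```

The same function can be run on the reply-count weights or on the
concept-based weights, which is what makes the four rankings comparable.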

4 Experiment in a Real VCoP

Our experimental VCoP was the plexilandia.cl virtual community.
Plexilandia¹ is a VCoP formed by a group of people who have come together
around the building of music effects, amplifiers and audio equipment (in
a "do it yourself" style). Although they have a web page with basic
information about the community, most of their members' interactions take
place in the discussion forum.
Today, Plexilandia counts more than 2500 members in more than 8 years of
existence, of which about 2100 are active members. All these years they
have been sharing and discussing their knowledge about building their own
plexies and effects. Besides, there are other related topics such as
luthiery, professional audio, and buying/selling parts.
In the beginning, the administration task was performed by only one
member. Today, this task is performed by several administrators (as of
2010 there are 3 administrators). In fact, the amount of information
generated weekly makes it impossible to leave the administration task to
just one administrator.
We processed 59,279 posts created by 2107 active members from September
2002 to April 2009. We set a threshold of 0.5 for the dot product. The
process takes about 1 hour.
We used the creator-oriented and last reply-oriented networks in the
classic manner. Afterwards, we modeled them using the concept-based
approach. We used the same concepts as in previous work[8] and we defined
a threshold of 0.5 in order to use Concept-based Text Mining to build the
VCoP's network.
To begin the experiments, we asked the administrator who, for him, are
the key members of Plexilandia. Although we are able to perform monthly
or weekly analyses, we performed one analysis per year, since it is
easier to ask the administrator who were the key members this year, last
year, and so on, than last week, or in the 35th week of last year, etc.
The resulting survey is shown in Table 1.
Afterwards, we built Plexilandia's networks with the two topologies
mentioned before. Then we built the same topologies using the
concept-based approach. We used PAJEK² to draw the networks. Afterwards,
HITS was applied to every network topology. These results are shown in
Table 2.

¹ Plexi is the nickname given to Marshall amp heads model 1959 that have
the clear perspex (a.k.a. plexiglass) fascia to the control panel with a
gold backing sheet showing through, as opposed to the metal plates of the
later models.

Table 1. Key members based on the administrator's survey

User      Note
user2     Administrator
user1254
user37    Administrator
user808   Not participating lately
user4
user1825
user999
user210   Not participating lately
user240
user874
user321
user234   Participates occasionally
user33

Table 2. Top 13 key members found with HITS in every approach

Rank  hits-creator   hits-reply     hits-cb-creator  hits-cb-reply
1     user2          user2          user2            user2
2     user8          user8          user8            user8
3     user31         user31         user226          user210
4     user20         user210        user210          user240
5     user210        user226        user23           user380 (-)
6     user12         user162        user33           user4
7     user380 (-)    user20         user29           user12
8     user162        user29         user37           user33
9     user226        user12         user4            user226
10    user23         user380 (-)    user20           user128 (-)
11    user4          user33         user240          user31
12    user33         user240        user12           user37
13    user211 (-)    user4          user31           user29

Finally, we summarize all results in Table 3. This table shows the
intersection between the key members extracted using an algorithm and the
key members from the survey in Table 1. We mark users in both lists with
an X. We can observe that the concept-based approach discovered one
additional key member in every topology used.
Concept-based text mining is used to discover new knowledge; therefore,
we wondered what kind of users are those not marked with an X in the
table. To find out, we asked the community expert once more. This time we
showed him the list of key members gathered from every algorithm
(Table 2). Surprisingly, he recognized that most of the users on the
lists of key members were in fact key members. He had forgotten many of
them, since we are using data from 2002; however, when he saw them on the
list he remembered them, validating almost every user as a key member. In
Table 2 we marked with a (-) sign those members who are not key members.
We can observe that hits-cb-creator discovered 100% key members.
Unfortunately, hits-cb-reply was worse than hits-reply at detecting key
members.

² http://vlado.fmf.uni-lj.si/pub/networks/pajek/

Table 3. Key Members Summary

user      hits-creator  hits-cb-creator  hits-reply  hits-cb-reply
user2     X             X                X           X
user1254  -             -                -           -
user37    -             -                -           X
user808   -             -                -           -
user4     X             X                X           X
user1815  -             -                -           -
user999   -             -                -           -
user210   X             X                X           X
user240   -             X                X           X
user874   -             -                -           -
user321   -             -                -           -
user234   -             -                -           -
user33    X             X                X           X
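
The comparison drawn from the administrator's second review can be made
explicit by counting, for each ranking in Table 2, how many of the 13
listed members were not rejected. The short sketch below does this
arithmetic; the rejected members are those marked (-) in Table 2, while
the helper name and the data layout are only illustrative.

```python
TOP_K = 13  # size of each ranked list in Table 2

# Members marked (-) in each column of Table 2.
rejected = {
    "hits-creator": ["user380", "user211"],
    "hits-reply": ["user380"],
    "hits-cb-creator": [],
    "hits-cb-reply": ["user380", "user128"],
}


def validated_fraction(rejected_members, k=TOP_K):
    """Share of the top-k members that the administrator accepted."""
    return (k - len(rejected_members)) / k


fractions = {name: validated_fraction(members) for name, members in rejected.items()}
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.1%}")
```

Under these counts hits-cb-creator reaches 13 of 13, while hits-cb-reply
(11 of 13) falls below hits-reply (12 of 13), which is the pattern
described in the text.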

5 Conclusion
We proposed to combine traditional SNA with data mining techniques in
order to produce results closer to reality and gather useful knowledge
for VCoP enhancement.
We applied two network topologies to represent the VCoP: creator-oriented
and last reply-oriented networks. We used Plexilandia.cl, which is a VCoP
with more than 2100 active members out of a base of 2500 members.
We showed that SNA combined with the concept-based text mining approach
outperforms SNA alone in discovering a VCoP's key members in the case of
a creator-oriented network topology. However, in the case of the last
reply-oriented network, SNA alone outperformed SNA plus the concept-based
approach.
We think the results are promising, since we used the whole history to
perform the analysis. Besides, we did not take ranking position into
consideration, which seems to be much closer to reality in concept-based
SNA approaches. Thus, we need more experimentation in order to show the
real impact of our proposal.

Acknowledgments
The authors would like to thank the continuous support of Instituto
Sistemas Complejos de Ingeniería (ICM: P-05-004-F, CONICYT: FBO16);
Initiation into