You are on page 1of 8

Expert Systems with Applications 39 (2012) 9079–9086

Contents lists available at SciVerse ScienceDirect

Expert Systems with Applications


journal homepage: www.elsevier.com/locate/eswa

Combining ranking concept and social network analysis to detect collusive groups
in online auctions
Shi-Jen Lin a, Yi-Ying Jheng a, Cheng-Hsien Yu a,b,⇑
a
National Central University, Jhongli City, Taoyuan, Taiwan
b
China University of Technology, Hukou, Hsinchu, Taiwan

a r t i c l e i n f o a b s t r a c t

Keywords: Due to the popularity of online auction markets, auction fraud has become common. Typically, fraudsters
Ranking method will create many accounts and then transact among these accounts to get a higher rating score. This is
Collusive group detection easy to do because of the anonymity and low fees of the online auction platform. A literature review
Online auction revealed that previous studies focused on detection of abnormal rating behaviors but did not provide a
ranking method to evaluate how dangerous the fraudsters are. Therefore, we propose a process which
can provide a method to detect collusive fraud groups in online auctions. First, we implement a Web
crawling agent to collect real auction cases and identify potential collusive fraud groups based on a k-core
clustering algorithm. Second, we define a data cleaning process to remove the unrelated data. Third, we
use the Page-Rank algorithm to discover the critical accounts of the groups. Fourth, we developed a rank-
ing method for auction fraud evaluation. This method is an extension of the standard Page-Rank algo-
rithm and combines the concepts of Web structure and risk evaluation. Finally, we conduct
experiments using the Adaptive Neuro-Fuzzy Inference System (ANFIS) neural network and verify the
performance of our method by applying it to real auction cases. In summary, we find that the proposed
ranking method is effective in identifying potential collusive fraud groups.
 2012 Elsevier Ltd. All rights reserved.

1. Introduction nymity and the low online auction fees to create multiple accounts
and increase their rating scores via sham transactions. In this way,
Due to the popularity of the online auction markets, auction they can deceive the buyer with their high rating score. Related
fraud has become common. According to an Internet Fraud Report studies also show that the most severe cases of Internet fraud,
(2003–2009) issued by the Internet Crime Complaint Center (IC3), accounting for millions of dollars of loss, were reported by those
a joint operation between the FBI and the National White Collar who were misled by evaluation feedback (Griggs, 2003; Han,
Crime Center (NW3C), the number of complaints about Internet 2003; Wilke & Wingfield, 2003).
fraud increased from 1400 per month in 2001 to 22,940 per month Despite these facts, previous studies on Internet auction fraud
in 2008. From January 1, 2008 to December 31, 2008, IC3 received a focused on the detection of abnormal rating behaviors of known
total of 275,284 complaints, representing a 33.1% increase over the auction fraud groups (Wang & Chiu, 2008) or focused on the
previous year. The total amount lost increased from $17 million in classification of types of auction fraud (Pandit, Chau, Wang, &
2001 to $264 million in 2008. The total dollar loss linked to online Faloutsos, 2007). Thus, they did not rank the degree of suspicion
fraud in 2008 was $264 million, about $25 million more than in associated with each account of an auctioneer group. Furthermore,
2007. Regarding the variety of fraud types, non-delivery of mer- they did not provide an easy way for users to obtain the relevant
chandise ranked number one (32.9%). Internet auction fraud was information to evaluate collusive groups.
the second most reported offense (25.5%) (IC3, 2009). These figures Therefore, we propose a process which can provide a ranking
indicate that online auction fraud causes significant losses. method to evaluate collusive fraud groups in online auctions. First,
This type of fraud is actually facilitated by the seller reputation we implement a Web crawling agent to collect real auction cases
rankings displayed on the most popular online auction sites, such and identify potential collusive fraud groups based on the k-core
as Yahoo and eBay. Because these rankings are based on a simple clustering algorithm. Second, we define a data cleaning process
feedback mechanism, auctioneers can take advantage of the ano- to remove the unrelated data. Third, we use the Page-Rank algo-
rithm to discover the critical accounts of the groups. Fourth, we
⇑ Corresponding author at: China University of Technology, Hukou, Hsinchu, provide a ranking method for auction fraud evaluation. This meth-
Taiwan. od is extended from the standard Page-Rank algorithm and com-
E-mail addresses: sjlin@mgt.ncu.edu.tw (S.-J. Lin), chyu@cute.edu.tw (C.-H. Yu). bines the concepts of Web structure and risk evaluation. Finally,

0957-4174/$ - see front matter  2012 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2012.02.039
9080 S.-J. Lin et al. / Expert Systems with Applications 39 (2012) 9079–9086

we conduct experiments using the Adaptive Neuro-Fuzzy Infer- Web structure mining categorizes Web pages and generates re-
ence System (ANFIS) neural network and verify the performance lated patterns, such as the similarity and the relationships between
of our method by applying it to real auction cases. In summary, different Web sites.
we find that the proposed ranking method is effective in identify- Recently, many researchers have applied social network analy-
ing a potential collusive fraud group. sis (SNA) combined with the Google’s Page-Rank algorithm, the
Hyperlink-induced Topic Search (HITS) algorithm, or other page-
rank algorithms to generate useful information for users (Jamali
2. Literature review
& Abolhassani, 2006; Mislove, Gummadi, & Druschel, 2006; Pujol,
Sangüesa, & Delgado, 2002). Thus, we also use SNA as the frame
In this section, we survey the related approaches for online auc-
and combine ranking algorithms to evaluate the importance of
tion fraud detection and then review the social network analysis
an account in the transactional group.
which is used to analyze the auctioneers’ interactions. We also re-
The Page-Rank vector of citation ranking weight was introduced
view the related work on the Page-Rank algorithm which will be
by Brin and Page (1998). The idea introduces a notion of page
applied to our Auction Fraud Rank algorithm.
authority, which is independent of the page content. Iterative
graph-based ranking algorithms provide a way of deciding the
2.1. Fraud detection in online auctions importance of a vertex within a graph and of deciding the impor-
tance of a page on the Web. Page-Rank is defined as:
According to the literature, fraud detection approaches can be
classified into three kinds of methods: statistical methods, data ð1  dÞ X
PRðuÞ ¼ þd PRðv Þ=Nv ð1Þ
mining techniques and formal methods (Dong, Shatz, & Xu, N v 2BðuÞ
2009a). With regard to statistical methods, Rubin et al. (2005) pro-
pose a new reputation system for auction sites to help users pro- where u represents a web page; d is a dampening factor that is usu-
tect their interests by warning them of the risk of auction fraud. ally set to 0.85; B(u) is the set of pages that point to u; PR(u) and
The model uses three variables (average number of bids, average PR(v) are rank scores of page u and v, respectively; Nv denotes the
minimum starting bid, and bidders’ profiles) to identify whether number of outgoing links of page v; and c is a factor used for
an account shows activity typical of shill biddings (Rubin et al., normalization.
2005). Other researchers also use statistical methods to detect The Page-Rank algorithm states that a page that is linked to
behaviors of fraudsters (Dong, Shatz, & Xu, 2009b; Trevathan & many pages with high Page-Ranks receives a high rank itself. A
Read, 2007, 2009). With regard to data mining techniques, Pandit popular web page is often referred to by other pages and that an
et al. (2007) propose the NetProbe approach, which models auc- important web page contains a high number of out-links. Further-
tioneers with their transactions as a Markov Random Field to de- more, the structure of collusive transactions conforms to the char-
tect the fraudsters’ suspicious patterns. It also constructs a belief acteristics of the Page-Rank algorithm. An account that has many
propagation mechanism to detect similar fraudsters (Pandit et al., transactions in the collusive group is considered an important ac-
2007). Formal notation and logic are used in mathematically rigor- count; if a page has important links connected to it, its out-linking
ous techniques for the specification, development, and verification pages will also become important pages. Thus, the Page-Rank algo-
of software and hardware designs. Xu and Cheng (2007) and Clarke rithm can be used to discover authoritative and important ac-
and Wing (1996) use a formal approach to detect shilling counts. We will modify the Page-Rank algorithm to build our
behaviors. Auction Fraud Rank algorithm to find the potential risk of accounts
Recently, Wang and Chiu (2008) used social network analysis in the group.
(SNA) indicators based on k-core and centered-weights algorithms
to detect auction fraud groups because of the normal transactions 2.3. Social network analysis
do not have a complicated relationship. However, if collusive auc-
tioneers are to manipulate their reputation, they must have trans- SNA is the study of social relations among a set of actors, and
actions among themselves. Thus, collusive auctioneers’ actors can be individuals, organizations, or communities (Borgatti,
transactions have a complicated relationship and have the high- 2010). It is used widely in the social and behavioral sciences, as
core characteristic in the transactional structure. Taking a struc- well as in political science, economics, organizational science,
tural perspective by focusing on the relationships between traders and industrial engineering (Garton, Haythornthwaite, & Wellman,
rather than their attribute values, Wang and Chiu used k-core and 1999; Wellman, 1996). The network perspective has proved fruit-
centered-weights algorithms to create a collaborative-based rec- ful in a wide range of social and behavioral science disciplines
ommendation system that could suggest risks of collusion associ- (Wasserman & Faust, 1994). The basic components of an SNA study
ated with an account (Wang & Chiu, 2005, 2008). are the actor or node and the connection or link (Borgatti, 2010).
The nodes can be individual, corporate, or collective social units,
2.2. Page-Rank algorithm and nodes are linked to one another by ties (Wasserman & Faust,
1994).
Web mining is the application of data mining techniques to dis- Divisions of actors into subgroups can be a very important as-
cover patterns from the Web. It also could be generalized into pect of social structure that can help explain how the network as
three different types: Web usage mining, Web content mining, a whole is likely to behave. A subgroup can be identified by the
and Web structure mining (Cooley, Mobasher, & Srivastava, 1997, measurements of clique, k-plex, and k-cores (Everett, 1982; Everett
1999; Srivastava, Cooley, Deshpande, & Tan, 2000). Web usage & Borgatti, 1998; Hanneman & Riddle, 2005; Liu & Terzi, 2008; Wu,
mining ascertains user profiles and the users’ behaviors recorded Xiao, Wang, He, & Wang, 2010; Zhou & Pei, 2008). The strongest
inside the Web log file. Web content mining is the application of possible clique is a given number of actors who have all possible
data mining techniques to unstructured or semi-structured data, ties present among themselves. A maximal complete sub-graph
usually HTML documents. Web content mining attempts to dis- is such a grouping, expanded to include as many actors as possible
cover useful information from Web content. Web structure mining (Johnson & Trick, 1996), as shown in Fig. 1.
analyzes the hyperlink structure of the Web to discover relation- k-Plex is an alternative way of relaxing the strong assumptions
ships between Web pages. Based on the topology of the hyperlinks, of the maximal complete sub-graph. k-Plex allows that actors may
S.-J. Lin et al. / Expert Systems with Applications 39 (2012) 9079–9086 9081

Fig. 3. SNA 2-core subgroup example.

members of the 2-core subgroup. Thus, F becomes a member of


the 2-core subgroup. If we only use k-core to identify collusive ac-
counts, we will falsely identify F as a member of the collusive ac-
Fig. 1. SNA clique subgroup example.
counts and G as a normal account. Using the approach of Wang
et al., high false negative and false positive results would be de-
be members of a clique even if they have ties to all but k other rived because many members of the 2-core subgroup have possible
members (Everett, 1982). In Fig. 2, the subgroup depicted, which links to the normal accounts and members of the 1-core subgroup
includes (a, b, c, d), is a 2-plex group. have possible links to the collusive accounts. In order to solve the
k-Core is a maximal sub-graph in which each node is adjacent above problems, other indicators are needed. Furthermore, Wang
to at least k other points. It is also thought to be an essential com- et al.’s approach cannot fit all real-world cases. In fact, the 2-core
plement to the measurement of density, which may not capture subgroup will often appear in the real auction environment, but
many of the features of the global network (Scott, 2002). The math- not all of the 2-core subgroups are collusive groups.
ematic definition of k-core is:
3. Ranking method
Let G = (V; L) be a graph. V is the set of vertices and L is the set of
lines (edges or arcs). We will denote n = |V| and m = |L|. A sub-
3.1. Auction Fraud Rank algorithm
graph Hk = (W;L|W) induced by the set W is a k-core or a core
of order k iff "v 2 W: deg H ðv Þ=k, and Hk is the maximum sub-
In this section, we will introduce our ranking method, the Auc-
graph with this property
tion Fraud Rank algorithm. We modify the standard Page-Rank
algorithm to match the collusive structure. The Auction Fraud Rank
algorithm is a combination of Web structure and potential risk per-
As shown in Fig. 3, the subgroup, which includes (A, B, C, D, E, F), spectives. It is more suitable for use in identifying collusive sub-
is a 2-core group. groups than is the standard Page-Rank algorithm. The Auction
In previous research (Wang & Chiu, 2005, 2008), the k-core indi- Fraud Rank is as follows:
cator was used to reveal the cohesiveness of an account’s trading ð1  dÞ X 1
relationship as well as the reputation structures derived from AFRðuÞ ¼ þd PRðv Þ=Nv þ
N v 2BðuÞ Date Differencei
transaction histories, since frequently engaged accounts will form
subgroups that can be captured by the k-core indicators. Through 1
 þ Corei ð4Þ
use of the k-core indicator, malicious traders can be distinguished Crediti
from the normal accounts. However, if we only use k-core to iden-
 Date_Differencei represents days between the first transaction
tify cohesive accounts, it will have high false negative and false po-
date of accounti and the average first transaction date of the
sitive results. As shown in Fig. 3, the subgroup, which includes (A,
group. If Date_Differencei is 0, Date_Differencei will be 1.
B, C, D, E, F), is a 2-core subgroup. We assume that (A, B, C, D, E, G)
 Crediti represents the accounti’s rating scores. If Crediti is 0,
are collusive accounts and F is a normal account. G was only used
Crediti should equal to be 1.
to manipulate reputation one time, but it is not a member of the 2-
 Corei represents the accounti’s core value. If the account is a
core subgroup. F is a normal account, but F has two linkages to
member of a 1-core subgroup, corei should equal to be 0. If
the account is a member of a 2-core subgroup, corei should
equal to be 0.

If the Auction Fraud Rank value is close to 10, it means that the
relevance between the account and the subgroup is very high;
otherwise, if the value is equal to 0, the relevance is very low, so
that the account might be ranked behind.
The mathematic definition of Date_Differencei is:
X
k
Date Differencei ¼ Datei  Datej =k ð5Þ
j¼1

 Datei represents the date of the first transaction recorded for a


user’s account (unit: day).
P
 kj¼1 Datej =k represents the average first transaction date of top-
k accounts (unit: day).

In the real auction environment, the first transaction dates of


Fig. 2. SNA 2-plex subgroup example. the collusive group are very close, due to the fact that malicious
9082 S.-J. Lin et al. / Expert Systems with Applications 39 (2012) 9079–9086

auctioneers create multiple accounts and manipulate their transac-


tions in order to create high reputations in a short time. Thus, we
add Date_Differencei and Crediti to the Auction Fraud Rank algo-
rithm. In order to calculate Date_Differencei, we have to find the
core accounts in the group. The Page-Rank algorithm can rank
web pages according to ingoing and outgoing links. Wang and Chiu
(2008) showed that collusive auctioneers have a complicated rela-
tionship and high core in the transaction structure. Thus, we will
use the Page-Rank algorithm to discover each account’s impor-
tance and find the core accounts in this group. If the account’s
ranking position is higher, it is possibly one of the core members.
If this group is a collusive group, our method will identify those
core accounts that very possibly are attributable to the primary
fraudsters. After calculating the ranking position of accounts, we
have to decide how many accounts (top-k) can represent core ac-
counts in different groups. Because the user inputs different ac-
counts, our system will collect related account numbers Fig. 4. 2-Core transaction.

according to the transaction structure of the accounts so entered.


We use the concept of SNA indicators, k-core, to decide k mini-
(B, G, F, E) and (T, S, U, V). Thus, the standard Page-Rank algo-
mum. The minimum k is 3 because a group forms a 2-core sub-
rithm only can find which accounts are core accounts in the
group from at least three accounts. If the group numbers
group but cannot determine the difference among 1-core
increase every 10 accounts, k will add 1 in order to avoid only three
accounts.
accounts representing the first transaction date of the bigger sub-
2. The Auction Fraud Rank algorithm which we adapted from the
group. The account and k value are as shown in Table 1.
standard Page-Rank algorithm can reveal the underlying risk of
In the previous section, we did not determine all 2-core account
the traders. If (B, E, F, G) are collusive members, their Date_Dif-
to be core accounts because the normal accounts may be in the 2-
ferencei and Crediti are less than those of normal accounts.
core subgroup. If normal accounts are in the 2-core subgroup, the
Using the Auction Fraud Rank algorithm, their Date_Differencei
normal accounts’ first transaction dates will be very different. This
and Crediti are small, and the Auction Fraud Rank value is rela-
would influence the average of the first transaction date. On the
tively high. Meanwhile, Crediti can lower the influence of Differ-
other hand, we do not consider 3-core or 4-core accounts as core
encei, to the Auction Fraud Rank value. Because some collusive
accounts because not every collusive group will have a transaction
groups may stretch layout time to avoid being easily discov-
structure above 3-core. In the real auction environment, it is
ered, the first transaction date will not be so close, but the col-
possible that collusive groups are the hub type. It is hard to decide
lusive groups’ reputation will still be small.
k-core value. However, if we use the Page-Rank algorithm to calcu-
3. Corei also plays an important role. If the account is one of the 2-
late Page-Rank values and rank positions, we are not restricted by
core subgroup’s members, this account is more dangerous than
the transaction structure. Even though the execution time of Page-
the 1-core subgroup’s accounts. If this account is not a collusive
Rank is high, it does not require a lot of resources to implement in
account, we can use Date_Differencei and Crediti to lower the
our system.
sensitivity of Corei to influence the Auction Fraud Rank value.
We use the example to explain the Auction Fraud Rank algo-
rithm. As Fig. 4 shows, we suppose that A, B, C, D, E, F, G, J, K, W,
X, Y, Z are collusive accounts and T, S, U, V are normal accounts. 4. Experiment
In the study of Pandit et al. (2007), the accounts in the collusive
group are split into two categories: fraud and accomplice. It means 4.1. Data collection
that in the collusive group, most of the 1-core subgroup, which in-
cludes (B, G, F, E), is the accomplice and most of the 2-core sub- We reviewed the online auction sites to collect information
group, which includes (A, C, D, J, K), is the fraud. In addition, we about users and their transactions for implementation of a web
consider that accomplices and fraudsters can behave like legiti- crawler. The web crawler is used to collect bidding data from
mate users and sometimes interact with other honest users to pre-
vent the group being discovered as fraudulent. Why not directly
use the standard Page-Rank algorithm to rank the position of col-
lusive accounts? And why do we use these three parameters?
The reasons are as follows:

1. If we use the standard Page-Rank algorithm to calculate the


Page-Rank value of each account, the Page-Rank value of each
account is the same among 1-core subgroups which include

Table 1
How to decide Top-k.

Account numbers Top-k value


10 3
20 4
30 5
40 6
... ...
Fig. 5. The black list of Yahoo auction site in Taiwan.
S.-J. Lin et al. / Expert Systems with Applications 39 (2012) 9079–9086 9083

Crawler Agent determines whether the current has 2-core


accounts after searching the second layer. If the current struc-
ture has 2-core accounts, the Auction Crawler Agent will focus
on them while searching the next layer. This step will loop until
the Agent does not find 2-core accounts. Third, the Auction
Crawler Agent will determine whether the 2-core accounts have
listing feedbacks. If our record reveals listing feedbacks for 2-
core accounts, the Auction Crawler Agent will focus on these
accounts while searching the next layers and the system will
go back to the second step. The stop condition of the Agent is
that the transactional structure does not have a 2-core sub-
group and our records do not include listing feedbacks of 2-core
accounts or the system has been searching to the third layers.
 Auction Crawler Agent is developed by Java. Our system offers
an interface to the user, whereby the user can simply input an
account number.
 After the Auction Crawler Agent collects accounts, we can
Fig. 6. The black list of Ruten auction site in Taiwan. reconstruct the network graph of transactions between
accounts.

Yahoo Auction Site (http://www.tw.bid.yahoo.com) and Ruten We know that the 2-core indicator can determine the parts of a
Auction Site (http://www.ruten.com.tw) which are the biggest collusive group (Wang & Chiu, 2005, 2008), but if we only consider
auction sites in Taiwan. As Figs. 5 and 6 shows, both of these two the 2-core indicator, it will bring the high false-negative results. In
auction sites will publish the black list of accounts periodically. the data collection, we’ll make sure that the case is in the collusive
The six black accounts that have collusive fraud behaviors and fraud group. According to the definition of collusive fraud group,
in the black list were chosen by manual verified. The crawler will the malicious auctioneers create multiple accounts and manipulate
start crawling from these six black accounts to find out the their transactions to create reputations.
black-related transaction network members and use to train our
ranking method. Finally, the 259 related members were found in 4.2. Experiment design
the six black-related networks. As Fig. 7 shows, the one of the
black-related transaction networks has the 2-core topology. In the experiment, we apply the ANFIS to evaluate the risk of
Details of the crawler implementation are as follows: each account in the group. Fig. 8 shows the model used in this re-
search for fraud data identification. The ANFIS takes three inputs
 We propose an Auction Crawler Agent algorithm that can (Date_difference, Credit, Auction Fraud Rank value) and has one out-
unearth suspicious network patterns. We use the concept of a put (the account’s risk).
2-core clustering algorithm to design our searching path for To realize this, we developed an ANFIS with five layers. The AN-
the Agent in order to capture the potential collusive group com- FIS approach uses Gaussian functions for fuzzy sets, Takagi and Su-
pletely and immediately. First, the Auction Crawler Agent geno’s fuzzy algorithm, and singleton fuzzy rule for the if-then
crawls the account data and record accounts that leave feed- rules. The initial parameters of the network are the mean and
back in the listing in a breadth-first fashion. Second, the Auction standard deviation of the membership functions (antecedent

Fig. 7. The topology of the black-related transaction network.


9084 S.-J. Lin et al. / Expert Systems with Applications 39 (2012) 9079–9086

Fig. 8. ANFIS model of fuzzy interference.

parameters) and consequent parameters. The output of each rule is Precision ¼ TP=TP þ FP ð8Þ
a constant term. The final output is the weighted average of each
rule’s output, and it represents the risk of the account. The ANFIS Recall ¼ TP=TP þ FN ð9Þ
learning algorithm is used to obtain and adjust the parameters of
the fuzzy terms and consequent output. The rule parameters are
5. Results and discussion
recursively updated until an acceptable error is reached. We apply
the back-propagation and hybrid algorithm separately to compare
5.1. System performance
which algorithm will has the better accuracy. The output error is
defined as follows:
X In this section, we will discuss the performance of our AFR rank-
E¼ ðdj  yj Þ2 ð6Þ ing method in the learning algorithms. Compare the precision and
j recall rates between the back-propagation learning algorithm and
hybrid algorithm, it shows the results in Tables 3 and 4. We found
where dj represents the real value (if the accountj is a collusive
that the recall rate of our system on average is 98.3% and the aver-
fraud account, the value is equal to be 1; otherwise, if the accountj
age F-measure is 92.2%. These results show that our approach is
is a normal account, the value is equal to be 1) and yj is the system
effective for collusive group detection in online auctions.
output value.
In the experimental design, we collected collusive fraud group
5.2. Comparison with other studies
cases in advance. We randomly chose two cases as testing data
and others as training data. We used training data to train rules
For comparison with other studies, we applied Wang and Chiu
and parameters of ANFIS and used testing data to validate the sys-
(2005) approach to our collected data. The results are shown in
tem effectiveness. We repeated this experiment three times. Then
Table 5.
we predicted whether the account is a collusive account when the
As we can see in Table 5, the precision rate is above 70%, but the
output of system is higher than the threshold. We used mean value
recall rate is not stable in some data. The reason is that if we only
(the dangerous mean of the training data) – standard deviation
use 2-core accounts to detect fraudsters, we will miss 1-core ac-
(the dangerous standard deviation of the training data) as the
counts. On the other hand, 1-core accounts are possible as poten-
threshold to contain most of the collusive accounts. Table 2 is a
tial collusive accounts. Thus, the Wang et al. approach will have
confusion matrix that summarizes the number of instances pre-
high false-negative results.
dicted correctly or incorrectly by such a classification model.
In order to improve the precision rate, Wang and Chiu (2008)
We used the harmonic mean of precision P and recall R as the
use the robbery algorithm to find central accounts of the SNA
effectiveness of the total system (Shaw, 1997):
map. The robbery algorithm is as follows:
F  measure ¼ 2=ð1=precision rate þ 1=recall rateÞ ð7Þ
 The center measurement starts with one account’s linking
Because high values of F-measure occur only when both precision P
degrees as initial weights or starts with initial weights 1.
and recall R are high, F-measure is the objective measure. Precision
and Recall are as follows:

Table 2
A confusion matrix for a binary classification problem.

Actual class
Predicted class Collusive account Normal account
Collusive account tp (true positive) fp (false positive)
Normal account fn (false negative) tn (true negative)
S.-J. Lin et al. / Expert Systems with Applications 39 (2012) 9079–9086 9085

Table 3 Performance Comparing


Precision and recall rate from using back-propagation learning algorithm. 120%
Data set Precision rate (%) Recall rate (%) F-measure (%)
100%
Wang et al. 2-core
No. 1 80 100 89
No. 2 94.12 100 96.97 80%

F-measure
Wang et al. 2-core
No. 3 90.91 100 95.24
60% with centers
No. 4 90.91 100 95.24
No. 5 100 100 100 Back-propagation
40% learning algorithm
No. 6 90 90 90
20% Hybrid algorithm

0%
Table 4 Data set no.1 no.2 no.3 no.4 no.5 no.6
Precision and recall rate from using hybrid algorithm.

Data set Precision rate (%) Recall rate (%) F-measure (%) Fig. 9. The performance compared to the related approaches.

No. 1 72.73 100 84.21


No. 2 94.12 100 96.97
No. 3 81.82 90 85.71 Finally, we compared the approach of Wang and Chiu (2008) to
No. 4 90.91 100 95.24 ours. The performance of F-measures is shown as Fig. 9. According
No. 5 100 100 100 to the figure, we can find the performance of our approach is high-
No. 6 83.33 100 90.9 er than that of the approach of Wang et al. Both of the back-prop-
agation and hybrid learning algorithms have good performance in
detecting the fraudster accounts. The back-propagation learning
Table 5 algorithm is recommended to use to build the model. In addition,
Precision rate and recall rate of Wang et al.’s approach. the hybrid learning algorithm has a good performance too.
Data set Precision rate (%) Recall rate (%) F-measure (%)
No. 1 78 78 78 6. Conclusions
No. 2 84.61 75 78.57
No. 3 72.7 100 84.21
No. 4 100 42.11 57.14 In summary, we propose a new ranking method for collusive
No. 5 100 85.71 92.31 group detection in online auctions and applied it to a real-world
No. 6 100 40 57.14 online auction site. The ranking method includes new indicators
that are the Date Difference and the AFR value more than previous
studies (Wang and Chiu (2008)), so this method can provide better
 If an account has higher degree (is stronger) than its neighbors, performance than the previous approaches. In future work, we will
it will steal all degrees from neighbors. All the weights origi- expand the data sets and add more indicators and learning algo-
nally associated with the ‘‘weak’’ nodes in the graph/network rithms to fit the fraudsters’ fast-changing behaviors.
will be reduced to zero.
References
This center algorithm generates two groups of accounts from
our testing data; one group of accounts was ‘‘robbed’’ to have cen- Borgatti, S. (2010). What is social network analysis? Available from: <http://
tered degree weight drop down to zero. The other group has the www.analytictech.com/networks/whatis.htm> Retrieved 07.04.10.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual web search
centered degree weights greater than zero and is identified as
engine⁄ 1. Computer Networks and ISDN Systems, 30(1–7), 107–117.
potentially suspicious accounts using the method of Wang and Clarke, E. M., & Wing, J. M. (1996). Formal methods: State of the art and future
Chiu (2008). On the other hand, we also applied the robbery algo- directions. ACM Computing Surveys (CSUR), 28(4), 643.
rithm to evaluate our test data. The result of Wang et al.’s approach Cooley, R., Mobasher, B., & Srivastava, J. (1997). Web mining: Information and
pattern discovery on the world wide web. In ICTAI, 0558.
with centers is shown as Table 6. Cooley, R., Mobasher, B., & Srivastava, J. (1999). Data preparation for mining world
As we can see in Table 6, the robbery algorithm can help to im- wide web browsing patterns. Knowledge and Information Systems, 1(1), 5–32.
prove precision rate, but the recall rate drops dramatically. The Dong, F., Shatz, S. M., & Xu, H. (2009b). Inference of online auction shills using
Dempster-Shafer theory. In Proceedings of the 6th international conference on
reason is that accounts which are bigger than zero are almost 2– information technology: New generations (ITNG 2009) (pp. 908–914).
3 accounts. Although we can correctly identify fraudulent ac- Dong, F., Shatz, S. M., & Xu, H. (2009a). Combating online in-auction fraud: Clues,
counts, the false positive rate is too high. In data set no. 3, we used techniques and challenges. Computer Science Review, 3(4), 245–258.
Everett, M. G. (1982). Graph theoretic blockings k-plexes and k-cutpoints. Journal of
the robbery algorithm to detect malicious accounts, and the result Mathematical Sociology, 9(75–84), 448.
is that only one account’s value is bigger than zero and this account Everett, M. G., & Borgatti, S. P. (1998). Analyzing clique overlap. Connection, 21(1),
is not a collusive account. Therefore, the precision rate and recall 49–61.
Garton, L., Haythornthwaite, C., & Wellman, B. (1999). Studying on-line social
rate of this data set are both zero. networks. Doing Internet Research: Critical Issues and Methods for Examining the
Net, 1999, 75.
Griggs, B. (2003). Identity theft, other cybercrimes show no sign of slowing in Utah.
Knight Ridder Tribune Business News, 1.
Han, K.-H. (2003). Brother and Sister partners on the net, Scam NT$ 5 Million on Yahoo
Table 6
auction. United Evenings.
Precision rate and recall rate of Wang et al.’s approach with centers.
Hanneman, R. A., & Riddle, M. (2005). Introduction to social network methods.
Data set Precision rate (%) Recall rate (%) F-measure (%) Riverside, CA: University of California.
Jamali, M., & Abolhassani, H. (2006). Different aspects of social network analysis. In
No. 1 100 33.33 50 IEEE/WIC/ACM international conference on web intelligence (WI 2006) (pp. 66–72).
No. 2 100 11.76 21 Johnson, D. S., & Trick, M. A. (1996). Cliques, coloring, and satisfiability: Second
No. 3 0 0 0 dimacs implementation challenge. American Mathematical Society, 26, 11–52.
No. 4 100 18.18 30.77 Liu, K., & Terzi, E. (2008). Towards identity anonymization on graphs. In Proceedings
No. 5 100 14.29 25 of the ACM SIGMOD international conference on management of data (pp. 93–106).
No. 6 100 11.11 20 Mislove, A., Gummadi, K. P., & Druschel, P. (2006). Exploiting social networks for
internet search. In Burning (p. 79).
9086 S.-J. Lin et al. / Expert Systems with Applications 39 (2012) 9079–9086

Pandit, S., Chau, D. H., Wang, S., & Faloutsos, C. (2007). Netprobe: A fast and scalable Wang, J. C., & Chiu, C. (2005). Detecting online auction inflated-reputation behaviors
system for fraud detection in online auction networks. In Proceedings of the 16th using social network analysis. In Annual conference of the North American
international conference on, world wide web (p. 210). association for computational social and organizational science (NAACSOS 2005).
Pujol, J. M., Sangüesa, R., & Delgado, J. (2002). Extracting reputation in multi agent Wang, J. C., & Chiu, C. C. (2008). Recommending trusted online auction sellers using
systems by means of social network topology. In Proceedings of the first social network analysis. Expert Systems with Applications, 34(3), 1666–1679.
international joint conference on autonomous agents and multiagent systems: Part Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications.
1 (pp. 467–474). New York: Cambridge University Press.
Rubin, S., Christodorescu, M., Ganapathy, V., Giffin, J. T., Kruger, L., Wang, H., et al. Wellman, B. (1996). For a social network analysis of computer networks: A
(2005). An auctioning reputation system based on anomaly. In Proceedings of the sociological perspective on collaborative work and virtual community. In
12th ACM conference on computer and communications, security (p. 279). Proceedings of the 1996 ACM SIGCPR/SIGMIS conference on computer personnel
Scott, J. (2002). Social networks: Critical concepts in sociology. Routledge Publication research (p. 11).
Inc. Wilke, J. R., & Wingfield, N. (2003). Leading the news: Fraud crackdown hits web
Shaw, W. (1997). Performance standards and evaluations in IR test collections: auctions – federal and state officials to make arrests, file suits EBay cooperates
Cluster-based retrieval models. Information Processing & Management, 33(1), in probe. Wall Street Journal, A.3.
1–14. Wu, W., Xiao, Y., Wang, W., He, Z., & Wang, Z. (2010). k-Symmetry model for
Srivastava, J., Cooley, R., Deshpande, M., & Tan, P. N. (2000). Web usage mining: identity anonymization in social networks. In Proceedings of the 13th
Discovery and applications of usage patterns from web data. ACM SIGKDD international conference on extending database technology.
Explorations Newsletter, 1(2), 23. Xu, H., & Cheng, Y. T. (2007). Model checking bidding behaviors in internet
Trevathan, J., & Read, W. (2007). Detecting collusive shill bidding. In Fourth concurrent auctions. International Journal of Computer Systems Science &
international conference on information technology, 2007, ITNG’07 (pp. 799–808). Engineering, 22(4), 179–191.
Trevathan, J., & Read, W. (2009). Detecting shill bidding in online English auctions. Zhou, B., & Pei, J. (2008). Preserving privacy in social networks against
Handbook of research on social and organizational liabilities in information neighborhood attacks. In Proceedings of the international conference on data,
security. Information Science Reference (pp. 446–470). PA, USA: Hershey. engineering (ICDE’08) (pp. 506–515).

You might also like