Professional Documents
Culture Documents
Combining ranking concept and social network analysis to detect collusive groups
in online auctions
Shi-Jen Lin a, Yi-Ying Jheng a, Cheng-Hsien Yu a,b,⇑
a
National Central University, Jhongli City, Taoyuan, Taiwan
b
China University of Technology, Hukou, Hsinchu, Taiwan
a r t i c l e i n f o a b s t r a c t
Keywords: Due to the popularity of online auction markets, auction fraud has become common. Typically, fraudsters
Ranking method will create many accounts and then transact among these accounts to get a higher rating score. This is
Collusive group detection easy to do because of the anonymity and low fees of the online auction platform. A literature review
Online auction revealed that previous studies focused on detection of abnormal rating behaviors but did not provide a
ranking method to evaluate how dangerous the fraudsters are. Therefore, we propose a process which
can provide a method to detect collusive fraud groups in online auctions. First, we implement a Web
crawling agent to collect real auction cases and identify potential collusive fraud groups based on a k-core
clustering algorithm. Second, we define a data cleaning process to remove the unrelated data. Third, we
use the Page-Rank algorithm to discover the critical accounts of the groups. Fourth, we developed a rank-
ing method for auction fraud evaluation. This method is an extension of the standard Page-Rank algo-
rithm and combines the concepts of Web structure and risk evaluation. Finally, we conduct
experiments using the Adaptive Neuro-Fuzzy Inference System (ANFIS) neural network and verify the
performance of our method by applying it to real auction cases. In summary, we find that the proposed
ranking method is effective in identifying potential collusive fraud groups.
2012 Elsevier Ltd. All rights reserved.
1. Introduction nymity and the low online auction fees to create multiple accounts
and increase their rating scores via sham transactions. In this way,
Due to the popularity of the online auction markets, auction they can deceive the buyer with their high rating score. Related
fraud has become common. According to an Internet Fraud Report studies also show that the most severe cases of Internet fraud,
(2003–2009) issued by the Internet Crime Complaint Center (IC3), accounting for millions of dollars of loss, were reported by those
a joint operation between the FBI and the National White Collar who were misled by evaluation feedback (Griggs, 2003; Han,
Crime Center (NW3C), the number of complaints about Internet 2003; Wilke & Wingfield, 2003).
fraud increased from 1400 per month in 2001 to 22,940 per month Despite these facts, previous studies on Internet auction fraud
in 2008. From January 1, 2008 to December 31, 2008, IC3 received a focused on the detection of abnormal rating behaviors of known
total of 275,284 complaints, representing a 33.1% increase over the auction fraud groups (Wang & Chiu, 2008) or focused on the
previous year. The total amount lost increased from $17 million in classification of types of auction fraud (Pandit, Chau, Wang, &
2001 to $264 million in 2008. The total dollar loss linked to online Faloutsos, 2007). Thus, they did not rank the degree of suspicion
fraud in 2008 was $264 million, about $25 million more than in associated with each account of an auctioneer group. Furthermore,
2007. Regarding the variety of fraud types, non-delivery of mer- they did not provide an easy way for users to obtain the relevant
chandise ranked number one (32.9%). Internet auction fraud was information to evaluate collusive groups.
the second most reported offense (25.5%) (IC3, 2009). These figures Therefore, we propose a process which can provide a ranking
indicate that online auction fraud causes significant losses. method to evaluate collusive fraud groups in online auctions. First,
This type of fraud is actually facilitated by the seller reputation we implement a Web crawling agent to collect real auction cases
rankings displayed on the most popular online auction sites, such and identify potential collusive fraud groups based on the k-core
as Yahoo and eBay. Because these rankings are based on a simple clustering algorithm. Second, we define a data cleaning process
feedback mechanism, auctioneers can take advantage of the ano- to remove the unrelated data. Third, we use the Page-Rank algo-
rithm to discover the critical accounts of the groups. Fourth, we
⇑ Corresponding author at: China University of Technology, Hukou, Hsinchu, provide a ranking method for auction fraud evaluation. This meth-
Taiwan. od is extended from the standard Page-Rank algorithm and com-
E-mail addresses: sjlin@mgt.ncu.edu.tw (S.-J. Lin), chyu@cute.edu.tw (C.-H. Yu). bines the concepts of Web structure and risk evaluation. Finally,
0957-4174/$ - see front matter 2012 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2012.02.039
9080 S.-J. Lin et al. / Expert Systems with Applications 39 (2012) 9079–9086
we conduct experiments using the Adaptive Neuro-Fuzzy Infer- Web structure mining categorizes Web pages and generates re-
ence System (ANFIS) neural network and verify the performance lated patterns, such as the similarity and the relationships between
of our method by applying it to real auction cases. In summary, different Web sites.
we find that the proposed ranking method is effective in identify- Recently, many researchers have applied social network analy-
ing a potential collusive fraud group. sis (SNA) combined with the Google’s Page-Rank algorithm, the
Hyperlink-induced Topic Search (HITS) algorithm, or other page-
rank algorithms to generate useful information for users (Jamali
2. Literature review
& Abolhassani, 2006; Mislove, Gummadi, & Druschel, 2006; Pujol,
Sangüesa, & Delgado, 2002). Thus, we also use SNA as the frame
In this section, we survey the related approaches for online auc-
and combine ranking algorithms to evaluate the importance of
tion fraud detection and then review the social network analysis
an account in the transactional group.
which is used to analyze the auctioneers’ interactions. We also re-
The Page-Rank vector of citation ranking weight was introduced
view the related work on the Page-Rank algorithm which will be
by Brin and Page (1998). The idea introduces a notion of page
applied to our Auction Fraud Rank algorithm.
authority, which is independent of the page content. Iterative
graph-based ranking algorithms provide a way of deciding the
2.1. Fraud detection in online auctions importance of a vertex within a graph and of deciding the impor-
tance of a page on the Web. Page-Rank is defined as:
According to the literature, fraud detection approaches can be
classified into three kinds of methods: statistical methods, data ð1 dÞ X
PRðuÞ ¼ þd PRðv Þ=Nv ð1Þ
mining techniques and formal methods (Dong, Shatz, & Xu, N v 2BðuÞ
2009a). With regard to statistical methods, Rubin et al. (2005) pro-
pose a new reputation system for auction sites to help users pro- where u represents a web page; d is a dampening factor that is usu-
tect their interests by warning them of the risk of auction fraud. ally set to 0.85; B(u) is the set of pages that point to u; PR(u) and
The model uses three variables (average number of bids, average PR(v) are rank scores of page u and v, respectively; Nv denotes the
minimum starting bid, and bidders’ profiles) to identify whether number of outgoing links of page v; and c is a factor used for
an account shows activity typical of shill biddings (Rubin et al., normalization.
2005). Other researchers also use statistical methods to detect The Page-Rank algorithm states that a page that is linked to
behaviors of fraudsters (Dong, Shatz, & Xu, 2009b; Trevathan & many pages with high Page-Ranks receives a high rank itself. A
Read, 2007, 2009). With regard to data mining techniques, Pandit popular web page is often referred to by other pages and that an
et al. (2007) propose the NetProbe approach, which models auc- important web page contains a high number of out-links. Further-
tioneers with their transactions as a Markov Random Field to de- more, the structure of collusive transactions conforms to the char-
tect the fraudsters’ suspicious patterns. It also constructs a belief acteristics of the Page-Rank algorithm. An account that has many
propagation mechanism to detect similar fraudsters (Pandit et al., transactions in the collusive group is considered an important ac-
2007). Formal notation and logic are used in mathematically rigor- count; if a page has important links connected to it, its out-linking
ous techniques for the specification, development, and verification pages will also become important pages. Thus, the Page-Rank algo-
of software and hardware designs. Xu and Cheng (2007) and Clarke rithm can be used to discover authoritative and important ac-
and Wing (1996) use a formal approach to detect shilling counts. We will modify the Page-Rank algorithm to build our
behaviors. Auction Fraud Rank algorithm to find the potential risk of accounts
Recently, Wang and Chiu (2008) used social network analysis in the group.
(SNA) indicators based on k-core and centered-weights algorithms
to detect auction fraud groups because of the normal transactions 2.3. Social network analysis
do not have a complicated relationship. However, if collusive auc-
tioneers are to manipulate their reputation, they must have trans- SNA is the study of social relations among a set of actors, and
actions among themselves. Thus, collusive auctioneers’ actors can be individuals, organizations, or communities (Borgatti,
transactions have a complicated relationship and have the high- 2010). It is used widely in the social and behavioral sciences, as
core characteristic in the transactional structure. Taking a struc- well as in political science, economics, organizational science,
tural perspective by focusing on the relationships between traders and industrial engineering (Garton, Haythornthwaite, & Wellman,
rather than their attribute values, Wang and Chiu used k-core and 1999; Wellman, 1996). The network perspective has proved fruit-
centered-weights algorithms to create a collaborative-based rec- ful in a wide range of social and behavioral science disciplines
ommendation system that could suggest risks of collusion associ- (Wasserman & Faust, 1994). The basic components of an SNA study
ated with an account (Wang & Chiu, 2005, 2008). are the actor or node and the connection or link (Borgatti, 2010).
The nodes can be individual, corporate, or collective social units,
2.2. Page-Rank algorithm and nodes are linked to one another by ties (Wasserman & Faust,
1994).
Web mining is the application of data mining techniques to dis- Divisions of actors into subgroups can be a very important as-
cover patterns from the Web. It also could be generalized into pect of social structure that can help explain how the network as
three different types: Web usage mining, Web content mining, a whole is likely to behave. A subgroup can be identified by the
and Web structure mining (Cooley, Mobasher, & Srivastava, 1997, measurements of clique, k-plex, and k-cores (Everett, 1982; Everett
1999; Srivastava, Cooley, Deshpande, & Tan, 2000). Web usage & Borgatti, 1998; Hanneman & Riddle, 2005; Liu & Terzi, 2008; Wu,
mining ascertains user profiles and the users’ behaviors recorded Xiao, Wang, He, & Wang, 2010; Zhou & Pei, 2008). The strongest
inside the Web log file. Web content mining is the application of possible clique is a given number of actors who have all possible
data mining techniques to unstructured or semi-structured data, ties present among themselves. A maximal complete sub-graph
usually HTML documents. Web content mining attempts to dis- is such a grouping, expanded to include as many actors as possible
cover useful information from Web content. Web structure mining (Johnson & Trick, 1996), as shown in Fig. 1.
analyzes the hyperlink structure of the Web to discover relation- k-Plex is an alternative way of relaxing the strong assumptions
ships between Web pages. Based on the topology of the hyperlinks, of the maximal complete sub-graph. k-Plex allows that actors may
S.-J. Lin et al. / Expert Systems with Applications 39 (2012) 9079–9086 9081
If the Auction Fraud Rank value is close to 10, it means that the
relevance between the account and the subgroup is very high;
otherwise, if the value is equal to 0, the relevance is very low, so
that the account might be ranked behind.
The mathematic definition of Date_Differencei is:
X
k
Date Differencei ¼ Datei Datej =k ð5Þ
j¼1
Table 1
How to decide Top-k.
Yahoo Auction Site (http://www.tw.bid.yahoo.com) and Ruten We know that the 2-core indicator can determine the parts of a
Auction Site (http://www.ruten.com.tw) which are the biggest collusive group (Wang & Chiu, 2005, 2008), but if we only consider
auction sites in Taiwan. As Figs. 5 and 6 shows, both of these two the 2-core indicator, it will bring the high false-negative results. In
auction sites will publish the black list of accounts periodically. the data collection, we’ll make sure that the case is in the collusive
The six black accounts that have collusive fraud behaviors and fraud group. According to the definition of collusive fraud group,
in the black list were chosen by manual verified. The crawler will the malicious auctioneers create multiple accounts and manipulate
start crawling from these six black accounts to find out the their transactions to create reputations.
black-related transaction network members and use to train our
ranking method. Finally, the 259 related members were found in 4.2. Experiment design
the six black-related networks. As Fig. 7 shows, the one of the
black-related transaction networks has the 2-core topology. In the experiment, we apply the ANFIS to evaluate the risk of
Details of the crawler implementation are as follows: each account in the group. Fig. 8 shows the model used in this re-
search for fraud data identification. The ANFIS takes three inputs
We propose an Auction Crawler Agent algorithm that can (Date_difference, Credit, Auction Fraud Rank value) and has one out-
unearth suspicious network patterns. We use the concept of a put (the account’s risk).
2-core clustering algorithm to design our searching path for To realize this, we developed an ANFIS with five layers. The AN-
the Agent in order to capture the potential collusive group com- FIS approach uses Gaussian functions for fuzzy sets, Takagi and Su-
pletely and immediately. First, the Auction Crawler Agent geno’s fuzzy algorithm, and singleton fuzzy rule for the if-then
crawls the account data and record accounts that leave feed- rules. The initial parameters of the network are the mean and
back in the listing in a breadth-first fashion. Second, the Auction standard deviation of the membership functions (antecedent
parameters) and consequent parameters. The output of each rule is Precision ¼ TP=TP þ FP ð8Þ
a constant term. The final output is the weighted average of each
rule’s output, and it represents the risk of the account. The ANFIS Recall ¼ TP=TP þ FN ð9Þ
learning algorithm is used to obtain and adjust the parameters of
the fuzzy terms and consequent output. The rule parameters are
5. Results and discussion
recursively updated until an acceptable error is reached. We apply
the back-propagation and hybrid algorithm separately to compare
5.1. System performance
which algorithm will has the better accuracy. The output error is
defined as follows:
X In this section, we will discuss the performance of our AFR rank-
E¼ ðdj yj Þ2 ð6Þ ing method in the learning algorithms. Compare the precision and
j recall rates between the back-propagation learning algorithm and
hybrid algorithm, it shows the results in Tables 3 and 4. We found
where dj represents the real value (if the accountj is a collusive
that the recall rate of our system on average is 98.3% and the aver-
fraud account, the value is equal to be 1; otherwise, if the accountj
age F-measure is 92.2%. These results show that our approach is
is a normal account, the value is equal to be 1) and yj is the system
effective for collusive group detection in online auctions.
output value.
In the experimental design, we collected collusive fraud group
5.2. Comparison with other studies
cases in advance. We randomly chose two cases as testing data
and others as training data. We used training data to train rules
For comparison with other studies, we applied Wang and Chiu
and parameters of ANFIS and used testing data to validate the sys-
(2005) approach to our collected data. The results are shown in
tem effectiveness. We repeated this experiment three times. Then
Table 5.
we predicted whether the account is a collusive account when the
As we can see in Table 5, the precision rate is above 70%, but the
output of system is higher than the threshold. We used mean value
recall rate is not stable in some data. The reason is that if we only
(the dangerous mean of the training data) – standard deviation
use 2-core accounts to detect fraudsters, we will miss 1-core ac-
(the dangerous standard deviation of the training data) as the
counts. On the other hand, 1-core accounts are possible as poten-
threshold to contain most of the collusive accounts. Table 2 is a
tial collusive accounts. Thus, the Wang et al. approach will have
confusion matrix that summarizes the number of instances pre-
high false-negative results.
dicted correctly or incorrectly by such a classification model.
In order to improve the precision rate, Wang and Chiu (2008)
We used the harmonic mean of precision P and recall R as the
use the robbery algorithm to find central accounts of the SNA
effectiveness of the total system (Shaw, 1997):
map. The robbery algorithm is as follows:
F measure ¼ 2=ð1=precision rate þ 1=recall rateÞ ð7Þ
The center measurement starts with one account’s linking
Because high values of F-measure occur only when both precision P
degrees as initial weights or starts with initial weights 1.
and recall R are high, F-measure is the objective measure. Precision
and Recall are as follows:
Table 2
A confusion matrix for a binary classification problem.
Actual class
Predicted class Collusive account Normal account
Collusive account tp (true positive) fp (false positive)
Normal account fn (false negative) tn (true negative)
S.-J. Lin et al. / Expert Systems with Applications 39 (2012) 9079–9086 9085
F-measure
Wang et al. 2-core
No. 3 90.91 100 95.24
60% with centers
No. 4 90.91 100 95.24
No. 5 100 100 100 Back-propagation
40% learning algorithm
No. 6 90 90 90
20% Hybrid algorithm
0%
Table 4 Data set no.1 no.2 no.3 no.4 no.5 no.6
Precision and recall rate from using hybrid algorithm.
Data set Precision rate (%) Recall rate (%) F-measure (%) Fig. 9. The performance compared to the related approaches.
Pandit, S., Chau, D. H., Wang, S., & Faloutsos, C. (2007). Netprobe: A fast and scalable Wang, J. C., & Chiu, C. (2005). Detecting online auction inflated-reputation behaviors
system for fraud detection in online auction networks. In Proceedings of the 16th using social network analysis. In Annual conference of the North American
international conference on, world wide web (p. 210). association for computational social and organizational science (NAACSOS 2005).
Pujol, J. M., Sangüesa, R., & Delgado, J. (2002). Extracting reputation in multi agent Wang, J. C., & Chiu, C. C. (2008). Recommending trusted online auction sellers using
systems by means of social network topology. In Proceedings of the first social network analysis. Expert Systems with Applications, 34(3), 1666–1679.
international joint conference on autonomous agents and multiagent systems: Part Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications.
1 (pp. 467–474). New York: Cambridge University Press.
Rubin, S., Christodorescu, M., Ganapathy, V., Giffin, J. T., Kruger, L., Wang, H., et al. Wellman, B. (1996). For a social network analysis of computer networks: A
(2005). An auctioning reputation system based on anomaly. In Proceedings of the sociological perspective on collaborative work and virtual community. In
12th ACM conference on computer and communications, security (p. 279). Proceedings of the 1996 ACM SIGCPR/SIGMIS conference on computer personnel
Scott, J. (2002). Social networks: Critical concepts in sociology. Routledge Publication research (p. 11).
Inc. Wilke, J. R., & Wingfield, N. (2003). Leading the news: Fraud crackdown hits web
Shaw, W. (1997). Performance standards and evaluations in IR test collections: auctions – federal and state officials to make arrests, file suits EBay cooperates
Cluster-based retrieval models. Information Processing & Management, 33(1), in probe. Wall Street Journal, A.3.
1–14. Wu, W., Xiao, Y., Wang, W., He, Z., & Wang, Z. (2010). k-Symmetry model for
Srivastava, J., Cooley, R., Deshpande, M., & Tan, P. N. (2000). Web usage mining: identity anonymization in social networks. In Proceedings of the 13th
Discovery and applications of usage patterns from web data. ACM SIGKDD international conference on extending database technology.
Explorations Newsletter, 1(2), 23. Xu, H., & Cheng, Y. T. (2007). Model checking bidding behaviors in internet
Trevathan, J., & Read, W. (2007). Detecting collusive shill bidding. In Fourth concurrent auctions. International Journal of Computer Systems Science &
international conference on information technology, 2007, ITNG’07 (pp. 799–808). Engineering, 22(4), 179–191.
Trevathan, J., & Read, W. (2009). Detecting shill bidding in online English auctions. Zhou, B., & Pei, J. (2008). Preserving privacy in social networks against
Handbook of research on social and organizational liabilities in information neighborhood attacks. In Proceedings of the international conference on data,
security. Information Science Reference (pp. 446–470). PA, USA: Hershey. engineering (ICDE’08) (pp. 506–515).