www.elsevier.com/locate/eswa
Abstract
With the prevalence of the Internet and e-commerce, the online exchange market, especially the online auction market, has developed very fast. Online auction activity produces a large amount of transaction data. If utilized properly, these data can be of great benefit to sellers, buyers and website administrators. In particular, predictions of the final price may help sellers optimize the selling price of their items and their auction attributes. At the same time, some of the information asymmetry problems faced by buyers may be solved. Thus, transaction time can be shortened and costs can be saved. In this paper, we collect large amounts of historical exchange data from Eachnet, the most famous online auction website in China, and use machine learning algorithms and traditional statistical methods to forecast the final prices of auction items. We propose an attribute construction method to overcome the problem that the auction bid list changes dynamically. Experiments are performed and the prediction results are discussed to verify the proposed solution.
© 2005 Elsevier Ltd. All rights reserved.
dealing with the time-dependent dynamics of bidding. Through attribute selection and construction, traditional machine learning algorithms and statistical methods can be used to predict the final price of online auction items. We also compare the prediction quality of regression with that of neural networks and show that clustering the data has a measurable effect on the price prediction results.

The rest of this paper is organized as follows. Section 2 summarizes work related to our research. In Section 3, we describe Ebay–Eachnet’s auction mechanism and introduce the collection and preprocessing of exchange data. The construction and selection of attributes are also presented in detail. Section 4 reports our experimental work. It provides the details of the data set, evaluation metrics, procedure and results of the different experiments, as well as a discussion of the results. Section 5 provides some concluding remarks and directions for future research.

2. Related work

In this section, we briefly present some of the research literature related to data analysis of online auctions and to prediction of the final (winning) price.

Considerable work applying traditional techniques to online auction analysis has been done in the economics domain. In the data mining and information systems field, there has also been some research on online auctions. For example, Bapna et al. found that significant heterogeneity exists among the users of electronic markets like Ebay and developed a stable taxonomy of bidding behavior in online auctions (Bapna et al., 2004). Bajari and Hortacsu (2003) explored the determinants of bidder and seller behavior. The winner’s curse is that the winning bidder pays a price higher than the true value of the item. Lucking-Reiley et al. analyzed the effect of various Ebay features on the final price of auctions (Lucking-Reiley, Bryan, Prasad, & Reeves, 2000). They found that a seller’s feedback ratings have an important effect on his auction prices, with negative comments having a much greater effect than positive comments. However, this research mainly focused on the analysis of historical exchange data. It only described past auctions and did not involve price prediction (Shah, Joshi, Sureka & Wurman, 2003). Fewer studies can be found that use machine learning techniques to predict the final price of online auction items.

There has been some work on price prediction of items in online markets, e.g. airline fares (Etzioni, Tuchinda, Knoblock, & Yates, 2003). They explored how to minimize ticket purchase price by mining airfare data. In the AI community, literature on time-dependent product price prediction is scarcer. The trading agent competition (TAC) (Wellman, Reeves, Lochner, & Vorobeychik, 2002; 2004) was the only work that implicitly involved price prediction in auctions. TAC relied on a simulator of airline, hotel, and ticket prices, and the competitors built agents to bid on these. TAC simulated flight prices using a stochastic process that followed a random walk with an increasingly upward bias. In addition, the TAC auction of airline tickets assumed that the supply of airline tickets is unlimited. Several TAC competitors have explored a range of methods for price prediction including historical averaging, neural nets, and boosting. In the TAC, all of the work was performed with artificially generated data and did not use any real auction data.

In this paper, we focus on how to overcome the dynamics of bidding, which changes drastically over time. Price prediction is carried out on the basis of data collection and the construction and selection of attributes.

3. Ebay–Eachnet’s auction mechanism, data collection and preprocessing

3.1. Ebay–Eachnet’s auction mechanism

Ebay–Eachnet is the largest and most successful consumer-to-consumer auction site in China. It had 11.6 million registered users at the end of the first quarter of 2005. The exchange amount exceeded 2500 million Yuan in 2004. Similar to Ebay, Ebay–Eachnet runs a second-price English auction for a variety of consumer goods. The auctions are ‘second price’ because the winner pays the next highest bid, and they are ‘English auctions’ because bids are ascending. The seller can decide to have an auction last for 1, 3, 5, 7 or 10 days.

When a potential bidder views a current auction on Ebay–Eachnet, they see a description of the item for sale. This description is at the seller’s discretion, but is usually fairly detailed and may include a picture of the item. The bidder can also review the recent history of the seller and ratings by previous customers. Most importantly, the bidder observes the current high bid for the item. They also observe information on other bidders for that auction, including when they bid but not what their bid was. Lastly, all bidders know the exact time remaining before the close of the auction.

After observing this information, the bidder may decide to bid on the item. The amount entered by the bidder is actually a ‘proxy’ bid. That is, Ebay–Eachnet will take that bid and automatically bid slightly above the next highest bidder, up to the amount entered. This system allows the bidder to leave the auction without having to worry about being outbid, as the proxy system will keep on bidding automatically. However, if a bidder is outbid, Ebay–Eachnet can notify them by email. The winner of the auction is the highest bidder at the end of the auction. The winner pays an amount slightly above the next highest bid.

In addition, Ebay–Eachnet also runs ‘buy it now’ and ‘English auction + buy it now’ formats. In this paper, only the English auction is studied.

3.2. Data collection

A lot of information on Ebay–Eachnet auctions is publicly available. Ebay–Eachnet posts on its website the complete bid histories of closed auctions for at least one month after the ending date. We used a specially designed data collection program to gather the exchange data from Eachnet. The collection program first constructs a URL dynamically according to the product ID. Then it
544 L. Xuefeng et al. / Expert Systems with Applications 31 (2006) 542–550
Table 1
A valid auction record
Seller ID  ST                  ET                 SC  HI  CS  DC  SW  PW  LB  BT  FP
5844742    2004-1-27 17:00:00  2004-2-3 17:00:00  5   2   1   1   1   2   3   56  ¥32
Table 2
The bid process of a valid auction
sends a query request and searches for the Web page that includes the keyword ‘exchanged’. Finally, the page is parsed to extract valid data. The acquired valid data are stored in a relational database.

The resulting database covers 36366 valid auction records and 154158 valid bid records from Eachnet from January 27, 2004 to May 20, 2004. Each auction record consists of information on the seller and the auction itself. Each bid record consists of information on the buyer and the whole bid process. Tables 1 and 2 show a valid auction and its bid records. Table 3 shows the meaning of each field.

In this research, all Nokia mobile telephone data are extracted from the collected dataset, without differentiating by model or degree of newness, so that the effect of clustering the data on the prediction results can be verified. The extracted dataset covers 578 valid auction records and 3547 valid bid records from January 27, 2004 to May 16, 2004.

3.3. Data preprocessing

Data preprocessing is a very important task for price prediction. It involves the following steps.

1. Removing ‘buy it now’ auction records

There is some ‘buy it now’ auction data in our dataset. Since we are only interested in English auctions in this study, we remove this part of the data and obtain a resulting experimental dataset consisting of 535 English auction records.

2. Dealing with missing values

Since there are no missing values in our dataset, this step is skipped.

3. Data transformation

To improve learning speed and bring the data to a comparable scale, a data transformation is applied. The technique used is the Z-score:

$v' = \frac{v - \bar{A}}{\sigma_A}$,

where $v$ is the original value, and $\bar{A}$ and $\sigma_A$ are the mean and standard deviation of attribute $A$, respectively. For instance, Fig. 1 shows the distribution of the final prices before and after transformation.
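The Z-score transformation described above can be illustrated with a minimal Python sketch. The function name `z_score`, the sample prices, and the use of the population standard deviation are assumptions made for illustration; the paper does not state which estimator or software was used.

```python
from statistics import mean, pstdev


def z_score(values):
    """Return the Z-score transform v' = (v - mean) / stdev of a list of numbers."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant attribute carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical final prices in Yuan, not taken from the paper's dataset.
final_prices = [32.0, 480.0, 950.0, 1500.0, 2600.0]
scaled = z_score(final_prices)
```

After the transformation the attribute has mean 0 and unit standard deviation, so attributes measured on very different scales become comparable. The shape of the distribution is unchanged, which is why a skewed attribute stays skewed.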
Table 3
The meanings of each field in Tables 1 and 2

Fields of Table 1
Field name   Meaning of field
ST           The start time of the auction
ET           The end time of the auction
SC           The credibility of the seller
HI           Whether there is an invoice or not
CS           Whether there is customer service or not
DC           Who pays the shipping charges
SW           The shipping method
PW           The payment method
LB           The level of limited buyer credit
BT           The number of times this web page was browsed
FP           The final price

Fields of Table 2
Field name   Meaning of field
Bidder ID    ID of the bidder
BC           The credibility of the bidder
BA           The bid amount
DE           The demand
AC           The acquirement
Bid time     The bid time
Fig. 1. The distribution of the final prices before (left) and after transformation (right).
3.4. Construction and selection of attributes

3.4.1. Construction of attributes

To make the discussion more formal, consider an auction $k$ on Ebay–Eachnet, $1 \le k \le 535$ in our dataset. Let $A$ be the set of all auction records, $A = \{a_k\}$. Let $B$ be the set of all bid records, $B = \{b_k\}$. Obviously, there is a one-to-many relationship between $a_k$ and $b_k$; see Fig. 2.

Since the number of bid records varies from auction to auction, preprocessing must be carried out for effective learning so that exactly one bid process corresponds to each auction. To use a machine-learning algorithm to predict the final price, the relationship between $a_k$ and $b_k$ must be changed to one-to-one. Let $J$ be the set of attributes of bid records, and let $b_{kj}$, $j \in J$, denote the cell value in the bid records related to auction $k$. Four statistics of some attributes are computed so that the one-to-many relationship is changed to one-to-one; see Table 4. DE and AC are omitted. Finally, 12 attributes are constructed.

Thus, Tables 1 and 2 may be represented by Table 5.

3.4.2. Selection of attributes

Let $D$ be the set of prediction attributes, $S$ the set of seller attributes, $A$ the set of intrinsic attributes of the auction, and $C$ the set of constructed attributes. Then $D = S \cup A \cup C$, where

$S = \{SC\}$, where SC denotes the credibility of the seller;

$A = \{ST, ET, HI, CS, DC, SW, PW, LB, BT\}$, for the meaning of each attribute, see Table 3;

$C = \{MINBC, MAXBC, AVGBC, STDEVBC, MINBA, MAXBA, AVGBA, STDEVBA, MINBT, MAXBT, DT, BIDCOUNT\}$, for the meaning of each attribute, see Table 4.

The selection of prediction attributes is one of the important considerations in predicting the final price. In this paper, stepwise regression and discriminant analysis are used to select the independent variables. Through this analysis, the following attributes are omitted: $\{ET, SW, PW, LB, MINBC, MINBT\}$. To our surprise, both analysis methods also remove the attribute SC. However, according to the research of Lucking-Reiley et al. (2000), a seller’s credibility has an important effect on his auction prices, so SC is retained. Let $D'$ denote the final set of independent variables. Then

$D' = D - \{ET, SW, PW, LB, MINBC, MINBT\}$.

We define the dataset corresponding to $D'$ as NOKIA.

According to the foregoing statements, we have a formulization of prediction of the final price:
Table 4
The computing of statistic and construction of attributes
Table 5
A changed auction
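The attribute construction of Section 3.4.1, which collapses a variable-length bid list into one fixed-length row per auction, can be sketched in Python as follows. This is an illustrative sketch only: the function name `construct_attributes`, the tuple layout of a bid record, and the use of the population standard deviation are assumptions. The bid-time attributes and DT are left out because their definitions are given in Table 4, which is not reproduced here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def construct_attributes(bids):
    """Collapse a variable-length bid list into one fixed-length row per auction.

    `bids` is an iterable of (auction_id, bidder_credibility, bid_amount)
    tuples; this record layout is hypothetical.
    """
    grouped = defaultdict(list)
    for auction_id, bc, ba in bids:
        grouped[auction_id].append((bc, ba))

    rows = {}
    for auction_id, records in grouped.items():
        bcs = [bc for bc, _ in records]
        bas = [ba for _, ba in records]
        rows[auction_id] = {
            # Four statistics of the bidder credibility (BC).
            "MINBC": min(bcs), "MAXBC": max(bcs),
            "AVGBC": mean(bcs), "STDEVBC": pstdev(bcs),
            # Four statistics of the bid amount (BA).
            "MINBA": min(bas), "MAXBA": max(bas),
            "AVGBA": mean(bas), "STDEVBA": pstdev(bas),
            # Number of bids placed in the auction.
            "BIDCOUNT": len(records),
        }
    return rows


# Hypothetical bid records: auction 1 has three bids, auction 2 has one.
bids = [(1, 3, 10.0), (1, 7, 20.0), (1, 5, 30.0), (2, 4, 15.0)]
rows = construct_attributes(bids)
```

Each auction now maps to exactly one row regardless of how many bids it received, which is the one-to-one relationship that a standard learning algorithm needs.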
4. The final price prediction

Given the attributes described in Section 3, the task now is to predict the final price of an auction item. This problem can be tackled with machine learning algorithms in several ways. We define the problem in two ways to compare the relative merits of each approach.

1. Continuous price prediction

We treat the price prediction task as a regression task and use the training data to learn regression coefficients. We use multivariate regression (MR) to predict a continuous price value.

MR is a widely used analysis technique owing to its ease of use and intuitive theoretical basis (Mosteller & Tukey, 1977; Flury & Riedwyl, 1988). MR involves fitting a parametric function model to a set of data. In this sense, it is a form of inductive supervised learning.

2. Discrete price prediction

Discrete price prediction is a multi-class classification task. We discretize the final price into nine intervals, as follows:

(0, 500], (500, 1000], (1000, 1500], (1500, 2000], (2000, 2500], (2500, 3000], (3000, 3500], (3500, 4000], (4000, 4500], denoted by 1–9, respectively.

Thus, the price prediction can be regarded as a multi-class classification problem. In this case, the output is a 500 range (?)

In addition, we also show that clustering the data has an important effect on price prediction. Our experimental dataset is NOKIA, described in the previous section. It consists of 535 valid auction records.

4.1. Evaluation metrics

4.1.1. Continuous price prediction

The basic metric we use to measure performance is the residual value $o_i$, defined by $o_i = y_i - \hat{y}_i$, where $y_i$ is the actual value of the dependent feature for case $i$ and $\hat{y}_i$ is the estimate generated by the regression model. This is similar to the residual value calculated in classical parametric regression, with the important difference that the examples used are out-of-sample data and not the examples used to build the regression model. The average residual value over the test data set is defined by

$\bar{o} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)$.

The standard deviation of the residual values provides a notion of the distribution of the residuals about the mean. As we are interested only in the relative performance of the models, the standard deviations of the residual values for a set of competing models are normalized such that the maximum standard deviation is one. The standard deviation of the residual values is denoted by

$\sqrt{\frac{1}{\cdots}\sum^{n}\cdots}$
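The residual-based metrics of Section 4.1.1 can be sketched in Python as follows. This is a hedged illustration: the function names, the two hypothetical models, and their predictions are invented for the example, and the population form of the standard deviation is an assumption, since the exact normalizing term of the original expression is not recoverable here.

```python
from statistics import mean, pstdev


def residuals(actual, predicted):
    """Out-of-sample residuals o_i = y_i - y_hat_i."""
    return [y - y_hat for y, y_hat in zip(actual, predicted)]


def normalized_residual_stdevs(stdevs):
    """Scale the per-model residual standard deviations so the largest is one."""
    largest = max(stdevs.values())
    return {name: s / largest for name, s in stdevs.items()}


# Hypothetical test-set targets and the predictions of two hypothetical models.
actual = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.5, 2.5, 2.5, 3.5],
    "model_b": [1.0, 2.0, 3.0, 6.0],
}

mean_residual = {m: mean(residuals(actual, p)) for m, p in predictions.items()}
# Population standard deviation of the residuals (an assumption).
stdev_residual = {m: pstdev(residuals(actual, p)) for m, p in predictions.items()}
normalized = normalized_residual_stdevs(stdev_residual)
```

The normalization only matters when several models are compared on the same test set: the model with the widest residual spread gets the value one and the others are expressed relative to it.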
Fig. 3. Histogram (left) and normal P–P plot of regression standardized residual (right).
of dependent variable through the regression relationship. In practice, adjusted R-square is used more often.

4.1.2. Discrete price prediction

Since discrete price prediction is a classification problem, we use the accuracy metric to judge the performance of our experiments. For a set of test examples, accuracy is defined as the number of correct predictions over the total number of examples:

$\text{accuracy} = \frac{\text{the number of correct predictions}}{\text{the total number of examples}} \times 100\%$

4.2. Experimental results

In all experiments, the dataset NOKIA was randomly split by the holdout method into a training set comprising 70% of the examples and a test set comprising 30% of the examples. Repeating this 10 times produced 10 training sets and corresponding test sets. Finally, 10 simulation runs were performed.

4.2.1. Multivariate regression

The experimental results are shown in Figs. 3 and 4. In Fig. 3, the left and right sides show the histogram and the normal P–P plot of the regression-standardized residual, respectively, after training. As we can see from Fig. 3, the distribution of the regression-standardized residual is normal on the whole. In Fig. 4, the X-coordinate denotes the learning runs using the 10 training sets and corresponding test sets, and the Y-coordinate denotes the adjusted R-square (left) and the standard deviation of the residual (right). The average value of R-square is 0.7944, and the average value of the standard deviation of the residual is 0.3644. These results show that the model fit is not bad, but the residual is rather large. Therefore, the result of continuous price prediction is relatively poor. In fact, as we can see from Fig. 1, the distribution of the final price is skewed, even after transformation. This skew might be the cause of the result.

4.2.2. Logistic regression and neural network

In the following experiments, logistic regression and the BP network method are used to classify the final price interval.

1. Logistic regression

Logistic regression is part of a category of statistical models called generalized linear models. It allows one to predict a discrete outcome, such as group membership, from a set of variables that may be continuous, discrete, dichotomous, or a mix of any of these.
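The evaluation setup for discrete price prediction (the nine 500-wide price intervals, the repeated 70/30 holdout split, and the accuracy metric) can be sketched in Python as follows. The function names, the seeds, and the placeholder example list are assumptions for illustration; the paper does not describe its tooling.

```python
import math
import random


def price_interval(price, width=500, n_intervals=9):
    """Map a price in (0, 4500] to its label 1..9; intervals are (lo, hi]."""
    if not 0 < price <= width * n_intervals:
        raise ValueError("price outside the discretized range")
    return math.ceil(price / width)


def holdout_split(examples, train_fraction=0.7, seed=0):
    """Randomly split examples into a training set and a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(true_labels, predicted_labels):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Ten independent holdout splits over 535 placeholder examples.
examples = list(range(535))
splits = [holdout_split(examples, seed=s) for s in range(10)]
```

Because the intervals are closed on the right, a price of exactly 500 falls in interval 1 and a price just above 500 falls in interval 2, matching the interval definition in Section 4.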
Fig. 6. Accuracy with BP and LR.

2. BP neural network

The back-propagation (BP) neural network, a feed-forward multi-layer network based on the back-propagation algorithm developed by D. E. Rumelhart and J. L. McClelland in 1986, has become one of the most widely used ANNs in practice. The activation transfer function (ATF) of a BP network, usually, is

In this section, clustering analysis is first performed on the data set NOKIA by the k-means method. The number of clusters is set to 2. The first cluster consists of 325 examples, and the second cluster consists of 210 examples. The distribution of the final prices of each cluster is shown in Fig. 7. The result shows a clear clustering effect overall. Secondly, discrete price prediction is carried out for each cluster. The result is illustrated in Fig. 8.

According to Fig. 8, the accuracy of the BP network increases markedly: the average accuracy for cluster 1 and cluster 2 is 94% and 95%, respectively. In contrast, the accuracy of logistic regression decreases: its average accuracy for cluster 1 and cluster 2 is only 68.4% and 63.4%, respectively.

Fig. 7. The distributions of the final prices of cluster 1 (left) and cluster 2 (right).

4.4. Discussion

From the experimental results, we make some important observations.

Continuous price prediction performs poorly by contrast with discrete price prediction. This result agrees with the established conclusion that multivariate regression is unfit for analyzing skewed data. Further, we think that discrete price prediction is more significant than continuous price prediction in practice.

In discrete price prediction, the BP network gives promising results. Of course, the lack of examples might be the reason the results of logistic regression are not good enough. Clustering of the data has an important effect on the prediction results. However, the effect differs significantly between the BP network and logistic regression: the accuracy of the former increases, while that of the latter decreases.

In fact, the attributes we selected are not all the attributes we could obtain. For example, attributes such as whether the auction item has a picture, whether the seller has auction experience, and whether the seller has online shopping experience were not included. If more attributes are used for prediction, the prediction accuracy might be enhanced further.

Since only one kind of data is used for prediction, whether the prediction results can be generalized to other auction items is not established in this paper.

By attribute construction, this paper overcomes the problem that the auction bid list changes dynamically. The preprocessing enables machine learning algorithms and traditional statistical analysis methods to be applied to final price prediction. Favorable results are obtained. According to the experimental results, we believe that machine-learning algorithms, such as the BP network, outperform traditional statistical models.

In this paper, to avoid complexity, we did not select some text and picture attributes, which may affect the prediction accuracy.

Our future study will proceed in two directions. One is to apply final price prediction to recommender systems based on online auctions. The other is to extend the prediction model to different product types.

Recommender systems (Resnick & Varian, 1997; Schafer, Konstan, & Riedl, 1999) have been widely used in the B2C e-commerce context, for example on the Amazon website. However, their applications in the C2C environment are fewer. Predicting the final price of auction items enables the online auction website to suggest a final price for the auction item to the seller. This will help the seller optimize some auction attributes. The suggested price may serve as a bid reference for buyers. The exchange time might then be shortened and the final price maximized. At the same time, the website could also obtain maximum profit. As part of this project, we will place emphasis on the following four concrete aims in the future.

1. Establish interest models for online auction website users.
2. Study the importance of each attribute’s influence on the final price.
3. Study how to enable recommender systems to be applied effectively to online auctions.
4. Establish intelligent recommender systems based on online auctions.

Funding for this research was provided by the National Science Foundation of China under grant No. 70371004 and the PhD Program Foundation of the Education Ministry of China under contract number 20040006023. We would like to thank the anonymous reviewers for their valuable comments.
References

Bajari, P., & Hortacsu, A. (2003). Winner’s curse, reserve prices, and endogenous entry: Empirical insights from Ebay auctions. The Rand Journal of Economics, 34(2), 329–355.
Bapna, R., Goes, P., Gupta, A., & Jin, Y. (2004). User heterogeneity and its impact on electronic auction market design: An empirical explanation. MIS Quarterly, 28(1).
Etzioni, O., Tuchinda, R., Knoblock, C. A., & Yates, A. (2003). To buy or not to buy: Mining airfare data to minimize ticket purchase price. KDD 2003 (pp. 119–128).
Flury, B., & Riedwyl, H. (1988). Multivariate statistics: A practical approach. New York: Chapman & Hall.
Lucking-Reiley, D., Bryan, D., Prasad, N., & Reeves, D. (2000). Pennies from Ebay: The determinants of price in online auctions. Technical report, University of Arizona. http://www/vanderbilt/edi/econ/reiley/papers/PenniesFromEBaiy.pdf.
Metha, K., & Lee, B. (1999). An empirical evidence of winner’s curse in electronic auction. Proceedings of the 20th international conference on information systems, Charlotte, NC.
Mosteller, F., & Tukey, J. W. (1977). Data analysis and regression. Menlo Park, CA: Addison-Wesley.
Resnick, P., & Varian, H. R. (1997). Recommender systems. Communications of the ACM, 40(3), 56–58.
Schafer, J. B., Konstan, J. A., & Riedl, J. (1999). Recommender systems in E-Commerce. Proceedings of the first ACM conference on electronic commerce, Denver, CO (pp. 158–166).
Shah, H. S., Joshi, N. R., Sureka, A., & Wurman, P. R. (2003). Mining for bidding strategies on ebay. In Lecture notes in artificial intelligence. Springer.
Wellman, M. P., Reeves, D. M., Lochner, K. M., & Vorobeychik, Y. (2002). Price prediction in a trading agent competition. Technical report, University of Michigan.
Wellman, M. P., Reeves, D. M., Lochner, K. M., & Vorobeychik, Y. (2004). Price prediction in a trading agent competition. Journal of Artificial Intelligence Research, 21, 19–36.