You are on page 1of 8

Available online at www.sciencedirect.

com

Procedia Computer Science 17 (2013) 1055 – 1062

Information Technology and Quantitative Management , ITQM 2013

Support Vector Regression for Newspaper/Magazine Sales


Forecasting

Xiaodan Yua,b , Zhiquan Qib,∗, Yuanmeng Zhaoc


a College of Information Science & Technology, University of Nebraska at Omaha, Omaha, NE 68182, USA.
b Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences, Beijing 100190, China
c Key Laboratory of Terahertz Optoelectronics, Ministry of Education of China, Beijing Key Laboratory for Terahertz Spectroscopy and

Imaging, Department of Physics, Capital Normal University, Beijing 100048, China

Abstract
Advances in information technologies have changed our lives in many ways. There is a trend that people look for news
and stories on the internet. Under this circumstance, it is more urgent for traditional media companies to predict print’s (i.e.
newspapers/magazines) sales than ever. Previous approaches in newspapers/magazines’ sales forecasting are mainly focused
on building regression models based on sample data sets. But such regression models can suffer from the over-fitting problem.
Recent theoretical studies in statistics proposed a novel method, namely support vector regression (SVR), to overcome the
over-fitting problem. In contrast to traditional regression model, the objective of SVR is to achieve the minimum structural risk
rather than the minimum empirical risk. This study, therefore, applied support vector regression to the newspaper/magazines’
sales forecasting problem. The experiment showed that SVR is a superior method in this kind of task.
© 2013 The Authors. Published by Elsevier B.V.
Selection and peer-review under responsibility of the organizers of the 2013 International Conference on
Information Technology and Quantitative Management
Keywords: sales forecasting, data mining, optimization, SVM, SVR, Regression

1. Introduction

Information technologies are keep transforming our ways of doing business or in interacting with others at a
rapid pace. Firms are increasingly relying on the internet to promote products’ sales [1]. Health care providers are
also seeking ways to take full advantage of IT to deliver a better service to patients [2]. There is also a prominent
trend that people exchange opinions and experiences through the internet [3, 4]. Especially the young generation
who grows up with the development of IT is now used to browsing webs for news and interesting stories. Under
this circumstance, it is more urgent for traditional media companies to predict sales over newspapers/magazines
than ever [5, 6] to avoid excessive printing or to meet unexpected demand. But with the competitive and dynamic
business environment, accurate sales forecasting is challenging.
To our knowledge, in existing literature, three themes emerge in addressing this task. Those include the
operations management approach [7, 8, 9, 10, 11], the statistics approach [12], and the data mining approach
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]. Specifically, the operations management approach often assumed
that newspapers/ magazines sales (i.e. customer demands) are random variables that follow certain probability

∗ Corresponding
author
Email addresses: yxd.xiaodanyu@gmail.com (Xiaodan Yu), qizhiquan@foxmail.com (Zhiquan Qi),
zhao.yuanmeng@gmail.com (Yuanmeng Zhao)

1877-0509 © 2013 The Authors. Published by Elsevier B.V.


Selection and peer-review under responsibility of the organizers of the 2013 International Conference on Information Technology and Quantitative
Management
doi:10.1016/j.procs.2013.05.134
1056 Xiaodan Yu et al. / Procedia Computer Science 17 (2013) 1055 – 1062

distribution. But the type of the probability distribution is often determined by personal taste or mathematical
convenience. Establishing regression models based on the finite data sets for predicting sales is the dominant
method within the statistics approach. But over-fitting is a notable problem facing the regression models’ ap-
proach. Recent theoretical advances in statistics, however, has lead to a novel method, support vector regression
(SVR), belonging to the data mining area. SVR is an extension of support vector machine (SVM) and is devel-
oped based on the structural risk minimization (SRM) objective. Prior studies have proved that SVR has a good
generalization ability [25, 26, 27, 28].
Given the advantages of SVR and the characteristics of the domain problem, this study is thus, focused on using
SVR in predicting newspaper/magazines’ sales by considering various store types and shoppers’ demographic
attributes.
The rest of the paper is organized as follows. Section 2 gives an overview of the problem. Section 3 presents
the SVR method. Experimental results are presented in section 4. Finally, we make conclusion.

2. Overview of the Domain Problem

Sales forecasting is the process of organizing and analyzing information in a way that makes it possible to es-
timate what your sales will be [29, 30]. Forecasting newspaper/magazines’ sales is vitally important for publisher-
s/wholesalers in preparing prints plan and distribution plan. Such sales forecasting also has important implication
to site selection of future stores [31].
The importance of sales forecasting in real world has stimulated the research in this area. To our knowledge,
methods for sale forecasting can be classified into three areas of research, namely operations research, statistics,
and data mining. Studies in operations research mostly understood customer demands as a random variable
following particular distributions [32]. The interaction between customer demands and selling prices have also
been recognized [8]. Major methods from the traditional statistics approach include: the naı̈ve forecast, the
moving average forecast, the multiple linear regression model, and time series model. A study by Carbonneau
[9] has given an in-depth discussion over each of these methods. Among these statistical methods, multiple linear
regression model is one of the mostly used methods. A basic assumption of this approach is that the sales of
products can be linearly estimated by a set of explanatory variables describing relevant environment, such as the
store type and shoppers’ demographics. However, this assumption can be constantly violated when there actually
exist nonlinear relationships among various explanatory variables and sales of products. Recent advances in data
mining algorithms, such as SVR, has the advantage of modeling the nonlinear relationships. Moreover, SVR
is known for its capability of minimizing the structural risk. The principle of structural risks minimization is to
achieve a balance between model’s complexity (i.e. model’s over-fitting problem) and models’ predictive accuracy
over training data set.
Given the complexity of predicting sales of newspaper/magazines, therefore, the objective of this study is to
use SVR to forecasting sales of newspaper/magazines. The data set was acquired at Hearst Corporation’s1 2010
data mining challenge program. The data set contains past relevant information (from year 2006 - 2009) on various
magazines/newspapers for each stores. Access to the data set was available during the challenge period and was
now closed. The data set has a total of five tables i.e Sales, Issue, Store/Chain, Wholesaler and Zip+4 data.
Fig.1 shows the overall data mining process of this study.

3. Support Vector Regression

The objective of SVR is to find a function g(x) that matches all input data with an error at the most ε and at
the same time as be flat as possible. ε decides the precision of the SVR model. The case of linear function g(x)
has been described in formulation (1).

1 Hearst
Corporation is one of the largest diversified media and information companies in the United States. Its major businesses include
magazine, newspaper and business publishing, cable networks, television and radio broadcasting, Internet businesses, TV production and
distribution, newspaper features distribution, business information and real estate. In 2010, Hearst Corporation announced its first data mining
challenge project. The objective is to predict the sales of magazine copies at each newsstand location to optimize the overall contribution of
the newsstand location typically referred to as draw.
Xiaodan Yu et al. / Procedia Computer Science 17 (2013) 1055 – 1062 1057

Fig. 1. Overview of the Data Mining.

y = g(x) = (w · x) + b (1)

Let the training samples to be T = (x1 , y1 ), ..., (xl , yl ) ∈ (Rn × Y)l , where the inputs xi ∈ Rn , and response
variables yi ∈ Y = R, i = 1, ..., l. Identifying values for parameters w and b can be done by minimizing the
following objective function

1 l
min ∗ w2 + C (ξi + ξi∗ ), (2)
w,b,ξi ,ξi 2 i=1
s.t. ((w · xi ) + b) − yi ≤ ε + ξi , i = 1, ..., l, (3)
yi − ((w · xi ) + b) ≤ ε + ξi∗ , i = 1, ..., l (4)
ξi ≥ 0, i = 1, ..., l, (5)
ξi∗ ≥ 0, i = 1, ..., l, (6)

where ξi , ξi∗ were introduced as slack variables for over and under-predicting. C is a penalty parameter to adjust
the trade-off between the regression error and the regularization on y = g(x).
Considering the case when linear functions cannot separate the input patterns in the original space, we need to
project the input data into a higher dimensional feature space as is shown in the following formula:

Rn → H,
Φ: (7)
x → x = Φ(x)
(8)

Then we express the linear function y in space H as

y = g(x) = (w · Φ(x)) + b (9)

The objective function in space H is given by the following formulation:


1058 Xiaodan Yu et al. / Procedia Computer Science 17 (2013) 1055 – 1062

1 l
min ∗ w2 + C (ξi + ξi∗ ), (10)
w,b,ξi ,ξi 2 i=1
s.t. ((w · Φ(xi )) + b) − yi ≤ ε + ξi , i = 1, ..., l, (11)
yi − ((w · Φ(xi )) + b) ≤ ε + ξi∗ , i = 1, ..., l (12)
ξi ≥ 0, i = 1, ..., l (13)
ξi∗ ≥ 0, i = 1, ..., l (14)

The dual formation of the problem (10) is

1 ∗  
l l l
min∗ (αi − αi )(α∗j − α j )K(xi , x j ) + ε (α∗i + αi ) − yi (α∗i − αi ), (15)
αi ,αi 2 i, j=1 i=1 i=1


l
s.t. (αi − α∗i ) = 0, (16)
i=1
0 ≤ αi ≤ C, i = 1, ..., l (17)
0 ≤ α∗i ≤ C, i = 1, ..., l, (18)

where K(xi , x j ) = Φ(xi ) · Φ(x j ). K is called the kernal function.


Then the decision function for x in the space, Rn , is of the following form:


l
y = g(x) = (a∗i − ai )K(xi , x) + b. (19)
i=1

4. Experiment

4.1. Data Preprocessing


Given the objective of this study and the primary key definitions of the data, we first extracted the data by
aggregating the individual magazine sales at the store level. We didn’t aggregate all magazine sales at the store
level because we assumed that given the same store, the sales can vary significantly among different titles and
issues of magazines/newspapers. There are over 250 attributes for each record.
Datasets’ dimensions were reduced according to their importance to the sales of newspaper/magazines at each
store. Previous studies have found store type and some demographics has significant influence on sales. Based on
findings from previous literature, we selected the following relevant explanatory variables: store type (Hearst has
given a total of 12 store types based on the locations of the stores. Examples of such store types are bookstore,
convenient store, drug store, and etc.), occupation, education, age, gender, and income. The response variable is
the sales of magazines at each store.
As suggested by previous studies, we performed data transformation by scaling all variables between 0 to 1.
Specifically, the store type was assessed by a nominal scale, demographic variables and sales were transformed
into interval scales.

4.2. Data Exploration


Before proceeding to run SVR model, we first visually examined major characteristics of the data. In order
to present the data in an easy-to-understand form, we clustered the sales into three classes, namely low, media,
and high. Then we explored the major characteristics of each of the three classes over important demographic
attributes.
Fig.2 presented the data exploration results.
Xiaodan Yu et al. / Procedia Computer Science 17 (2013) 1055 – 1062 1059

Fig. 2. Data Exploration.

The above diagrams showed that stores’ sales do vary when some of the shoppers’ demographics characteris-
tics vary. Specifically, stores with local population having higher percentage of technical professionals and sales
service people tends to have higher sales. Local shoppers’ education background also play a role in stores’ sales.
Gender seems do not affect the sellers with relatively high sales. Age distributions among all three classes of sale
do not vary obviously.
However, it should be noted that these visual graphs cannot tell us if there are significant effects of these
explanatory variables on sales.

4.3. SVR model


Our algorithm code is written in MATLAB 2010. The experiment environment: Intel Core I5 CPU, 2 GB
memory. we picked up the first 500 as the training set. Training sets of the smaller size were randomly extracted
from the 200 selected images. 500 samples were used as a fixed validation set, 500 samples were used as a fixed
test set. Finally, we run SVR on the data set and got the results as follows. In the above table, deviation denotes

Table 1. The accuracy of SVR on Hearst datasets.


Sizes of training data Linear RBF
deviation deviation
100 0.1459 0.1368
200 0.1418 0.1310
300 0.1410 0.1209
400 0.1376 0.1147
500 0.1321 0.1092

the average Euclidean Distances between the actual sales and sales predicted by the SVR model on the testing
data set. From Table 2, we find that for all sample sizes, SVR gets better results with the help of the RBF kernel.
To get more interpretable results from SVR, we did a factor analysis within the linear case of SVR as is shown
in Fig. 3.
In the above two graphs, the horizontal axis indicate the 32 explanatory factors. The vertical axis indicate the
relative importance of the 32 attributes compared with each other. The graph on the left hand were results when
sample size was 200. The graph on the right hand showed results when samples size equals to 400. We can see
1060 Xiaodan Yu et al. / Procedia Computer Science 17 (2013) 1055 – 1062

Fig. 3. Relative Importance of Explanatory Variables on Sales.

that results from different samples sizes do vary. Our experiments showed that the importance of each explanatory
factors tend to be stable as sample sizes exceed 400
Table 2 presented the detailed information on attributes.

Table 2. Description of the 32 dimensions in the Hearst datasets.


ID Dimension ID Dimension

1 Store type 17 Percentage of age from 50 to 70


2 Percentage of technical professionals 18 Percentage of age older than 70
3 Percentage of sales service professionals 19 Percentage of male
4 Percentage of farm service 20 Percentage of female
5 Percentage of blue collar 21 Percentage of income lower than 15000
6 Percentage of other occupation 22 Percentage of income from 15000 to 24999
7 Percentage of retired 23 Percentage of income from 25000 to 34999
8 Percentage of occupation unknown 24 Percentage of income from 35000 to 49999
9 Percentage of education high school 25 Percentage of income from 50000 to 74999
10 Percentage of education some college 26 Percentage of income from 75000 to 99999
11 Percentage of education bachelor degree 27 Percentage of income from 100000 to 124999
12 Percentage of education graduate 28 Percentage of income from 125000 to 149999
13 Percentage of education less than high school 29 Percentage of income from 150000 to 174999
14 Percentage of education unknown 30 Percentage of income from 175000 to 199999
15 Percentage of age younger than 20 31 Percentage of income from 200000 to 249999
16 Percentage of age from 20 to 50 32 Percentage of income larger than 250000

The results generally confirmed that such demographic characteristics as gender, age, income, education, and
occupation distributions affect newspaper/magazines’ sales. Specifically, the percentage of people with income
level between 100, 000 to 124, 999 has the highest influence to the sales. The percentage of people working as
blue collar has the second highest importance. Education and age are also important factors in the SVR model.

5. Conclusion

The overall objective of this study is to accurately predict newspaper/magazine sales. This paper presented
how SVR method can be used in sale forecasting. The experiment showed that SVR is a superior method in this
kind of task. Future research could compare the performance of SVR with the performance of other data mining
Xiaodan Yu et al. / Procedia Computer Science 17 (2013) 1055 – 1062 1061

methods on forecasting sales. Further, more research is needed in selecting appropriate explanatory variables with
theoretical bases.

6. Acknowledgement

We would like to thank Jerry Gill who built the integrated data set based on the original Hearst’s five tables.
We would also like to thank Travis Good and Adina Kunwar for their help with the early version of this study.
This work has been partially supported by grants from National Natural Science Foundation of China (NO.70921061,
NO.11271361), the CAS/SAFEA International Partnership Program for Creative Research Teams, Major Interna-
tional ( Ragional) Joint Research Project (NO.71110107026), the President Fund of GUCAS.

References
[1] D. Khazanchi, S. G. Sutton, Assurance services for business-to- business electronic commerce: A framework and implications, J. AIS 1.
doi:http://jais.isworld.org/articles/default.asp?vol=1&art=11.
[2] X. Yu, D. Owens, D. Khazanchi, Building socioemotional environments in metaverses for virtual teams in healthcare: A conceptual
exploration, in: Health Information Science, Springer, 2012, pp. 4–12.
[3] A. Davis, D. Khazanchi, An empirical study of online word of mouth as a predictor for multi-product category e-commerce sales,
Electronic Markets 18 (2) (2008) 130–141. doi:10.1080/10196780802044776.
URL http://dx.doi.org/10.1080/10196780802044776
[4] X. Yu, D. Khazanchi, An exploratory study of the impact of it capabilities adaptation on shared mental models similarity, in: MWAIS
2011, 2011.
[5] S. H. Chang, D. E. Fyffe, Estimation of forecast errors for seasonal-style-goods sales, Management Science 18 (2) (1971) B89–B96.
URL http://EconPapers.repec.org/RePEc:inm:ormnsc:v:18:y:1971:i:2:p:b89-b96
[6] F. G. G. Cardoso, Newspaper demand prediction and replacement model based on fuzzy clustering and rules, Information Sciences 177
(2007) 4799C4809.
[7] E. Silver, D. Pyke, R. Peterson, Inventory Management and Production Planning and Scheduling, John Wiley & Sons, 1998.
URL http://books.google.com.hk/books?id=GI5jQgAACAAJ
[8] Y. Qin, R. Wang, A. J. Vakharia, Y. Chen, M. M. Seref, The newsvendor problem: Review and directions for future research, European
Journal of Operational Research 213 (2) (2011) 361 – 374. doi:10.1016/j.ejor.2010.11.024.
URL http://www.sciencedirect.com/science/article/pii/S0377221710008040
[9] K. L. Donohue, Efficient supply contracts for fashion goods with forecast updating and two production modes, MANAGEMENT SCI-
ENCE 46 (2000) 1397–1411.
[10] Y. Shao, C. Zhang, X. Wang, N. Deng, Improvements on twin support vector machines, Neural Networks, IEEE Transactions on 22 (6)
(2011) 962–968.
[11] Y. Shao, N. Deng, A coordinate descent margin based-twin support vector machine for classification, Neural Networks 25 (2012) 114–
121.
[12] G. E. P. Box, G. Jenkins, Time Series Analysis, Forecasting and Control, Holden-Day, Incorporated, 1990.
[13] S. W. Sigurdur Olafsson, Xiaonan Li, Operations research and data mining, European Journal of Operational Research 187 (2008)
1429C1448.
[14] X. L. Yingjie Tian, Yong Shi, Recent advances on support vector machines research, Technological and Economic Development of
Economy 18 (2012) 5–33.
[15] R. Carbonneau, K. Laframboise, R. Vahidov, Application of machine learning techniques for supply chain demand forecasting, European
Journal of Operational Research 184 (2008) 1140–1154.
URL http://www.sciencedirect.com/science/article/B6VCT-4MXSS6S-2/2/a61aa83358dd7bfceeed7abb4ad9706b
[16] Y. Tian, Y. Shi, X. Liu, Recent advances on support vector machines research, Technological and Economic Development of Economy
18(1) (2012) 5–33.
[17] Z. Qi, Y. Tian, S. Yong, Twin support vector machine with universum data, Neural Networks 36C (2012) 112–119. doi:doi:
10.1016/j.neunet.2012.09.004.
[18] Z. Qi, Y. Tian, S. Yong, Robust twin support vector machine for pattern classification, Pattern Recognition 46(1) (2013) 305–316.
doi:doi:10.1016/j.patcog.2012.06.019.
[19] Z. Qi, Y. Tian, S. Yong, Laplacian twin support vector machine for semi-supervised classification, Neural Networks 35 (2012) 46–53.
doi:doi: http://dx.doi.org/10.1016/ j.neunet. 2012.07.011.
[20] Z. Qi, S. Yong, Structural regular multiple criteria linear programming for classification problem, International Journal of Computers,
Communications & Control 7(4) (2012) 732–742. doi:doi: http://dx.doi.org/10.1016/ j.neunet. 2012.07.011.
[21] Z. Qi, Y. Tian, S. Yong, Rgularized multiple criteria second order cone programming formulations, in: KDD(DMIKM), 2012.
[22] Z. Qi, Y. Tian, S. Yong, Regularized multiple criteria linear programming via linear programming, in: Procedia Computer Science,
Vol. 9, 2012, pp. 1234–1239.
[23] Z. Qi, Y. Tian, S. Yong, Regular multiple criteria linear programming for semi-supervised classification, in: ICDM(OEDM), 2012.
[24] Z. Qi, Y. Tian, S. Yong, Multi-instance classification based on regularized multiple criteria linear programming, Neural Computing &
Applicationsdoi:10.1007/s00521-012-1008-0.
1062 Xiaodan Yu et al. / Procedia Computer Science 17 (2013) 1055 – 1062

[25] N. Deng, Y. Tian, Support vector machines: Theory, Algorithms and Extensions, Science Press, China, 2009.
[26] Z. Qi, Y. Tian, Y. Shi, Robust twin support vector machine for pattern classification, Pattern Recogn. 46 (1) (2013) 305–316.
doi:10.1016/j.patcog.2012.06.019.
URL http://dx.doi.org/10.1016/j.patcog.2012.06.019
[27] V. Vapnik, L. Bottou, Local algorithms for pattern recognition and dependencies estimation, Neural Comput. 5 (6) (1993) 893–909.
doi:10.1162/neco.1993.5.6.893.
URL http://dx.doi.org/10.1162/neco.1993.5.6.893
[28] Z. Qi, Y. Tian, Y. Shi, Structural twin support vector machine for classification, Knowledge-Based Systems 43 (2013) 74–81.
doi:10.1016/j.knosys.2013.01.008.
[29] A. A. Kurawarwala, H. Matsuo, Forecasting and inventory management of short life-cycle products, Operations Research 44 (1).
doi:10.2307/171910.
URL http://dx.doi.org/10.2307/171910
[30] A. Hefny, Forecasting for Management of Seasonal Goods Inventories, Memo, The Institute of National Planning, United Arab Republic,
1964.
URL http://books.google.com.hk/books?id=0Q6mPgAACAAJ
[31] B. Parikh, Applying Data Mining to Demand Forecasting and Product Allocations, 2003.
URL http://books.google.com.hk/books?id=hvECHQAACAAJ
[32] G. A. Godfrey, W. B. Powell, An adaptive, distribution-free algorithm for the newsvendor problem with censored demands, with appli-
cation to inventory and distribution problems, Management Sci 47 (8) (2001) 1101–1112.

You might also like