
**A Novel Nonlinear Combination Model Based on Support Vector Machine for Rainfall Prediction**

Kesheng Lu, Department of Mathematics and Computer Science, Guangxi Normal University for Nationalities, Chongzuo, Guangxi, China. Email: chaokuiwu@163.com

Lingzhi Wang, Department of Mathematics and Computer Science, Liuzhou Teachers College, Liuzhou, Guangxi, China. Email: wlz1974@163.com

Abstract—In this study, a novel modular Support Vector Machine (SVM) ensemble is presented for rainfall prediction. First, a bagging sampling technique is used to generate different training sets. Second, SVMs with different kernel functions and different parameters, i.e., base models, are trained on the different training sets to formulate different regression models. Third, the Partial Least Squares (PLS) technique is used to select the appropriate number of SVR combination members. Finally, a ν-SVM is produced by learning from all base models. The technique is applied to forecasting monthly rainfall in Guangxi, China. Empirical results show that the predictions of the SVM combination model are generally better than those obtained using the other models presented in this study in terms of the same evaluation measurements. Our findings reveal that the nonlinear ensemble model proposed here can be used as an alternative forecasting tool for meteorological applications, achieving greater forecasting accuracy and further improving prediction quality.

Keywords—support vector machine; kernel function; partial least squares; rainfall prediction

I. INTRODUCTION

Rainfall forecasting has been a difficult subject in hydrology due to the complexity of the physical processes involved and the variability of rainfall in space and time [1], [2]. With the development of science and technology, and in particular of intelligent computing over the past few decades, many emerging techniques, such as artificial neural networks (ANN), have been widely used in rainfall forecasting and have obtained good results [3], [4], [5]. ANNs are computerized intelligence systems that simulate the inductive power and behavior of the human brain. They have the ability to generalize and to see through noise and distortion, to abstract essential characteristics in the presence of irrelevant data, and to provide a high degree of robustness and fault tolerance [6], [7]. Many experimental results in the research literature demonstrate that ANN rainfall forecasting models outperform multiple regression, moving average and exponential smoothing. ANN approaches nevertheless lack strict theoretical support, and the effects of their applications are strongly

978-0-7695-4335-2/11 $26.00 © 2011 IEEE DOI 10.1109/CSO.2011.50 1343

dependent upon the operator's experience. In practical applications, ANN often exhibits inconsistent and unpredictable performance on noisy data [8].

Recently, the support vector machine (SVM), a novel machine learning algorithm, was developed by Vapnik and his colleagues [9]. It is a learning machine based on statistical learning theory, and it adheres to the principle of structural risk minimization, seeking to minimize an upper bound of the generalization error rather than the training error (the principle followed by ANN) [10], [11]. When using SVM, two main problems are confronted: how to choose the kernel function, and how to set the best kernel parameters. Proper parameter settings can improve SVM regression accuracy, and different kernel functions and parameter settings can cause significant differences in performance. Unfortunately, there are no analytical methods or strong heuristics that can guide the user in selecting an appropriate kernel function and good parameter values. In order to overcome these drawbacks, a novel technique is introduced. The generic idea consists of three phases. First, an initial data set is transformed into several different training sets. Second, based on the different training sets, SVMs with different kernel functions and different parameter settings are trained to formulate different regression forecasts. Finally, an SVM is produced by learning from all base models. The rainfall data of Guangxi are taken as a case study for the development of the rainfall forecasting model.

The rest of this study is organized as follows. Section 2 describes the triple-phase SVM process in detail. For further illustration, Section 3 employs the method to build a prediction model for rainfall forecasting. Finally, some concluding remarks are drawn in Section 4.

II. THE BUILDING PROCESS OF THE NONLINEAR ENSEMBLE MODEL

Originally, SVM was presented to solve pattern recognition problems. However, with the introduction of Vapnik's $\varepsilon$-insensitive loss function, SVM has been extended to nonlinear regression estimation problems, leading to techniques known as support vector regression

(SVR) [12]. SVR has been emerging as an alternative and powerful technique for solving nonlinear regression problems, and it has achieved great success in both academic and industrial settings due to its many attractive features and promising generalization performance.

A. Support Vector Regression

The SVR model maps data nonlinearly into a higher-dimensional feature space, in which it undertakes linear regression. Suppose we are given training data $(x_i, d_i)_{i=1}^{N}$, where $x_i \in R^n$ is the input vector, $d_i$ is the output value and $N$ is the total number of data points. The modelling aim is to identify a regression function, $y = f(x)$, that accurately predicts the outputs $d_i$ corresponding to a new set of input–output examples. The training data $x_i$ are mapped into a high (even infinite) dimensional feature space $F$ by the mapping function $\varphi: R^n \to F$, and the linear regression function (in the feature space) is described as follows:

$$f(x) = \omega \varphi(x) + b, \qquad \omega \in F, \quad (1)$$

where $\omega$ and $b$ are coefficients and $\varphi(x)$ denotes the high-dimensional feature space, which is nonlinearly mapped from the input space $x$. Rather than minimizing empirical errors, SVR aims to minimize an upper bound of the generalization error. The primal optimization problem is a linearly constrained quadratic programming problem [13], which can be solved by introducing Lagrangian multipliers and applying the Karush–Kuhn–Tucker (KKT) conditions to its dual problem:

$$\max R(\alpha, \alpha^*) = \sum_{i=1}^{N} d_i(\alpha_i - \alpha_i^*) - \varepsilon \sum_{i=1}^{N} (\alpha_i + \alpha_i^*) - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j)$$
$$\text{s.t.} \quad \sum_{i=1}^{N} (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C, \qquad i = 1, \ldots, N, \quad (2)$$

where $\alpha_i$ and $\alpha_i^*$ are the Lagrangian multipliers associated with the constraints, $C$ is the regulator, and $K(x_i, x_j)$ is the kernel function, whose value equals the inner product of the two vectors $x_i$ and $x_j$ in the feature space, i.e., $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$. At present, the popular kernel functions are the linear kernel, the polynomial kernel and the Gaussian kernel.

B. ν-Support Vector Regression

If proper hyperparameters are picked, SVR will gain good generalization performance, and vice versa. Instead of selecting an appropriate $\varepsilon$ a priori, Schölkopf et al. proposed a variant called ν-support vector regression, which introduces a new parameter $\nu$ that controls the number of support vectors and training errors without defining $\varepsilon$ in advance. The ν-SVR problem can be described as follows:

$$\min R(\omega, \xi, \xi^*) = \frac{1}{2} \omega^T \omega + C\Big(\nu\varepsilon + \frac{1}{N} \sum_{i=1}^{N} (\xi_i + \xi_i^*)\Big)$$
$$\text{s.t.} \quad d_i - f(x_i) \le \varepsilon + \xi_i, \qquad f(x_i) - d_i \le \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \ge 0, \quad \varepsilon \ge 0, \quad (3)$$

where $0 \le \nu \le 1$. Schölkopf et al. proved that $\nu$ is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors [15].
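To make the role of $\nu$ concrete, the following sketch fits ν-SVR models with several values of $\nu$ on synthetic data and reports the resulting fraction of support vectors, which should track the lower bound described above. The use of scikit-learn's `NuSVR` and the synthetic data are assumptions for illustration; the paper names no particular implementation.

```python
# Sketch: the nu parameter of nu-SVR acts as a lower bound on the
# fraction of support vectors (and an upper bound on margin errors).
# scikit-learn's NuSVR is an assumed implementation choice.
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, (300, 1))
y = np.sinc(X).ravel() + 0.05 * rng.standard_normal(300)  # noisy nonlinear target

fractions = {}
for nu in (0.1, 0.5, 0.9):
    model = NuSVR(nu=nu, C=1.0, kernel="rbf").fit(X, y)
    # fraction of training points retained as support vectors
    fractions[nu] = len(model.support_) / len(X)
    print(f"nu = {nu}: fraction of support vectors = {fractions[nu]:.2f}")
```

Raising $\nu$ forces more training points to become support vectors, trading a denser model for tighter control of the training errors, which is exactly the knob the ensemble below exploits when base models must differ.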
C. Generating Individual SVR Predictors

Following Breiman's work on the bias–variance trade-off [14], machine learning theory holds that an ensemble regression model consisting of diverse models with much disagreement is more likely to generalize well [15]. Therefore, how to generate diverse models is a crucial factor, and several methods have been investigated for generating ensemble members that make different errors. For the SVR model, diversity basically depends on the kernel function, the parameters and the training data. At present, there are three methods for generating diverse models: (1) using different types of SVR kernel function, such as the linear kernel function and the polynomial kernel function; (2) using different SVR parameters, such as different cluster centers $c$ and different cluster radii $\sigma$ of the SVR; (3) using different training data: by re-sampling and preprocessing the data, different training sets can be obtained, and different SVRs can be produced. Such methods, based on the diversity principle, have been shown to exhibit excellent performance.

D. Selecting Appropriate Ensemble Members

After training, each individual SVR predictor has generated its own result. However, if there are a great number of individual members, we need to select a subset of representatives in order to improve ensemble efficiency. In this paper, the Partial Least Squares (PLS) regression technique [16] is adopted to select appropriate ensemble members. Interested readers are referred to [16] for more details.

E. The Establishment of the Combination Forecasting Model

To summarize, the proposed nonlinear combination forecasting model consists of four main stages. In the first stage, the initial data set is divided into different training sets by using the bagging and boosting technology. In the second stage, these training sets are input to the different individual SVM regression models, and various single SVR predictors are produced. In the third stage, the PLS model is used to select the appropriate number of SVR ensemble members. In the fourth stage, ν-SVM regression is used to aggregate the selected combination members (ν-SVR). In this way the final combination forecasting results can be obtained. The basic flow diagram is shown in Fig. 1.

Figure 1. A flow diagram of the proposed semiparametric ensemble forecasting model: bagging splits the original data set into training sets TR 1 through TR M, each training an individual SVR; PLS selection chooses the ensemble members, which are aggregated by the ν-SVR ensemble.

III. EXPERIMENTAL RESULTS AND DISCUSSION

A. Empirical Data

This study investigates ν-SVM regression modelling of average monthly precipitation from January 1965 to December 2009 in Guangxi, so the data set contains 540 data points in the time series. The first 500 data points were used as training samples for ν-SVM regression learning, and the other 40 were used as testing samples for evaluating the generalization ability of the model. The modelling method is one-step-ahead prediction; that is, only one sample is forecast each time, and one training sample is added each time on the basis of the previous training.

B. Performance Evaluation of the Model

In order to measure the effectiveness of the proposed method, three types of error are used in this paper: the Normalized Mean Squared Error (NMSE), the Mean Absolute Percentage Error (MAPE) and the Pearson Relative Coefficient (PRC), as can be found in many papers [6]. In order to investigate the effect of the proposed model, the simple averaging ensemble, the mean squared error (MSE) based regression ensemble and the variance-based weighted ensemble are established, and their comparison results are used to test the predictive models.

C. Analysis of the Results

Table 1 illustrates the fitting accuracy and efficiency of the models in terms of the various evaluation indices for the 500 training samples. From the table, we can see that the learning ability of the ν-SVM regression ensemble outperforms the other three models under the same network input, and the differences among the different models are very significant.

Table I. A comparison of the fitting results of the four different models over 500 training samples (NMSE, MAPE and PRC).

The more important factor in measuring the performance of a method is to check its forecasting ability on testing samples for actual rainfall applications. The four models were fitted on the 500 training samples and then used to forecast the 40 testing samples. Figure 3 shows the forecasting results of the four different models for the 40 testing samples, and Table 2 shows their forecasting performance from different perspectives in terms of the various evaluation indices.

Figure 2. Testing results in June for 30 testing samples.

Figure 3. Actual monthly rainfall (mm) and the forecasts of the simple averaging, MSE regression, variance-based weight and ν-SVR combination models for the 40 testing months.

As shown in Table 2, the forecasting results of the ν-SVR ensemble model are the best of all the models. The NMSE of the ν-SVM regression model has obvious advantages over the three other models: the NMSE of the simple averaging ensemble model is 0.0955, the NMSE of the MSE ensemble model is 0.1285 and the NMSE of the variance weighted ensemble model is 0.0653, while the NMSE of the ν-SVM regression model reaches 0.0221. From the graphs and tables, we can generally see that the forecasting results are very promising for rainfall forecasting under this research, whether fitting performance or forecasting performance is measured.
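The three evaluation indices can be written out explicitly as follows. The normalization of the NMSE by the variance of the observations is a common convention assumed here, since the paper does not print the formulas, and the sample values are invented for illustration.

```python
# Sketch of the three evaluation indices: NMSE, MAPE and the Pearson
# relative coefficient (PRC), written as plain NumPy functions.
import numpy as np

def nmse(obs, pred):
    # normalized mean squared error: MSE divided by the variance of the
    # observations (assumed normalization; the paper omits the formula)
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean((obs - pred) ** 2) / np.var(obs))

def mape(obs, pred):
    # mean absolute percentage error (observations must be nonzero)
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs((obs - pred) / obs)))

def prc(obs, pred):
    # Pearson correlation coefficient between observations and predictions
    return float(np.corrcoef(obs, pred)[0, 1])

obs = np.array([120.0, 80.0, 150.0, 60.0, 200.0])   # hypothetical rainfall (mm)
pred = np.array([110.0, 95.0, 140.0, 70.0, 190.0])  # hypothetical forecasts
print(nmse(obs, pred), mape(obs, pred), prc(obs, pred))
```

Under this convention an NMSE below 1 means the forecast beats the trivial "predict the mean" baseline, which is why the ν-SVM value of 0.0221 reported above indicates a very tight fit.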

Table II. A comparison of the forecasting results of the four different models over 40 testing samples (NMSE, MAPE and PRC for the simple averaging, MSE ensemble, variance weighted and ν-SVM regression models).

IV. CONCLUSION

Accurate rainfall forecasting is crucial for regions prone to frequent unanticipated flash floods in order to avoid loss of life and economic losses. This paper proposes a novel nonlinear combination forecasting method based on the ν-SVR principle, and the model was applied to forecasting monthly rainfall in Guangxi. The empirical results show that, among the different forecasting models and on the basis of the different criteria, the developed model performs best for monthly rainfall: it increased the rainfall forecasting accuracy more than any other model employed in this study in terms of the same measurements. Our experimental results demonstrate the successful application of the proposed model to a complex forecasting problem, and the ν-SVM regression ensemble forecasting model can therefore be used as an alternative tool for monthly rainfall forecasting to obtain greater forecasting accuracy.

ACKNOWLEDGMENT

The authors would like to express their sincere thanks to the editor and the anonymous reviewers for their comments and suggestions for the improvement of this paper. This work was supported in part by the Guangxi Natural Science Foundation under Grant No. 0832092, and in part by the Department of Guangxi Education under Grant No. 200707MS061.

REFERENCES

[1] Lihua Xiong and K. M. O'Connor, "An empirical method to improve the prediction limits of the GLUE methodology in rainfall–runoff modeling," Journal of Hydrology, Vol. 349, pp. 115–124, 2008.

[2] G. H. Schmitz and J. Cullmann, "PAI–OFF: A new proposal for online flood forecasting in flash flood prone catchments," Journal of Hydrology, Vol. 360, pp. 1–14, 2008.

[3] Jiansheng Wu, "A novel nonparametric regression ensemble for rainfall forecasting using particle swarm optimization technique coupled with artificial neural network," Lecture Notes in Computer Science, Vol. 5553, Springer-Verlag Berlin Heidelberg, pp. 49–58, 2009.

[4] Jiansheng Wu, Liangyong Huang and Xiongming Pan, "A novel Bayesian additive regression trees ensemble model based on linear regression and nonlinear regression for torrential rain forecasting," Proceedings of the Third International Joint Conference on Computational Sciences and Optimization, IEEE Computer Society Press, pp. 484–487, 2010.

[5] W. C. Hong, "Rainfall forecasting by technological machine learning models," Applied Mathematics and Computation, Vol. 200, pp. 41–57, 2008.

[6] Jiansheng Wu and Long Jin, "Study on the meteorological prediction model using the learning algorithm of neural network based on PSO algorithms," Journal of Tropical Meteorology, pp. 83–88, 2009.

[7] R. S. Govindaraju, "Artificial neural networks in hydrology. I: Preliminary concepts," Journal of Hydrologic Engineering, Vol. 5, No. 2, pp. 115–123, 2000.

[8] J. A. Benediktsson, J. R. Sveinsson, O. K. Ersoy and P. H. Swain, "Parallel consensual neural networks," IEEE Transactions on Neural Networks, Vol. 8, No. 1, pp. 54–64, 1997.

[9] V. Vapnik, The Nature of Statistical Learning Theory. New York: Springer Press, 1995.

[10] F. E. H. Tay and L. Cao, "Modified support vector machines in financial time series forecasting," Neurocomputing, Vol. 48(1–4), pp. 847–861, 2002.

[11] V. Vapnik, S. Golowich and A. Smola, "Support vector method for function approximation, regression estimation and signal processing," in M. Mozer, M. Jordan and T. Petsche (eds.), Advances in Neural Information Processing Systems 9, Cambridge, MA: MIT Press, pp. 281–287, 1997.

[12] B. Schölkopf, A. Smola, R. Williamson and P. Bartlett, "New support vector algorithms," Neural Computation, Vol. 12, pp. 1207–1245, 2000.

[13] V. Vapnik, S. Golowich and A. Smola, "Support vector method for function approximation, regression estimation and signal processing," in M. Mozer, M. Jordan and T. Petsche (eds.), Advances in Neural Information Processing Systems 9, Cambridge, MA: MIT Press, pp. 281–287, 1997.

[14] L. Breiman, "Combining predictors," in Proceedings of Combining Artificial Neural Nets: Ensemble and Modular Multi-net Systems, Springer Press, pp. 31–50, 1999.

[15] G. Lin and L. Chen, "Application of an artificial neural network to typhoon rainfall forecasting," Hydrological Processes, Vol. 19, pp. 1825–1837, 2005.

[16] D. Pirouz, "An overview of partial least squares," Technical report, The Paul Merage School of Business, University of California, Irvine, 2006.
