Applicability of Data Mining Techniques for Climate Prediction – A Survey Approach



Published by: ijcsis on Jun 30, 2010
Copyright:Attribution Non-commercial


(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 1, April 2010

Dr. S. Santhosh Baboo, Reader, PG and Research Department of Computer Science, Dwaraka Doss Goverdhan Doss Vaishnav College, Chennai, santhos2001@sify.com

Abstract: British mathematician Lewis Fry Richardson first proposed numerical weather prediction in 1922, and he attempted many kinds of low-complexity numerical forecasts before World War II. The first successful numerical prediction was performed in 1950 by a team composed of American meteorologists Jule Charney, Philip Thompson, Larry Gates, and Ragnar using the ENIAC digital computer. Climate prediction remains a challenging task for researchers and has drawn considerable research interest in recent years. Many government and private agencies are working to predict the climate. In recent years, more intelligent weather forecasting based on Artificial Neural Networks (ANNs) has been developed. Two major knowledge discovery areas are (a) data analysis and mining, which extracts patterns from massive volumes of climate-related observations and model outputs, and (b) data-guided modeling and simulation (e.g., models of water and energy or other assessments of impacts), which takes downscaled outputs as inputs. In this paper, we survey some of the most widely used data mining techniques and methodologies for climate prediction. At the end of the survey, we provide recommendations for future research directions.

Keywords: Weather Forecasting, Climate Prediction, Temperature Control, Neural Network, Fuzzy Techniques, Knowledge Discovery, Machine Learning, Data Mining.

I. INTRODUCTION

Data mining is the process of extracting important and useful information from large data sets [1]. In this survey, we focus on the application of data mining techniques to weather prediction, which is nowadays an emerging research field.
This work provides a brief overview of data mining techniques applied to weather prediction. Data mining techniques provide a level of confidence in the predicted solutions, in terms of both the consistency of prediction and the frequency of correct predictions. Data mining techniques include statistics, machine learning, decision trees, hidden Markov models, artificial neural networks, and genetic algorithms. Broadly, data mining tasks can be classified into frequent-pattern mining, classification, clustering, and constraint-based mining [2]. Classification techniques are designed to classify unknown samples using information provided by a set of already classified samples. This set is usually referred to as a training set, because it is used to train the classification technique to perform its classification. Neural networks and Support Vector Machines learn from a training set how to classify unknown samples.
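To make the idea of a training set concrete, the sketch below classifies an unknown sample by majority vote among its nearest labelled neighbours. The nearest-neighbour rule is chosen here only for brevity, and the feature values, labels, and the function name `knn_classify` are illustrative assumptions, not data or code from any of the surveyed papers.

```python
from collections import Counter
import math


def knn_classify(training_set, sample, k=3):
    """Classify `sample` by majority vote among its k nearest training samples."""
    # Sort the labelled samples by Euclidean distance to the unknown sample.
    by_distance = sorted(training_set, key=lambda item: math.dist(item[0], sample))
    # Count the labels of the k closest samples and return the most frequent one.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (temperature in deg C, relative humidity in %) -> label
training_set = [
    ((30.0, 40.0), "dry"),
    ((32.0, 35.0), "dry"),
    ((31.0, 45.0), "dry"),
    ((24.0, 90.0), "rain"),
    ((22.0, 85.0), "rain"),
    ((23.0, 95.0), "rain"),
]

print(knn_classify(training_set, (29.0, 42.0)))
print(knn_classify(training_set, (23.5, 88.0)))
```

Note that this classifier consults the whole training set at every call instead of building a model first; a neural network or SVM would instead fit its parameters once and then classify new samples from the fitted model.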

I. Kadar Shereef, Head, Department of Computer Applications, Sree Saraswathi Thyagaraja College, Pollachi, kadarshereef@gmail.com

Unknown samples are those whose classification is not known in advance. The K-nearest neighbor classification technique does not have a learning phase, because it uses the training set every time a classification must be performed. For this reason, K-nearest neighbor is referred to as a lazy classifier.

A major generic challenge in climate data mining results from the nature of historical observations. In recent years, climate model outputs and remote or in situ sensor observations have grown rapidly. However, for climate and geophysics, historical data may still be noisy and incomplete, with uncertainty and incompleteness typically increasing deeper into the past. Therefore, in climate data mining, the need to develop scalable solutions for massive geographical data coexists with the need to develop solutions for noisy and incomplete data [3].

The remainder of the paper is organized as follows. Section 2 presents related work on climate prediction using data mining techniques. Section 3 briefly outlines directions for future work. Section 4 concludes the paper with a brief discussion.

II. RELATED WORK

Data mining and its applications have been used in many research areas, and the field is flourishing. Different techniques have been applied for mining data over the years. Qiang Yang and Xindong Wu [4] discussed ten important challenging problems in data mining research. The ten most widely used data mining techniques are also discussed in [4].

Ganguly et al. [3] explained the necessity of data mining for climate change and its impacts. Knowledge discovery from temporal, spatial and spatiotemporal data is decisive for climate change science and climate impacts. Climate statistics is an established area. Nevertheless, recent growth in observations and model outputs, combined with the increased availability of geographical data, presents new opportunities for data miners.
Their paper maps climate requirements to solutions available in temporal, spatial and spatiotemporal data mining. The challenges result from long-range, long-memory and possibly nonlinear dependence, nonlinear dynamical behavior, the presence of thresholds, the importance of extreme events or extreme regional stresses caused by global climate change, uncertainty quantification, and the interaction of climate change with the natural and built environments. Their paper makes a case for the development of novel algorithms to
http://sites.google.com/site/ijcsis/ ISSN 1947-5500


address these issues, discusses the recent literature, and proposes new directions. An illustrative case study presented there suggests that even relatively simple data mining approaches can provide new scientific insights with high societal impact.

Shyi-ming Chen and Jeng-ren Hwang [5] proposed a new fuzzy time series model, called the two-factor time-variant fuzzy time series model, to deal with forecasting problems, and developed two algorithms for temperature prediction based on it. They first presented a one-factor time-variant fuzzy time series model and an algorithm, called Algorithm-A, for handling forecasting problems. In the real world, however, an event can be affected by many factors. For example, the temperature can be affected by the wind, the sunshine duration, the cloud density, the atmospheric pressure, and so on; if only one of these factors is used to forecast the temperature, the forecasting results may lack accuracy, and better results can be obtained by considering more factors. In [6-9], the researchers use only one-factor fuzzy time series models to deal with forecasting problems. Chen and Hwang therefore proposed a two-factor time-variant fuzzy time series model and developed two algorithms that use two factors (i.e., the daily average temperature and the daily cloud density) for temperature prediction. They concluded that the forecasting results of Algorithm B* are better than those of Algorithm-A and Algorithm-B.

Acosta and Gerardo [10] presented an artificial neural network (ANN), implemented in a Field Programmable Gate Array (FPGA), for the prediction of climate variables in a bounded environment. The ANN produces a climate forecast for a main (knowledge-based) system devoted to the supervision and control of a greenhouse.
The main problem to solve in weather forecasting is the prediction of time series. The ANN approach seems attractive for this task from several points of view [11], [12]. Various ANN architectures are capable of learning the evolving features of temporal series and of predicting future states of these series from past and present information. The authors obtained a simple, low-cost and flexible ANN architecture using FPGA technology.

Nikhil R. Pal and Srimanta Pal [13] studied the effectiveness of multilayer perceptron networks (MLPs) for predicting the maximum and minimum temperatures based on past observations of various atmospheric parameters. To capture the seasonality of atmospheric data, with a view to improving the prediction accuracy, they proposed a novel neural architecture that combines a Self-Organizing Feature Map (SOFM) and MLPs to realize a hybrid network, named SOFM-MLP, with better performance. They also demonstrated that the use of appropriate features, such as the temperature gradient, can not only reduce the number of features drastically but also improve the prediction accuracy. Based on these observations, they used a Feature Selection MLP (FSMLP) instead of an MLP. Combining FSMLP and SOFM-MLP results in a network system that uses very few inputs but can still produce good predictions.

L. L. Lai [14] described a new methodology for short-term temperature and rainfall forecasting over the east coast of China, based on some necessary data preprocessing techniques and Dynamic Weighted Time-Delay Neural Networks (DWTDNN), in which each neuron in the input layer is scaled by a weighting function that captures the temporal dynamics of the biological task. This network is a simplified version of the focused gamma network and an extension of the TDNN, as it incorporates a priori knowledge available about the task into the network architecture. Based on this architecture, the forecast results are evaluated approximately.

Satyendra Nath Mandal [15] noted that soft computing models are generally composed of fuzzy logic, neural networks, genetic algorithms, etc. Most of the time, these three components are combined in different ways to form models such as the fuzzy-neuro model, the neuro-genetic algorithm model, and the fuzzy-neuro-GA model, and these combinations are widely used in the prediction of time series data. The author tested soft computing models using a neural network based on fuzzy input and a genetic algorithm on the same data and, based on error analysis (calculation of the average error), selected a suitable model for climate prediction.

Aravind Sharma [16] proposed a new technique called the Adaptive Forecasting Model, an approach in which data interpretation is performed with soft computing techniques. It is used to predict the meteorological situation on the basis of measurements made by a purpose-designed weather system. The model helped in forecasting different weather conditions such as rain and thunderstorm, sunshine and dry day, and perhaps cloudy weather; that is, its purpose is to provide a warning system for likely adverse conditions using sensors.
Data recording at 4 samples per second [17] was adequate to capture minute changes in atmospheric pressure and temperature trends. Sampling at one-minute intervals might have been sufficient, since atmospheric conditions do not change very fast. In some places, however, atmospheric conditions can change faster in bad weather; the instrument used for data recording therefore did not miss any such signature, and no abrupt changes were found.

S. Kotsiantis [18] investigated the efficiency of data mining techniques in estimating minimum, maximum and mean temperature values. To this end, the authors conducted a number of experiments with well-known regression algorithms using real temperature data from a city. The performance of the algorithms was evaluated using standard statistical indicators such as the correlation coefficient and the root mean squared error. They found that the regression algorithms could enable experts to predict minimum, maximum and average temperature values with satisfactory accuracy, using the temperatures of previous years as input.

Y. Radhika and M. Shashi [19] proposed an application of Support Vector Machines (SVMs) for weather prediction. Time series data of daily maximum temperature at a location

is analyzed to predict the maximum temperature of the next day at that location, based on the daily maximum temperatures for a span of the previous n days, referred to as the order of the input. The performance of the SVM was compared with that of an MLP for different orders. The results show that the SVM performs better than the MLP trained with the back-propagation algorithm for all orders. It was also observed that parameter selection has a significant effect on the performance of the SVM model.

Yufu Zhang [20] presented a statistical methodology for predicting the actual temperature for a given sensor reading. The authors present two techniques: single-sensor prediction and multi-sensor prediction. The experimental results indicate that their methods can increase the accuracy of sensor temperature estimation by up to 67% compared with ignoring the error in sensor readings. The authors also found that exploiting the correlation between different sensors gives better thermal estimates than ignoring it and estimating each sensor temperature individually. Both the single-sensor and the multi-sensor case are investigated with different strategies for exploiting the correlation information. Optimal and heuristic estimation schemes are proposed for the cases where the underlying sensor noise is Gaussian and non-Gaussian.

Ivan Simeonov [21] explained the algorithmic realization of a system for short-term weather forecasting, which performs acquisition, processing and visualization of information related to temperature, atmospheric pressure, humidity, and wind speed and direction. Known weather forecasting methods include 1) the persistence method, 2) the trends method, 3) the climatology method, 4) the analog method and 5) the numerical weather prediction method [22]. Based on these methods, the author created a new system for short-term weather forecasting.
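Two of the methods listed above are simple enough to sketch directly. The persistence method assumes that tomorrow's value equals the latest observation, and the climatology method forecasts the average of the values recorded for the same day in past years. The sketch below is a minimal illustration under those textbook definitions; the temperature values and function names are hypothetical and are not taken from [21] or [22].

```python
from statistics import mean


def persistence_forecast(observations):
    """Persistence method: the next value equals the most recent observation."""
    return observations[-1]


def climatology_forecast(history_by_year, day_index):
    """Climatology method: average the values recorded on the same day in past years."""
    return mean(year[day_index] for year in history_by_year)


# Hypothetical daily maximum temperatures (deg C) for the same 5-day window in three past years
history_by_year = [
    [30.0, 31.0, 29.0, 28.0, 30.0],
    [32.0, 30.0, 31.0, 29.0, 31.0],
    [31.0, 32.0, 30.0, 30.0, 29.0],
]
# Hypothetical observations for days 0 to 3 of the current year
current_year = [33.0, 32.0, 31.0, 31.5]

# Both calls forecast day 4 of the current year.
print(persistence_forecast(current_year))
print(climatology_forecast(history_by_year, 4))
```

Baselines of this kind are cheap to compute, which makes them a convenient reference point when judging whether a more elaborate model adds any forecasting skill.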
The algorithm for short-term weather forecasting was created on the basis of the common and special features of known weather forecasting methods and of some surface features at ground level.

A system to predict climate change was developed by Zahoor et al. [23]. The impact of seasonal to inter-annual climate prediction on society, business, agriculture and almost all aspects of human life has forced scientists to give proper attention to the matter, and the last few years have shown tremendous achievements in this field. All systems and techniques developed so far use the Sea Surface Temperature (SST) as the main factor, among other seasonal climatic attributes. Statistical and mathematical models are then used for further climate predictions. In their paper, the authors developed a system that uses the historical weather data of a region (rain, wind speed, dew point, temperature, etc.) and applies the data mining algorithm K-Nearest Neighbor (KNN) to classify these historical data into specific time spans. The k nearest time spans (k nearest neighbors) are then used to predict the weather. Their experiments show that the system generates accurate results within a reasonable time for months in advance.

Wang et al. [24] put forth a technique for predicting climate change using a Support Vector Machine (SVM). The climate model is a critical factor for agriculture. However,

the climate variables, which are strongly corrupted by noise or fluctuations, follow a complicated process and cannot be reconstructed by a common method. In their paper, the authors adapted the SVM to predict them. Specifically, they incorporated the initial conditions of the climate variables into the training of the SVM. The numerical results show the effectiveness and efficiency of the approach in predicting variations in the climate from the initial conditions.

Shikoun et al. [25] described an approach for climate change prediction using artificial neural networks. Great progress has been made in the effort to understand and predict El Nino, the uncharacteristic warming of the sea surface temperature (SST) along the equator off the coast of South America, which has a strong impact on the climate around the world. Improved climate predictions will result in considerably enhanced economic opportunities, predominantly for the national agriculture, fishing, forestry and energy sectors, as well as social benefits. Their paper presents monthly prediction of the El Nino phenomenon using artificial neural networks (ANNs). The procedure addresses the preprocessing of input data, the definition of the model architecture and the strategy of the learning process. The principal result of their paper is the identification of the best model architecture for long-term prediction of climate change. An error model was also developed to improve the results.

III. FUTURE DIRECTIONS

Weather plays an important role in many areas, such as agriculture. In the near future, more sophisticated techniques can be tailored to address complex problems in climate prediction and hence provide better results. In this study we found that neural network based algorithms perform well compared with other techniques. To improve the performance of the neural network algorithms, statistical feature selection techniques can be incorporated.
In another direction, fuzzy techniques should be incorporated.

IV. CONCLUSION

This section summarizes the main conclusions and contributions of this work. In our opinion, a lot of work remains to be done in this emerging and interesting research field. In recent years, more intelligent weather forecasting based on Artificial Neural Networks (ANNs) has been developed. This paper surveys the methodologies used over the past decade for climate prediction. In particular, it presents some of the most extensively used data mining techniques for climate prediction. Data mining techniques provide a level of confidence in the predicted solutions, in terms of both the consistency of prediction and the frequency of correct predictions. In this study we found that neural network based algorithms can provide better performance than other techniques. Furthermore, to improve the performance of the neural network algorithms, statistical feature selection techniques can be integrated. In another direction, fuzzy techniques can be incorporated to achieve better predictability.


REFERENCES
[1] J. Abello, P. M. Pardalos, and M. G. C. Resende, “Handbook of Massive Data Sets,” Kluwer Academic Publishers, 2002.
[2] Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques,” Second Edition, University of Illinois at Urbana-Champaign.
[3] Auroop R. Ganguly and Karsten Steinhaeuser, “Data Mining for Climate Change and Impacts,” IEEE International Conference on Data Mining, 2008.
[4] Qiang Yang and Xindong Wu, “International Journal of IT & Decision Making,” World Scientific Publishing Company, vol. 5, no. 4, pp. 597-604, 2006.
[5] Shyi-ming Chen and Jeng-ren Hwang, “Temperature prediction using fuzzy time series,” IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, vol. 30, no. 2, April 2000.
[6] Shyi-ming Chen and Jeng-ren Hwang, “Forecasting enrollments based on fuzzy time series,” Fuzzy Sets Systems, vol. 81, no. 3, pp. 603-609.
[7] Shyi-ming Chen and Jeng-ren Hwang, “Forecasting enrollments based on fuzzy time series – Part I,” Fuzzy Sets System, vol. 54, no. 1, pp. 1-9, 1993.
[8] Shyi-ming Chen and Jeng-ren Hwang, “Forecasting enrollments based on fuzzy time series – Part II,” Fuzzy Sets System, vol. 62, no. 1, pp. 1-8, 1994.
[9] J. Sullivan and W. H. Woodall, “A comparison of fuzzy forecasting and Markov modeling,” Fuzzy Sets Systems, vol. 64, no. 3, pp. 279-293, 1994.
[10] Acosta and Gerardo, “A Firmware Digital Neural Network for Climate Prediction Applications,” Proceedings of IEEE International Symposium on Intelligent Control, Sep 5-7, 2001, Mexico City, Mexico.
[11] Koskela, T. Lehtokangas, J. Saarinen, and K. Kaski, “Time Series Prediction With Multilayer Perceptron, FIR and Elman Neural Networks,” Proceedings of the World Congress on Neural Networks, INNS Press, San Diego, USA, pp. 491-496, 1996.
[12] J. Corchado, C. Fyfe, and B. Lees, “Unsupervised Neural Method for Temperature Forecasting,” Proceedings of the International ICSC Symposium on Engineering of Intelligent Systems EIS’98, vol. 2, Neural Networks, pp. 300-306, 1998.
[13] Nikhil R. Pal, Srimanta Pal, Jyotrimoy Das, and Kausik Majumdar, “SOFM – MLP: A Hybrid Neural Network for Atmospheric Temperature Prediction,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 12, Dec 2003.
[14] L. L. Lai, H. Braun, Q. P. Zhag, Q. Wu, Y. N. Ma, W. C. Sun, and L. Yang, “Intelligent Weather Forecast,” Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, 26-29 August 2004.
[15] Satyendra Nath Mandal, J. Pal Choudhury, S. R. Bhada Chaudhuri, and Dilip De, “Soft Computing Approach in Prediction of a time series data,” Journal of Theoretical and Applied Information Technology (JATIT).
[16] Aravind Sharma and Manish Manorial, “A Weather Forecasting System Using the concept of Soft Computing: A New approach,” IEEE, 2006.
[17] K. Ochiai, H. Suzuki, and Y. Tokunaga, “Snowfall and rainfall forecasting from the images of weather radar with artificial neural networks,” Neural Networks for Signal Processing, Proceedings of the IEEE Signal Processing Society Workshop, pp. 473-481, 1998.
[18] S. Kotsiantis, A. Kostoulas, S. Lykoudis, A. Argiriou, and K. Menagias, “Using Data Mining Techniques for Estimating Minimum, Maximum and Average Daily Temperature Values,” 2007.
[19] Y. Radhika and M. Shashi, “Atmospheric Temperature Prediction Using SVM,” International Journal of Computer Theory and Engineering, vol. 1, no. 1, April 2009.
[20] Yufu Zhang and Ankur Srivastava, “Accurate Temperature Estimation Using Noisy Thermal Sensors,” ACM, 2009.
[21] Ivan Simeonov, Hristo Kilifarev, and Rajcho llarionor, “Algorithmic realization of system for short-term weather forecasting,” ACM, 2007.
[22] “Weather Forecasting Methods,” http://ww2010.atmos.uiuc.edu/(Gh)/guides/mtr/fcst/mth/prst.rxml, 21 July 1997.
[23] Zahoor Jan, Muhammad Abrar, Shariq Bashir, and Anwar M. Mirza, “Seasonal to Inter-annual Climate Prediction Using Data Mining KNN Technique,” Wireless Networks, Information Processing and Systems, vol. 20, pp. 40-51, 2009.
[24] Wang Deji, Xu Bo, Zhang Faquan, Li Jianting, Li Guangcai, and Sun Bingyu, “Climate Prediction by SVM Based on Initial Conditions,” Sixth International Conference on Fuzzy Systems and Knowledge Discovery, vol. 5, pp. 578-581, 2009.
[25] N. Shikoun, H. El-Bolok, M. A. Ismail, and M. A. Ismail, “Climate Change Prediction Using Data Mining,” IJICIS, vol. 5, no. 1, pp. 365-379, 2005.

Lt. Dr. S. Santhosh Baboo, aged forty, has around seventeen years of postgraduate teaching experience in Computer Science, which includes six years of administrative experience. He is a member of the board of studies in several autonomous colleges and designs the curriculum of undergraduate and postgraduate programmes. He is a consultant for starting new courses, setting up computer labs, and recruiting lecturers for many colleges. Equipped with a Masters degree and a Doctorate in Computer Science, he is a visiting faculty member at IT companies. He regularly attends national and international conferences and training programmes, both as a participant and as a resource person, and has been keenly involved in organizing training programmes for students and faculty members. His good rapport with IT companies has been instrumental in on-campus and off-campus interviews and has helped postgraduate students to obtain real-time projects; he has also guided many such live projects. Lt. Dr. Santhosh Baboo has authored a number of research papers in international and national conferences and journals, and he also guides research scholars in Computer Science. Currently he is Senior Lecturer in the Postgraduate and Research Department of Computer Science at Dwaraka Doss Goverdhan Doss Vaishnav College (accredited at ‘A’ grade by NAAC), one of the premier institutions in Chennai.

I. Kadar Shereef completed his undergraduate degree (B.Sc., Mathematics) at NGM College, his postgraduate degree at Jamal Mohamed College, Trichy, and his Master of Philosophy degree at Periyar University (distance education). He is currently pursuing his Ph.D. in Computer Science at Dravidian University, Kuppam, Andhra Pradesh. He also works as a Lecturer in the Department of BCA, Sree Saraswathi Thyagaraja College of Arts and Science, Pollachi. He has more than one year of research experience and more than four years of teaching experience. His research interests include data mining, climate prediction, neural networks and soft computing.
