Statistical Hydrology, Fall 2012

Carlos Serrano Moreno

Term Project Proposal
Title: Principal Component Analysis of Precipitation in Spain
Introduction: Flood forecasting is one of the most important challenges in the hydrological sciences today. Issuing alerts with adequate lead time before a flood event mitigates its impact and brings enormous social benefits. Mediterranean areas are especially vulnerable to flash floods because of their steep slopes and the large volume of runoff draining over the impermeable surfaces of the catchments. To provide sufficient lead time for mitigating the effects of these hazardous events, researchers use products such as Numerical Weather Predictions (NWPs) or rainfall observations from weather radars. However, even though these products provide rainfall estimates at fine resolution, the large uncertainty embedded in them means that the estimates have to be pre-processed and corrected before being used as input to hydrologic models.

To correct these estimates, researchers try to use all the information available, which means that it is also important to work with the data registered directly by weather stations. Whereas tools such as NWPs or weather radars provide only rainfall estimates over the whole study area, weather stations provide a direct measurement of the variables at a location. Given the large number of weather stations available in Mediterranean areas, it also becomes important to learn how to deal with large data sets obtained at different positions. Especially when working over a large domain (national or continental scale), it is very important to distinguish the stations that provide relevant data from those that provide redundant information, so that the latter can be rejected.

Another typical situation in which prioritizing the data is important also arises when dealing with weather stations. Normally, when looking for long-term rainfall predictions, one finds agencies or organizations that provide estimates of variables that can be related to rainfall, such as temperature or pressure, but no direct estimate of rainfall. In this case too, studying the data available from the local stations can help to find an accurate relation between rainfall and temperature, pressure or any other variable.


One of the common techniques used to deal with large data sets is Principal Component Analysis (PCA). This technique is a statistical analysis method frequently used in the geophysical sciences to explain correlations in a large set of variables, and it provides a smaller number of independent components. In order to get familiar with the PCA technique, data registered by 19 weather stations in Spain will be used in this project to find the relationship among the registered variables that provides a good estimator of the rainfall. By using PCA, the complexity of the problem will be reduced, because a smaller number of variables will be involved in the estimation of the rainfall.

Available data and objective: For this project, thanks to the Spanish Meteorological Agency (AEMET), monthly data registered at 19 weather stations located in different provinces are available. In most locations the records run from January 1920 until August 2012. Some of the variables registered at the weather stations are: Year, Month, Total Precipitation, Max. Daily Precipitation, Rainy Days, Snowy Days, Hail Days, Av. Temperature, Max. Temp., Min. Temp., Atmospheric Pressure, and Average Insolation. Because of the large number of variables, PCA is needed to identify which of them are closely related to the rainfall and to find a way of predicting the monthly rainfall from a combination of the variables listed here. By using PCA, it will be possible not only to identify which variables carry the most weight in the data set, but also to obtain new variables that are linearly uncorrelated with each other. The ones that explain the highest percentage of the variance will then be chosen to predict the monthly rainfall.

Procedure: The PCA will be performed for each weather station separately as well as for the whole sample. The first goal will be to identify whether the vectors of the PCA basis are the same (or involve the same variables in the same way) for each station. However, because of the large climatic differences between regions of the country, each meteorological variable is expected to play a different role in each climatic area. According to the results, it will then be interesting to find out whether the same PCA basis can be used to understand the problem all over Spain or whether, on the other hand, there are climatic regions within the country that follow different patterns (some variables will be strongly correlated with the rainfall in some areas but not in others). Depending on the results obtained, if a common relationship between the variables is observed at all the stations, it will be used to characterize the precipitation over Spain.

Reference paper: Kahya, E., S. Kalayci, and T. C. Piechota, 2008: Streamflow Regionalization: Case Study of Turkey. Journal of Hydrologic Engineering, Vol. 13, No. 4, pp. 205-214.
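
As an illustration of the per-station PCA proposed in this document, the following is a minimal sketch in Python with NumPy. It assumes that each station's monthly records are arranged as an array with one row per month and one column per variable. The data are synthetic, and the helper names (`pca`, `n_components_for`, `loading_similarity`, `synthetic_station`) and the 90% variance threshold are hypothetical choices made for illustration only; none of them come from the AEMET data or from the reference paper.

```python
import numpy as np


def pca(X):
    """PCA of an (n_months, n_variables) array using the correlation matrix."""
    X = np.asarray(X, dtype=float)
    # Standardize each column: the station variables have different units
    # (mm, degrees, pressure units, days), so raw variances are not comparable.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)           # equals the correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()      # fraction of variance per component
    scores = Z @ eigvecs                     # new, mutually uncorrelated variables
    return eigvecs, explained, scores


def n_components_for(explained, threshold=0.9):
    """Smallest number of leading components whose variance share reaches threshold."""
    k = int(np.searchsorted(np.cumsum(explained), threshold)) + 1
    return min(k, len(explained))


def loading_similarity(v1, v2):
    """Absolute cosine between two unit loading vectors (their sign is arbitrary)."""
    return abs(float(np.dot(v1, v2)))


def synthetic_station(rng, n_months=240):
    """Hypothetical stand-in for one station: two variables driven by a shared
    latent signal plus one unrelated variable. Not real AEMET data."""
    latent = rng.normal(size=n_months)
    return np.column_stack([
        latent + 0.1 * rng.normal(size=n_months),
        -latent + 0.1 * rng.normal(size=n_months),
        rng.normal(size=n_months),
    ])


rng = np.random.default_rng(0)
station_a = synthetic_station(rng)
station_b = synthetic_station(rng)

vecs_a, explained_a, scores_a = pca(station_a)
vecs_b, explained_b, scores_b = pca(station_b)

print("variance explained at station A:", np.round(explained_a, 3))
print("components for 90% of variance:", n_components_for(explained_a))
print("similarity of first loading vectors:",
      round(loading_similarity(vecs_a[:, 0], vecs_b[:, 0]), 3))
```

The variables are standardized before the decomposition because they are measured in different units. Loading vectors are compared through the absolute value of their cosine because the sign of an eigenvector is arbitrary. A similarity close to 1 between two stations would suggest that they share the same leading PCA vector, which is the question the first goal of the procedure asks.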