
Politecnico di Bari

Statistical Methods for Environmental Analyses in a Changing Climate


Case Study on Wuhan City
🦇🍜

Dr. Eng. Vincenzo TOTARO


DICATECh

PhD School
Polytechnic University of Bari
2020/2021

Aras Botan Izzaddin IZZADDIN 35th cycle – DICATECh – Hydrology (ICAR/02) Kurdistan, Iraq
Elhussein Mohamed Fouad Mourad Hussein AHMED 35th cycle – DICATECh – Chemistry (CHIM/07) Cairo, Egypt
Mohamed KHALIL 36th cycle – DICATECh – Civil Eng. (ICAR/08) Syria
Outline

A – Dataset: loading the data, creating variables, and plotting the data sets

B – Descriptive statistics: statistical characteristics, sample statistics, box plot, skewness & kurtosis, histogram & normal curve, and normal Q-Q plot

C – Inferential analysis: autocorrelation, stationary distribution, non-stationary distribution, power of the test, trend detection, and change point detection



Introduction and Methodology

Descriptive Statistics

Inferential Analysis

Conclusion
Introduction and Methodology

• Goal: understanding variability and changes in climate-driven phenomena.

• Decision rule: H0 is rejected if the p-value is smaller than the prespecified significance level α. Otherwise, H0 cannot be rejected.

• Steps of the analysis:
1) identification of the theoretical model;
2) estimation of the model parameters;
3) evaluation of the goodness of fit of the model to the data.
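
The decision rule can be written as a one-line check. The sketch below is Python, not the R code used for this case study; the function name `decide` and the default α = 0.05 are assumptions made for illustration. The p-values are the ones reported in the trend and change point outputs of this deck.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p-value < alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    return "H0 is rejected" if p_value < alpha else "H0 cannot be rejected"


# p-values copied from the test outputs reported in this deck
reported = {
    "Mann-Kendall": 0.63007,
    "Cox-Stuart": 0.306,
    "Pettitt": 0.3535,
}
decisions = {name: decide(p) for name, p in reported.items()}
for name, verdict in decisions.items():
    print(f"{name}: {verdict}")
```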
NOAA station ID: CHM00057494 – REAL data!

Daily precipitation data from 01.01.1950 until 31.12.2021, for 25,680 days!

We chose Wuhan, the most famous city of 2020 🦇🍜!

Annual maxima were obtained with a “for loop” in R.
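
The loop itself is not shown on the slide. As a rough idea of what such a loop does, here is a minimal Python sketch (not the authors' R code): it walks through daily records and keeps the largest value seen in each calendar year. The function name `annual_maxima`, the `(date, value)` input layout, and the use of `None` for missing days are all assumptions.

```python
from datetime import date


def annual_maxima(records):
    """Return {year: largest daily value} from (date, value) pairs.

    Missing observations are expected as None and are skipped.
    """
    maxima = {}
    for day, value in records:  # the "for loop" over the daily series
        if value is None:
            continue
        year = day.year
        if year not in maxima or value > maxima[year]:
            maxima[year] = value
    return dict(sorted(maxima.items()))


# Toy input, NOT the Wuhan record: three days in 2000 and two in 2001
toy = [
    (date(2000, 1, 1), 3.0),
    (date(2000, 6, 15), 41.5),
    (date(2000, 12, 31), None),
    (date(2001, 3, 2), 12.2),
    (date(2001, 7, 9), 7.1),
]
toy_maxima = annual_maxima(toy)
print(toy_maxima)
```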
Time series of Wuhan city
Year Precipitation (mm) Year Precipitation (mm)
1951 150.5 1986 109.8
1952 102.2 1987 85.9
1953 125.7 1988 111.9
1954 142.2 1989 95.5
1955 172.2 1990 110.1
1956 52.6 1991 209.8
1957 107 1992 104.1
1958 172.7 1993 108.8
1959 317.4 1994 74.5
1960 89.5 1995 110
1961 214.5 1996 116.6
1962 198 1997 106.5
1963 136.3 1998 285.7
1964 106.6 1999 122.2
1965 56.6 2000 115.3
1966 72.5 2001 84.7
1967 68.8 2002 89.3
1968 71.6 2003 100.5
1969 261.7 2004 157.4
1970 87.8 2005 103.2
1971 66 2006 82.2
1972 79.2 2007 148.6
1973 70.2 2008 87.3
1974 112.4 2009 121.6
1975 80.2 2010 96.7
1976 71.1 2011 197.9
1977 68.8 2012 155.2
1978 53.2 2013 191.8
1979 106.8 2014 73.7
1980 124.2 2015 161.3
1981 118.2 2016 189
1982 298.5 2017 67.8
1983 155.8 2018 68.1
1984 128 2019 17.8
1985 86.2 2020 36.6
Descriptive Statistics

Statistical characteristics

first year 1951
last year 2020
record period 70 years
maximum record value 317.4 mm
minimum record value 17.8 mm

Precipitation
Min. : 17.8
1st Qu.: 80.7
Median :106.9
Mean :120.3
3rd Qu.:147.0
Max. :317.4
Position and dispersion indices

• Variance, standard deviation, range, coefficient of variation

• Variance = 3546.92 mm²
• SD = 59.56 mm
• Range = 17.8 to 317.4 mm
• Mean = 120.32 mm
• CV = 0.495
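
These indices are linked by simple formulas: SD is the square root of the variance, and CV is SD divided by the mean. A minimal Python sketch (not the authors' R code) is given below; the helper name `dispersion_indices` is hypothetical, and the sample variance with an n - 1 denominator is assumed, which is the convention of R's `var()`.

```python
import statistics


def dispersion_indices(values):
    """Mean, sample variance (n - 1 denominator), SD, range and CV."""
    data = [float(v) for v in values]
    mean = statistics.mean(data)
    variance = statistics.variance(data)  # sample variance
    sd = variance ** 0.5
    return {
        "mean": mean,
        "variance": variance,
        "sd": sd,
        "range": (min(data), max(data)),
        "cv": sd / mean,  # dimensionless
    }


# Toy sample for illustration only, NOT the Wuhan series
toy_indices = dispersion_indices([2, 4, 4, 4, 5, 5, 7, 9])
print(toy_indices)

# Consistency check on the values reported on the slide
reported_cv = 59.56 / 120.32
print(round(reported_cv, 3))
```

The last two lines only confirm that the reported SD and mean reproduce the reported CV.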
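Asymmetry of the distribution

Skewness = 1.37887 (positive: the right tail is longer)

Kurtosis = 1.97481 (excess kurtosis: a raw kurtosis cannot be smaller than skewness² + 1 ≈ 2.90)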
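
For reference, one common way to compute these two shape indices uses central moments. The Python sketch below is an illustration, not the authors' R code; it assumes the plain moment estimators with a 1/n denominator, whereas R packages offer several slightly different conventions, so small differences from the reported values would be expected.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based sample skewness g1 and excess kurtosis g2."""
    n = len(values)
    mean = sum(values) / n
    # Central moments with a 1/n denominator
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0  # subtract 3 so a normal distribution scores 0
    return g1, g2


# Toy samples for illustration only
sym_skew, sym_kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
right_skew, _ = skewness_and_excess_kurtosis([1, 1, 1, 1, 10])
print(sym_skew, sym_kurt, right_skew)
```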
Probability distribution

To check whether the data are normally distributed, we use a normal Q-Q plot: the sorted sample values are plotted against the corresponding quantiles of a normal distribution, and we observe how closely the points fall along the reference line.
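
The coordinates of a normal Q-Q plot can be sketched as follows. This is Python rather than the R plotting call used for the slides; the function name `normal_qq_points` and the plotting position (i - 0.5)/n are assumptions, and other conventions exist.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted observation with a standard normal quantile."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, sample_q in enumerate(sorted(values), start=1):
        prob = (i - 0.5) / n  # plotting position (an assumed convention)
        points.append((std_normal.inv_cdf(prob), sample_q))
    return points


# Three values borrowed from the summary table, for illustration only
toy_points = normal_qq_points([317.4, 17.8, 106.9])
for theoretical, observed in toy_points:
    print(f"{theoretical:+.3f}  {observed}")
```

If the data were normal, the printed pairs would fall close to a straight line; strong curvature in the upper tail is what a right-skewed sample produces.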
Inferential Analysis

Inferential analysis: autocorrelation

Autocorrelation is the correlation of a variable with itself over time. It is computed by shifting the series by a given lag and observing the residual autocorrelation at each lag.
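
A minimal Python sketch of the lag-k sample autocorrelation is given below (it is not the authors' R code). The function names are hypothetical, and the ±1.96/√n band is the usual large-sample approximation for an independent series, stated here as an assumption.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of a series with itself shifted by `lag`."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    )
    return numer / denom


def white_noise_band(n, z=1.96):
    """Approximate 95% band for the autocorrelation of an independent series."""
    return z / math.sqrt(n)


# Toy alternating series: strongly negative lag-1 autocorrelation
toy = [1, -1, 1, -1, 1, -1]
r1 = autocorrelation(toy, 1)
band = white_noise_band(70)  # 70 annual maxima in this case study
print(round(r1, 3), round(band, 3))
```

Autocorrelations that stay inside the band at every lag greater than zero are compatible with an independent series.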
Inferential analysis time-series

Gauged stations

Gumbel maximum likelihood
   mu sigma
94.87 42.44

GEV maximum likelihood
"Estimation Method used: MLE"
Negative Log-Likelihood Value: 374.0057
Estimated parameters:
   location       scale       shape
93.47094264 41.64393800  0.06227723
Standard Error Estimates:
  location      scale      shape
5.49408596 4.00509125 0.07700801
Estimated parameter covariance matrix.
          location       scale        shape
location  30.18498  9.05564992 -0.119060040
scale      9.05565 16.04075589 -0.030246987
shape     -0.11906 -0.03024699  0.005930234
AIC = 754.0115
BIC = 760.757

Gumbel MOM
"GEV Fitted using L-moments estimation."
  location      scale      shape
91.6100126 38.5884507  0.1456533

"Estimation Method used: GMLE"
Negative Log-Likelihood Value: 375.3934
Estimated parameters:
  location      scale      shape
91.4450430 42.4420072  0.1857332
Standard Error Estimates:
  location      scale      shape
5.56564537 4.36774044 0.09469461
Estimated parameter covariance matrix.
           location      scale        shape
location 30.9764083 10.5449846 -0.114206547
scale    10.5449846 19.0771565  0.151223712
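
The AIC and BIC reported with the GEV maximum likelihood fit follow directly from its negative log-likelihood. The Python sketch below (not the authors' R code) uses the standard definitions AIC = 2k + 2·NLL and BIC = k·ln(n) + 2·NLL, with k = 3 parameters and n = 70 annual maxima; the function names are hypothetical.

```python
import math


def aic(neg_log_lik, n_params):
    """Akaike information criterion from a negative log-likelihood."""
    return 2.0 * n_params + 2.0 * neg_log_lik


def bic(neg_log_lik, n_params, n_obs):
    """Bayesian information criterion from a negative log-likelihood."""
    return n_params * math.log(n_obs) + 2.0 * neg_log_lik


# Negative log-likelihood of the GEV MLE fit, as reported above
gev_aic = aic(374.0057, 3)
gev_bic = bic(374.0057, 3, 70)
print(round(gev_aic, 3), round(gev_bic, 3))
```

Both values agree with the reported AIC = 754.0115 and BIC = 760.757 up to rounding of the log-likelihood.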
Power of likelihood ratio test

The power of the likelihood ratio test depends mostly on the trend coefficient and the sample size.

> lr.test(Gumbel_MLE, GEV_MLE)
Likelihood-ratio = 0.70124, chi-square critical value = 3.8415, alpha = 0.0500, Degrees of Freedom = 1.0000, p-value = 0.4024 (H0 cannot be rejected)
alternative hypothesis: greater

> lr.test(Gumbel_MLE, GEV_MOM)
Likelihood-ratio = -2.0741, chi-square critical value = 3.8415, alpha = 0.0500, Degrees of Freedom = 1.0000, p-value = 1
alternative hypothesis: greater

> lr.test(GEV_MLE, GEV_MOM)
Likelihood-ratio = -2.7754, chi-square critical value = 0.00, alpha = 0.05, Degrees of Freedom = 0.00, p-value = 1
alternative hypothesis: greater

In the last comparison both fits have three parameters (0 degrees of freedom), so it is not a test between nested models.
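
The critical value and the p-value of the first comparison can be checked by hand, since a chi-square variable with one degree of freedom is the square of a standard normal. The Python sketch below is only a check under that identity, not the `lr.test` implementation; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_sf_df1(x):
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    if x <= 0:
        return 1.0  # a non-positive statistic carries no evidence against H0
    return math.erfc(math.sqrt(x / 2.0))


def chi2_critical_df1(alpha=0.05):
    """Upper-alpha critical value: square of the two-sided normal quantile."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2


critical = chi2_critical_df1(0.05)
p_gumbel_vs_gev = chi2_sf_df1(0.70124)  # statistic reported by lr.test
p_negative = chi2_sf_df1(-2.0741)       # negative statistic gives p = 1
print(round(critical, 4), round(p_gumbel_vs_gev, 4), p_negative)
```

Since 0.70124 is well below the critical value, the extra shape parameter of the GEV is not supported at the 5% level.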
GUM AIC LR
Augmented Dickey-Fuller Test
alternative: stationary

Type 1: no drift no trend
     lag   ADF p.value
[1,]   0 -2.59  0.0110
[2,]   1 -1.62  0.0982
[3,]   2 -1.36  0.1881
[4,]   3 -1.22  0.2389
Type 2: with drift no trend
     lag   ADF p.value
[1,]   0 -6.89    0.01
[2,]   1 -4.71    0.01
[3,]   2 -4.23    0.01
[4,]   3 -3.86    0.01
Type 3: with drift and trend
     lag   ADF p.value
[1,]   0 -6.90  0.0100
[2,]   1 -4.74  0.0100
[3,]   2 -4.26  0.0100
[4,]   3 -3.87  0.0207
----
Note: in fact, p.value = 0.01 means p.value <= 0.01

The augmented Dickey-Fuller (ADF) statistic used in the test is a negative number. The more negative it is, the stronger the rejection of the hypothesis that there is a unit root (non-stationarity) at some level of confidence.

Type 1, lags 1 to 3: non-stationary with no drift and no trend (H0 cannot be rejected; non-stationarity cannot be rejected at the 5% significance level).
Type 1 at lag 0, and Types 2 and 3 at every lag: p-value < 0.05 (H0 is rejected).
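
To illustrate why a more negative statistic speaks against a unit root, here is a minimal Python sketch of the simplest Dickey-Fuller regression (lag 0, no drift, no trend). It is not the routine that produced the table above and it does not compute p-values, which require Dickey-Fuller tables; the function name and the simulated series are assumptions for illustration. The 5% critical value for this case is approximately -1.95.

```python
import math
import random


def dickey_fuller_stat(series):
    """t-statistic of rho in: diff(y)[t] = rho * y[t-1] + error.

    Lag 0, no drift, no trend (the simplest "Type 1" case).
    """
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    sxx = sum(x * x for x in lagged)
    rho = sum(x * d for x, d in zip(lagged, diffs)) / sxx
    residuals = [d - rho * x for x, d in zip(lagged, diffs)]
    dof = len(diffs) - 1  # one estimated coefficient
    sigma2 = sum(e * e for e in residuals) / dof
    return rho / math.sqrt(sigma2 / sxx)


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]  # stationary by design
walk = []
level = 0.0
for shock in noise:
    level += shock
    walk.append(level)  # random walk: has a unit root

stat_noise = dickey_fuller_stat(noise)
stat_walk = dickey_fuller_stat(walk)
print(round(stat_noise, 2), round(stat_walk, 2))
```

The stationary series yields a strongly negative statistic, while the random walk built from the same shocks stays much closer to zero.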
Trend detection

Mann-Kendall trend test

[Figure: trend line]

tau = -0.0398, 2-sided p-value = 0.63007 (H0 cannot be rejected)

Approximate Cox-Stuart trend test

D- = 19, p-value = 0.306 (H0 cannot be rejected)
alternative hypothesis: data have a decreasing trend
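
The Mann-Kendall statistic counts, over all pairs of years, how often the later value is larger or smaller than the earlier one. A minimal Python sketch follows (not the authors' R code); it assumes no correction for tied values and a continuity-corrected normal approximation, so results on data with ties may differ slightly from the R output.

```python
import math
from statistics import NormalDist


def mann_kendall(values):
    """Mann-Kendall S, Kendall tau and two-sided p-value (no tie correction)."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)  # sign of later minus earlier
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity-corrected normal approximation
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return s, tau, p_value


# Toy monotone series for illustration only
s_up, tau_up, p_up = mann_kendall([1, 2, 3, 4, 5])
s_down, tau_down, _ = mann_kendall([5, 4, 3, 2, 1])
print(s_up, tau_up, round(p_up, 4))
```

A tau close to zero, as reported for Wuhan, means increasing and decreasing pairs nearly cancel out.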
Change point detection (step change)

Pettitt's test for single change-point detection (median change point test)

This is a rank-based test for a change in the median of a series, with the exact time of change unknown.

U* = 317, p-value = 0.3535 (H0 cannot be rejected)
alternative hypothesis: two.sided
sample estimates: probable change point at time K = 13

H0 is rejected if the p-value is smaller than the prespecified significance level α. Otherwise, H0 cannot be rejected.
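
A minimal Python sketch of Pettitt's statistic and its usual approximate p-value, p ≈ 2·exp(-6U²/(n³ + n²)), is given below. It is not the R routine used for the slide; the function names are hypothetical and the sign-based formulation is one of several equivalent ways to write the statistic.

```python
import math


def pettitt_pvalue(u, n):
    """Approximate two-sided p-value for Pettitt's statistic."""
    return min(1.0, 2.0 * math.exp(-6.0 * u * u / (n ** 3 + n ** 2)))


def pettitt(values):
    """Return (change point index k, statistic U, approximate p-value)."""
    n = len(values)
    best_k, best_u = 0, 0
    for k in range(1, n):  # split into values[:k] and values[k:]
        u_k = 0
        for a in values[:k]:
            for b in values[k:]:
                u_k += (a > b) - (a < b)  # sign(a - b)
        if abs(u_k) > best_u:
            best_k, best_u = k, abs(u_k)
    return best_k, best_u, pettitt_pvalue(best_u, n)


# Toy series with an obvious step after the third value
toy_k, toy_u, toy_p = pettitt([1, 2, 3, 10, 11, 12])

# Check of the reported p-value from U* = 317 and n = 70
p_slide = pettitt_pvalue(317, 70)
print(toy_k, toy_u, round(p_slide, 4))
```

Plugging the reported U* = 317 and n = 70 into the approximation reproduces the p-value of 0.3535 shown above.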
Conclusion
References

• Totaro, V.: On the use of TCEV and Kappa four-parameter distributions for at-site flood frequency analysis, Ph.D. Thesis, Politecnico di Bari, 2020.

• Houghton, J. T. (2001). Climate Change 2001: The Scientific Basis.

• Pettitt, 1979, A non-parametric approach to the change point problem. Journal of the Royal Statistical Society Series C, Applied Statistics 28, 126-135.

• Koutsoyiannis, D. and Montanari A.: Negligent killing of scientific concepts: the stationarity case, Hydrolog. Sci. J., 60, 1174–1183,

• Reiss, R.-D. and Thomas, M. (2007) Statistical Analysis of Extreme Values: with applications to insurance, finance, hydrology and other fields. Birkhaeuser, 530pp., 3rd edition.

Thank you
