Time series analysis in R

D G Rossiter Department of Earth Systems Analysis University of Twente, Faculty of Geo-Information Science & Earth Observation (ITC) Enschede (NL) July 23, 2012

Contents
1 2 Introduction 1

Loading and examining a time series 2 2.1 Example 1: Groundwater level . . . . . . . . . . . . . . . . . . 2 2.2 Example 2: Daily rainfall . . . . . . . . . . . . . . . . . . . . . . 12 2.3 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 Analysis of a single time series 3.1 Summaries . . . . . . . . . . . . . . . . . . . . . . 3.1.1 Numerical summaries . . . . . . . . . . . . 3.1.2 Graphical summaries . . . . . . . . . . . . 3.2 Smoothing . . . . . . . . . . . . . . . . . . . . . . 3.2.1 Aggregation . . . . . . . . . . . . . . . . . 3.2.2 Smoothing by filtering . . . . . . . . . . . 3.2.3 Smoothing by local polynomial regression 3.3 Decomposition . . . . . . . . . . . . . . . . . . . . 3.4 Serial autocorrelation . . . . . . . . . . . . . . . . 3.5 Partial autocorrelation . . . . . . . . . . . . . . . 3.6 Spectral analysis . . . . . . . . . . . . . . . . . . . 3.7 Answers . . . . . . . . . . . . . . . . . . . . . . . . Modelling a time series 25 25 25 28 31 31 32 35 36 43 50 52 58 61

3

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

. . . . . . . . . . . .

4

Version 1.0. Copyright © 2009–2012 D G Rossiter All rights reserved. Reproduction and dissemination of the work as a whole (not parts) freely permitted if this original copyright notice is included. Sale or placement on a web site where payment must be made to access this document is strictly prohibited. To adapt or translate please contact the author (http://www.itc.nl/personal/rossiter).

4.1 4.2

4.3

4.4

4.5 4.6 5

Modelling by decomposition . . . . . . . . . . . . . . . . . . . Modelling by linear regression . . . . . . . . . . . . . . . . . . 4.2.1 Modelling a decomposed series . . . . . . . . . . . . . 4.2.2 Non-parametric tests for trend . . . . . . . . . . . . . 4.2.3 A more difficult example . . . . . . . . . . . . . . . . . Autoregressive integrated moving-average (ARIMA) models 4.3.1 Autoregressive (AR) models . . . . . . . . . . . . . . . 4.3.2 Moving average (MA) models . . . . . . . . . . . . . . 4.3.3 ARMA models . . . . . . . . . . . . . . . . . . . . . . 4.3.4 ARIMA models . . . . . . . . . . . . . . . . . . . . . . 4.3.5 Periodic autoregressive (PAR) models . . . . . . . . . Modelling with ARIMA models . . . . . . . . . . . . . . . . . 4.4.1 Checking for stationarity . . . . . . . . . . . . . . . . . 4.4.2 Determining the AR degress . . . . . . . . . . . . . . . Predicting with ARIMA models . . . . . . . . . . . . . . . . . Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . .

61 62 68 71 73 75 75 82 82 82 83 86 86 90 94 95

Intervention analysis 98 5.1 A known intervention . . . . . . . . . . . . . . . . . . . . . . . . 98 5.2 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Comparing two time series 101 6.1 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 Gap filling 7.1 Simulating short gaps in a time series 7.2 Gap filling by interpolation . . . . . . 7.2.1 Linear interpolation . . . . . . 7.2.2 Spline interpolation . . . . . . . 7.3 Simulating longer gaps in time series . 7.3.1 Non-systematic gaps . . . . . . 7.3.2 Systematic gaps . . . . . . . . . 7.4 Estimation from other series . . . . . . 7.5 Answers . . . . . . . . . . . . . . . . . . 107 108 110 110 111 114 114 118 120 126

6

7

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

. . . . . . . . .

8

Simulation 128 8.1 AR models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 8.2 Answers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 133 135

References Index of R concepts

ii

1

Introduction
Time series are repeated measurements of one or more variables over time. They arise in many applications in earth, natural and water resource studies, including ground and surface water monitoring (quality and quantity), climate monitoring and modelling, agricultural or other production statistics. Time series have several characteristics that make their analysis different from that of non-temporal variables: 1. The variable may have a trend over time; 2. The variable may exhibit one or more cycles of different period; 3. There will almost always be serial correlation between subsequent observations; 4. The variable will almost always have a “white noise” component, i.e. random variation not explained by any of the above. This tutorial presents some aspects of time series analysis (shorthand“TSA”1 ), using the R environment for statistical computing and visualisation [9, 13] and its dialect of the S language. R users seriously interested in this topic are well-advised to refer to the text of Metcalfe and Cowpertwait [12], part of the excellent UseR! series from Springer2 . The Comprehensive R Action Network (CRAN) has a “Task View” on time series analysis3 . This lists all the R packages applicable to TSA, categorized and with a brief description of each. In addition, Shumway and Stoffer [16] is an advanced text which uses R for its examples. Venables and Ripley [18] include a chapter on time series analysis in S (both R and S-PLUS dialects), mostly using examples from Diggle [5]. Good introductions to the concepts of time series analysis are Diggle [5] for biological applications, Box et al. [3] for forecasting and control, Hipel and McLeod [7] (available on the web) for water resources and environmental modelling, and Salas [15] from the Handbook of Hydrology for hydrological applications. Davis [4] includes a brief introduction to time series for geologists. Wilks [20, Ch. 8] discusses time series for climate analysis, with emphasis on simulation. If you are unfamiliar with R, you should consult the monograph “Introduction to the R Project for Statistical Computing for use at ITC” [14]; this explains how to use R on the ITC network, how to obtain and install your own (free and legal) copy, and how to use the documentation and tutorials. Or, you could work through the standard R tutorial provided with the R distribution. The tutorial is organized as a set of tasks followed by questions to check your understanding; answers are at the end of each section. If you are ambitious, there are also some challenges: tasks and questions with no solution provided, that require the integration of skills learned in the section.
1 2

Not to be confused with a bureaucratic monster with the same initials http://www.springer.com/series/6991 3 http://cran.r-project.org/web/views/TimeSeries.html

1

1.Note: The code in these exercises was tested with Sweave [10.txt and anatolia_alibe.2). Turkey..0 (2012-03-30). Q1 : What are some things that a water manager would want to know about this time series? Jump to A1 • Task 1 : Start R and switch to the directory where the example datasets are stored.15. 55.36 34.45 34.83 2 .55 54.7 . 2.1). These are provided as text files anatolia_hati. These also illustrate some of the problems with importing external datasets into R and putting data into a form suitable for time-series analysis. Monthly roundwater levels (§2. So. Daily rainfall amounts (§2. 2 Loading and examining a time series We use two datasets to illustrate different research questions and operations of time series analysis: 1.show("anatolia_hati.txt.6. here we use the file. • Task 2 : Examine the file for the first well. 11] on R version 2.txt") Here are the first and last lines of the file: 34. running on Mac OS X 10. from January 1975 through December 2004 CE..1 Example 1: Groundwater level The first example dataset is a series of measurements of the depth to groundwater (in meters) in two wells used for irrigation in the Anatolian plateau. Then the A L TEX document was compiled into the PDF version you are now reading.show function: > file.96 55. the text and graphical output you see here was automatically generated and incorporated A into L TEX by running actual code through R and its packages. • You could review the file in a plain-text editor. 2. Your output may be slightly different on different versions and on different platforms.

4 34. because the end will just be when the vector ends. • Q3 : What is the structure? Jump to A3 • Task 4 : Convert the vector of measurements for this well into a time series and examine its structure and attributes. The ending date end is optional if either frequency or deltat are given..8 34..7 34.txt") > str(gw) num [1:360] 34.4 34. The simplest way to specify it is by starting date and frequency.. • The ts function converts a vector into a time series.ts(gw. it is a monthly series. ˆ end : ending date of the series.000 2004.9 .7 34. In this example we know from the metadata that the series begins in January 1975 and ends in December 2004. Using the scan function to read a file into a vector: > gw <.Q2 : Can you tell from this file that it is a time series? Jump to A2 • Task 3 : Read this file into R and examine its structure.scan("anatolia_hati. start = 1975. they are two ways to specify the same information.000 3 .917 $class [1] "ts" 12. of which enough must be given to specify the series: ˆ start : starting date of the series. frequency = 12) > str(gw) Time-Series [1:360] from 1975 to 2005: 34..5 34.5 34. > gw <. it has several arguments. After the series is established. ˆ deltat : fraction of the sampling period between successive observations. > attributes(gw) $tsp [1] 1975. we examine its structure with str and attributes with attributes.8 34.9 . Only one of frequency or deltat should be provided. ˆ frequency : number of observations in the series per unit of time.

000 1975.417 2003.02 55. 2003 2004 53...667 1976..ts to show the actual values..83 1975 1976 . Task 5 : Print the time series.35 55.250 1975. 2003 54.500 1976.28 56.917 2004.250 Nov 1975.45 34.11 57.52 55.39 34.583 > cycle(gw) 2003.48 51.66 52.667 2004.32 53.20 35.88 35.750 2003.833 1976.500 Dec 1975.22 35.55 > time(gw) 1975 1976 .917 1976.45 34. 2003 2004 Jan Feb Mar Apr May Jun Jul 1975.70 34.000 1976.750 2003.000 Aug 1975 1975..70 52.500 2004.> start(gw) [1] 1975 > end(gw) [1] 2004 12 1 In the above example we also show the start and end functions to show the starting and ending dates in a time series. • The generic print method specialises into print.917 2003.92 .000 2004.500 2003..583 1976 1976.32 34.28 33. also show the the time associated with each measurment.71 51. > print(gw) Jan Feb Mar Apr May Jun Jul Aug Sep Oct 34. and the position of each observation in the cycle.78 54.86 35.750 2004.417 1976.833 2004. 2003 2003.583 2004 2004.41 51.47 2004 55.80 34.21 54.833 2003.167 1976.833 2003.51 35.417 1975.083 1976.46 56.84 54.60 35.54 34.18 52.333 2003.333 1975.333 1976.71 52.73 54.16 35.417 2004.167 2004.667 2003.98 35..18 34.583 .70 35.11 55.083 2004.167 1975.250 2004.250 1976.083 Sep 1975.37 51.86 35..48 1976 33.96 Dec 35.36 34..08 Nov 1975 35.750 1976.167 Oct 1975. time shows the time associated with each measurment.917 1975 1976 . Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 4 . and cycle shows the position of each observation in the cycle..07 50.667 2003.333 2004.083 1975.

There are several ways to visualise this.ts for time-series objects.2003 2004 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 11 12 12 Q4 : What are the units of each of these? Jump to A4 • Task 6 : Determine the series’ frequency and interval between observations. > frequency(gw) [1] 12 > deltat(gw) [1] 0. • The generic plot method specialises to plot. we show a few possibilities here: 5 .08333333 Q5 : What are the units of each of these? Jump to A5 • Task 7 : Plot the time series. but this information can also be extracted from the series object with the frequency and deltat. • Of course we know this from the metadata. methods. as in the arguments to ts.

46 40. end = c(1993.48 1992 42. main = "Anatolia Well 1") 6 . year) and position in cycle (here. from the shallowest depth at the end of the winter rains (April) beginning in 1990. pch = 20.42 41.92 1991 43. main = "Anatolia Well 1") plot(gw. main = "Anatolia Well 1") par(mfrow = c(1. ylab = "Depth to water table (m)".16 42.13 41. start = c(1990.44 40. connected with the c ‘catentate’ function: > window(gw.28 43. start = c(1990.22 1992 42.05 42. ylab = "Depth to water table (m)". 1)) Anatolia Well 1 Anatolia Well 1 q q q q q q q q q q q q qq q q q q q qq q q q q q q q q q q q q q q q Anatolia Well 1 55 55 q q q q q q q q 50 50 q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q Depth to water table (m) Depth to water table (m) q q q q q q q q q q 45 45 q q q q q q q q q q q q q q 30 30 1975 1980 1985 1990 Time 1995 2000 2005 1975 1980 1985 1990 Time 1995 2000 2005 30 1975 q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 35 35 35 q q q q q qq q q q q qq q qqq q q q q q q qq q q qq q q q q q qq q q q q q q q q q q q q q q q qq q qq q qq q q q q q q q q q q q qq q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q 40 40 40 45 q q q q qq q q q q q qq qq q q qq q Depth to water table (m) q q q q q q q q q q qq q q 50 q q q q q 55 qq 1980 1985 1990 Time 1995 2000 2005 Q6 : What are the outstanding features of this time series? Jump to A6 • Task 8 : Plot three cycles of the time series.88 45. type = "h".61 41.6.74 40. 4).34 1991 41.48 1993 > plot(window(gw. cex = 0. ylab = "Depth to water table (m)". 3)) plot(gw.79 43. 3)).73 42. 3)) Jan Feb Mar Apr May Jun Jul Aug Sep Oct 1990 39.11 43.28 Nov Dec 1990 42.49 44.52 41.07 40.06 44.20 44. type = "o". main = "Anatolia Well 1") plot(gw.54 1993 42. col = "blue". col = "blue".85 40.11 43.82 40. • We use the window function to extract just part of the series. + type = "o".26 44.09 45. end = c(1993.> > > + > + > par(mfrow = c(1.16 42. 4). to see annual behaviour. The start and end can have both a cycle (here.66 41.27 42. month). ylab = "Depth to water table (m)".50 43.82 41.19 43.

38 39. cool and moist in the winter.50 43.44 40.54 Nov Dec 1990 42.61 41.95 Another way to view a time series is as the differences between successive measurements.07 40.19 43. the lag argument specifies different lags (intervals between measurements): > window(gw.82 41. lag 2 (two months).73 42.06 44.52 41. • The diff function computes differences. and lag 12 (one year). end = c(1990. The window function may also be used to extract a single element.26 44.11 43.5 1993. 1990.22 1992 42.88 45.89 39. for the period 1990 – 1992.79 43.0 1991.0 1992.27 42.34 1991 41.92 7 .82 40.0 Q7 : What is the annual cycle? Jump to A7 • This region of Turkey has a typical Mediterreanean climate: hot and dry in the summer.49 44. These wells are used for irrigation in the summer months. 12)) Jan Feb Mar Apr May Jun Jul Aug Sep Oct 1990 39.74 40. 1)) Jan 1990 39. c(1992. by specifying the same start and end date: > window(gw. by default for successive measurements.5 Time 1992.5 1991. 1).11 43.09 45.85 40.Anatolia Well 1 45 q q q 44 q q q q q q Depth to water table (m) q 43 q q q q q q q q q q q q q q q q q q q q q 41 42 q q q 40 q 39 q 1990.16 42. start = c(1990.20 44.46 40.28 43.95 39. Task 9 : Compute and view the difference for lag 1 (one month).42 41.

64 Nov Dec 1990 -1.55 1992 -1.18 -0.96 -0.08 Apr 1.31 1991 -0.00 Nov Dec 1990 -0.05 42.59 1.78 0.11 -0. etc.03 -0.09 0.22 1992 -0.07 -0.75 0.84 0.17 -1.62 -0. differences of differences (order 2). that is.90 -1.91 Jun 2. c(1992.78 -0. • > diff(window(gw.23 -1.00 Mar 1.07 Aug Sep Oct 0.68 Q8 : What happens to the length of resulting series with different lags? Jump to A8 • Q9 : Do you expect the one-month differences to be the same for each month within a year? Are they? Jump to A9 • Q10 : Do you expect the one-month differences to be the same for the same month interval in different years? Are they? Jump to A10 • Specifying the differences argumeent to diff computes higher-order differences.39 0.61 Dec 0.21 -0.11 -0. c(1992. 12)). 1990.61 1.73 -0.48 -0.51 -0. c(1993.49 0.32 -0.27 0.06 May 0.30 May Jun Jul Aug Sep Oct 1.06 -0.74 -1.61 2. lag = 2) Jan Feb Mar Apr 1990 -0.18 -1.48 1992 42.02 -1.91 -0.1991 43.77 1.77 1. 1990.36 0.08 -0. second and third order differences of the series from 1990 – 1993. lag = 12.04 2. 12)).64 0.59 -0.43 May 1.57 -0.56 0.60 -0.60 -0.46 0.88 0.45 1.82 1991 -1. 12))) Jan Feb Mar Apr 1990 -0. 1990.02 -0.33 1992 -0.96 0.68 0.42 1991 -1. c(1992. 12)).21 -0.74 > diff(window(gw.59 -0.70 1.30 Sep Oct 0.53 0.91 Jun 3.98 2.98 1.68 1992 -1.16 42.06 -0.74 1992 -2.11 1991 1992 Feb 0.95 > diff(window(gw.46 Jul 0. differences = 1) 8 . lag = 12) Jan 1.84 1.45 0.70 -0.23 -0.34 0.37 Jul 3.53 Aug 1.06 -0.98 Nov 1991 0.87 1.48 > diff(window(gw.43 1.49 0. 1990. Task 10 : Compute the first.69 1991 -1.

17 0.86 Q11 : What is the interpretation of these differences? Jump to A11 • Task 11 : Plot the first differences for the whole time series.28 2.96 0. differences = 2) Jan Feb Mar Apr May 1992 -0.38 0.30 > diff(window(gw.79 Jun 0. 1990.08 0. lag = 12.15 -0.56 1.24 1.34 2.27 -0.11 1993 2.76 -0.98 1.41 0.30 -1.62 -0.21 1.62 > diff(window(gw.20 Dec 0.00 1.24 1.56 1993 3.13 2.50 1. + main = "Anatolia well 1") Anatolia well 1 One−month differences in groundwater depth (m) −6 1975 −4 −2 0 2 1980 1985 1990 Time 1995 2000 2005 9 .68 1.01 Mar 1.60 -0.66 -0. 12)).42 Sep 4.88 0. lag = 12.54 Aug 3.66 Apr 1.45 Mar 1.27 0.57 Feb 0.55 1992 -1.68 0.45 0.11 -0.06 Nov Dec 1993 5.36 0.56 0.47 -1.47 May Jun Jul Aug Sep Oct 1.46 0.62 1.39 0.98 1993 -0.46 Apr 1.51 -0.66 Jul 3.30 0.28 -1.84 2.08 -0.35 -1. 12)).61 0.58 -2.45 Jun Jul Aug Sep Oct 0. • > plot(diff(gw).33 Oct 3. c(1993.78 -0. ylab = "One-month differences in groundwater depth (m)".32 May 0.68 0.60 1. c(1993.05 -1.49 1992 0.65 1993 -1.14 Nov Dec 1992 -1.Jan 1991 1. 1990. differences = 3) Jan Feb 1993 -0.29 Nov 1991 0.

36 > time(gw)[i] [1] 1988.98 2.500 2000.74 > time(gw)[i] [1] 2000.70 2.167 1990.417 2001.500 2002.333 > cycle(gw)[i] [1] 3 5 1 7 6 7 5 5 10 .51 2.57 2. The which.which(abs(diff(gw)) > 2) > diff(gw)[i] [1] -7. and then all of these where the level changed by more than 2 m in one month. in either direction: > i <.000 1998.36 2.74 2.34 -2.which.333 1996.167 > cycle(gw)[i] [1] 3 > i <. compared to the time series itself.417 > cycle(gw)[i] [1] 6 > i <.max(diff(gw)) > diff(gw)[i] [1] 2.333 [8] 2003.07 > time(gw)[i] [1] 1988.min(diff(gw)) > diff(gw)[i] [1] -7. Q12 : What are the outstanding features of the first differences? Jump to A12 • Task 12 : Identify the months with the largest extractions and recharges. • The which function identifies elements in a vector which meet some criterion.max are shorthand for finding the minimum and maximum.which.This gives us a very different view of the sequence.min and which. We first find the most extreme extractions and recharge.

is this correct.43 40.12 41. start = 1986.58 Nov Dec 1987 44.94 45. end = 1990). given than no other month in the time series had more than about 2.78 39.49 44.96 40. end = c(1988.06 46.42 46. end = 1990). the cycle function is used to display the position in the cycle. ylab = "Groundwater depth (m)". start = 1986. the resulting vector of dates is subsetted with the [] indexing operator. 6)) Jan Feb Mar Apr May Jun Jul Aug Sep Oct 1987 39. + main = "Anatolia well 1".73 43.Note the use of the time function on the series to get the dates corresponding to the selected measurements.18 1988 46. type = "h".42 39. lty = 2) Anatolia well 1 Groundwater depth (m) 40 1986 42 44 46 1987 1988 Time 1989 1990 Q13 : Is the pattern for Spring 1987 – Spring 1988 consistent with the surrounding years? Jump to A13 • Here are the actual measurements for the relevant time period: > window(gw.53 1988 Q14 : How could this anomaly be explained? Jump to A14 • 11 . here the month. 3).5 m recharge? First.39 42. we take a closer look at the measurements for this and surrounding years: > plot(window(gw.58 43. start = c(1987. Similarly. col = "blue") > lines(window(gw. The extreme recharge (about 7.64 39.5 m) in March 1988 draws our attention.

the CSV file is Tana_Daily.6 12 .2 Example 2: Daily rainfall The second example dataset is a series of daily rainfall records from a station in the Lake Tana basin. to quote the R Data Import/Export manual [17. Q15 : What could be some research questions for this time series? to A15 • Jump This is a typical “messy” Excel file.2. Here is a screenshot of the original file: This is a nice format to view the data but not to analyze as a time series. The first piece of advice is to avoid doing so if possible!” The recommended procedure is to export each sheet of the Excel file to a separate comma-separated value (“CSV”) file. §8]: “The most common R data import/export question seems to be: ‘how do I read an Excel spreadsheet?’ . for 26 years (1981 – 2005). .csv. using Excel or another program that can read Excel files and write CSV files.show function: > file. Again we use the file. Ethiopia. . First. we must do some substantial manipulations to get it into useful format for R. We have done this4 . It is possible to reformat and clean up the dataset in Excel. but much more difficult than the systematic approach we can take in R.show("Tana_Daily.csv") 4 • using the Numbers application on Mac OS X 10. Task 13 : View the contents of the CSV file.

.0..26.0.SEP.13..DATE.0 .csv". 2)..3" "1..36.skip = T.0.. "0" "0" "0" "0" . " ")) > str(tana) 'data.0.12. skip = 1.9. ˆ header=T to specify that the first line read (after skipping) contains the variable names.5. "0" "0" "0" "0" ..1..4. ˆ na.6.3" "12. ..table for details). + colClasses = c(rep("integer".5" "0. Note that optional arguments sep and dec can be used if the separator between fields is not the default “..A". "0" "0" "0" "0" .0. + "NA".0.31.5.0.8. here we use: ˆ skip=1 to skip over the first line. 1 2 3 4 5 6 7 8 9 10 . "0" "0.31.0.85.Here are the first and last lines of the file: BahirDar.0.8..0.0. blank.15.0.8..7.0.MAY. na..0.32.APRIL.0. "0" "0" "0" "0" .0.read. of 14 variables: 1981 NA NA NA NA NA NA NA NA NA .JUNE. > tana <..29. ˆ blank. .0. which is a version of the more general read.28... "14...0.9.0 .4.6.0.5.0.0.. ˆ colClasses to specify the data type of each field (column). header = T.0..0. rep("character".3" "0.0.0.9" "29. which is just the station name.0..MAR..0..2. 13 .0. here the months.0.1.OCT..0. "0...0.DEC 1981...0.9.1.csv function.JULY.4" .4" "27.5.0.0.24..18.30.0. note blanks are also considered missing.frame': $ YEAR : int $ DATE : int $ JAN : chr $ FEB : chr $ MAR : chr $ APRIL: chr $ MAY : chr $ JUNE : chr $ JULY : chr $ AUG : chr $ SEP : chr $ OCT : chr 831 obs.0.33. ..2.9" "17.6" . YEAR.0.csv("Tana_Daily.0.1.11...NOV. + 12)).9" "31.1.A is a missing value.0.6" .0..skip=T to skip blank lines. Task 14 : Read the CSV file into an R object and examine its structure.6.3.7.A" to specify that any string that is identically N.. .” or the decimal point is not the default “.28.0. These have quite some useful optional arguments (see ?read. "26...6" "0" .4" "0..0.9" "8.3.FEB.28.AUG.0.0..0.”..0.0. "13.0.strings="N.0. • We use the very flexible read.32.14..0.0.0.3.8" "0" "0" "0" ... "0" "0" "0" "0" .3.0.JAN.3.lines.0.9.2. this is because we want to treat all the rainfall amounts as character strings to fix the “trace amount” entries specified with the string TR..0.strings = c("N..table metod..lines.

which can be identified with is.5 4.2 5. Q16 : What is in each field of the dataframe? Jump to A16 • There are many missing values.3 0.8 0 0 0 0 21.3 27 NA 27 0 0 0 0 0 28 NA 28 0 0 0 0 0 29 NA 29 0 0 0 0 30 NA 30 0 0 0 0 31 NA 31 0 0 0 32 NA NA 33 1982 1 0 0 0 0 0 34 NA 2 0 0 0 0 0 35 NA 3 0 0 30 0 0 JUNE JULY AUG SEP OCT NOV DEC 0.4 0 0 0 0.9 6.4 0 0 12.9 3.2 0 58. > tana[1:35.4 0 0 0 0 0 0 1.9 13.9 1.5 8.4 0 0 0 10.8 0 25 NA 25 0 0 0 0 0 26 NA 26 0 0 0 0.4 48.7 8.4 0 24 NA 24 0 0 0 52.4 23.4" "0" "0" "0" .9 0..7 0 0 0 108..5 0 0 0 6.3 0 0 0 0 0 28.3 0 0 0 0.7 11.8 0 0 0 0 0 14.9 25.csv function creates a so-called dataframe.8 9.3 2.1 1.6 8.na function.4 1.5 0 0 0 0.2 0.3 0 0 0 2.2 7.5 7.8 0.3 16.6 0.3 6.6 25.9 2.2 0.3 0.9 21.4 1.5 0 0 0 0 0.6 0.5 1.7 9 NA 9 0 0 0 0 1.5 0.7 20. the sum function then shows how many in total.9 26.4 0 0 0 0 0 11.8 26.3 16.6 23 NA 23 0 0 0 0.1 16 NA 16 0 0 0 0 0 17 NA 17 0 0 0 0 0 18 NA 18 0 0 0 0 2.8 0 0 0 0 0 0.8 0 0 0 6 0 0 0 0 0 0 7 1 2.3 2.9 1.9 31.6 8.3 0.2 0 0 32.2 17.4 22 NA 22 0 0 0 14.3 11. We also compute the number 14 ..5 3 11.5 17.9 1.6 0 0 0 0 11. Only the first line of each year has the year number in the first field.5 26.7 8.9 0.8 0 0 0 10.6 27.4 0.2 0 5.7 0 4.4 3.9 0 0 0 8.2 10 NA 10 0 0 0 0 0 11 NA 11 0 0 0 0 0 12 NA 12 0 0 0 0 0 13 NA 13 0 0 0 0 0 14 NA 14 0 0 0 0 0 15 NA 15 0 0 0 0 7.8 23.2 5.5 0.3 5.$ NOV $ DEC : chr : chr "1.2 8 2 0 0 0 0 64.4 0 1. "0" "0" "0" "0" .7 0 0 0 0.2 0.3 14.9 20.8 0 0 0 3.8 16.5 0 0 0 19.4 27. ] YEAR DATE JAN FEB MAR APRIL MAY 1 1981 1 0 0 0 0 0 2 NA 2 0 0 0 0 0 3 NA 3 0 0 0 0 0 4 NA 4 0 0 0 0 0 5 NA 5 0 0 0 0 0 6 NA 6 0 0 0 0 0 7 NA 7 0 0 0 0 0 8 NA 8 0 0 0 0 3.3 20 NA 20 0 0 0 0 0 21 NA 21 0 0 0 0 4.4 0 0 0 0 0.7 0 1.3 0 0 5.9 0 0 7.7 25 32.5 11.7 19 NA 19 0 0 0 0.2 30 6. which is a matrix with named fields (columns).8 30.8 24.7 0 0.1 4.1 6.6 4.3 1.8 0 0 21.6 0 0 0 29.. The read.1 3.7 <NA> 0 0 0 0 0 0 We can see that each year has 31 lines followed by one lines with only two NA missing values.2 0 0 3.

3:14] == "0.5 0 0.5 0.5 Q17 : What are the meanings of the "" (empty string). 0.01 1 9. and sort these for easy viewing with sort.. To get all the months as one vector we stack columns 3 through 14 with the stack function: > sort(unique(stack(tana[.rm = TRUE) [1] 69 > sum(tana[.rm = TRUE) [1] 457 Note that the null values are not missing.rm = TRUE) [1] 2 > sum(tana[.0. 0.6 0. and how many zeroes: > sum(tana[. 30-February).1 [501] 95.2 9. or (since they have very little effect on rainfall totals) to zero.4 1. na. na.5 1..1 1.of blank records.1 9. or one order of magnitude smaller. 3:14] == "".3 1.g.2 TR 0.8 9. 3:14] == "0". na.rm = TRUE) [1] 4920 The trace values are conventionally set to half the measurement precision. [491] 9.0 0.6 1.01". 3:14])$values)) [1] [11] 0.9 9. na.2 1.3 tr 0. 3:14] == "TR".9 0.5 90. na. 3:14] == "tr".rm = TRUE) [1] 14 > sum(tana[. We can see all the values with the unique function.0 9. and "TR" (also written "tr") values? Jump to A17 • Let’s see how many of each of the problematic values there are.rm argument is needed to remove missing values before summing: > sum(is.4 9. they are for days that do not exist (e. 3:14])) [1] 170 > sum(tana[. Q18 : What is the measurement precision? Jump to A18 • 15 .na(tana[.7 .3 9.4 0.8 0.7 0. the na.

.9 82.1 to zero.7 1. 3:14])$values)) [1] [11] 0.rm = TRUE) [1] 0 > sum(tana[. the sequence must be by year. To detect seasonality we need equal-length years.'0. • This is complicated by the varying month length.5 0.01')='0'") + } > sort(unique(stack(tana[..4 0. so these years had 366. Fortunately in this case. as required for time series analysis.3 1.rm = TRUE) [1] 0 > sum(tana[. .rm = TRUE) [1] 5005 Task 16 : Organize the daily values as one long vector of values.2 1.5 9. na.6 9.8 0.0 90. .recode(tana[. • For this we use the very useful recode function in the car “Companion to Applied Regression” package from John Fox [6].5 1.1 85.0 1 83 9. "FEB"] [1] [12] [23] [34] [45] "" NA "0.0 83.1 9. and there was no rain on this date in any of the leap years: > tana[tana$DATE == 29.5 The problematic values have been replaced by zeroes. February had 29 days in 1984. The require function loads the library if it is not already loaded.'tr'. but each month’s vector contains all the years. i] <.Task 15 : Set the trace values and any measurements below 0. 2004. 3:14] == "tr".3 0.4 9 9.2 0 0.3 9.. February is a dry month.0" NA "0" NA NA NA "" NA NA NA "" NA "" NA "0" NA "0" NA NA NA "" NA "" NA "" "" NA "0" NA NA "" NA "" "" NA NA NA 16 .0" NA "" NA "" NA NA NA "" NA "" NA "0. "c('TR'. not 365 days.5 9.4 1.8 . 3:14] == "0".6 1.3 89.2 87. so it is not possible simply to stack the 12 vectors. Finally.5 0.6 0.7 0.9 0. Also. > require(car) > for (i in 3:14) { + tana[.1 95.rm = TRUE) [1] 0 > sum(tana[. 1988.9 [491] 9. as we can verify by repeated the summary of “problematic” values: > sum(tana[. na. [481] 8.7 9.01". na.2 9.1 1. na. 3:14] == "0. i]. 3:14] == "TR".

by = 32. start = 1981. > rm(month. 0. month. 30.c(tana. 30. the ts function is used to convert the series. 31.c(0.. 31. specifying their length by our knowledge of the length of each month. 31. 30. 31. 31) for (yr. • Again. Note the zero-length “months” which will match with the first two columns of the dataframe (fields YEAR and DATE).days <. showing them as red vertical bars near the top of the plot: 5 Apologies to anyone born on this date! 17 . > > + > + + + + + + > tana. yr.ppt <.row:(yr.ppt.row in seq(from = 1.NULL month.days.row + month. 28.ppt.first.days[month..ppt <. month.col) Check that this is an integral number of years: > length(tana. tana[yr..ppt <. 30. 31.first. length = (2006 1981 + 1))) { for (month.first.row.So a simple if somewhat inelegant solution is to ignore records for 29February5 We use the c “catentate” function to stack vectors.1).ppt) chr [1:9490] "0" "0" "0" "0" "0" "0" "0" "0" .ppt)/365 [1] 26 Task 17 : Convert this to a time series with the appropriate metadata.col in 3:14) { tana.col]) } } str(tana. 31.col] .. frequency = 365) > str(tana.first.ts(tana.ppt) Time-Series [1:9490] from 1981 to 2007: 0 0 0 0 . We enhance the plot by highlighting the missing values. the frequency argument specifies a cycle of 365 days and the start argument specifies the beginning of the series (first day of 1981): > tana.

recycle = T).na(tana.0 for (i in yrs) { ymax <.ceiling(ymax)) [1] 91 18 . 1989. i + 1). To compare several years side-by-side. > > > + + + > yrs <. 2006) ymax <. ylab = "mm") abline(h = 100. we compute the maximum daily rainfall with the max function. i.numeric before use in ylim.ppt).rm = T)) } (ymax <. Note: A small complication is that ylim requires numeric arguments. 1991. and the most recent year. y = 100.coords(x = time(tana. 1988. 1999. pch = ifelse(is. but max returns a character value. window(tana. col = "gray") points(xy. na.ppt. This is also the case for the sum function used in the graph subtitle.> > > + > plot(tana. 1983.ppt.numeric(max(ymax. The points function is used to place points on top of a bar graph created with the plot function (with the optional type="h" argument). and use it to set a common limit with the optional ylim argument to plot.c(1982. main = "Lake Tana rainfall". 1998. with these highlighted. ""). col = "red") grid() Lake Tana rainfall 100 l ll l l llllllll l l mm 0 1980 20 60 1985 1990 Time 1995 2000 2005 There are six years with a few missing observations. "l".ppt). To zoom in on the within-year structure. and a long series of missing observations in 1991. even on a numeric time series.as. This must be converted to a number with as. we display one-year windows for the years with missing values.

0 1982.ppt.ppt.4 1988.8 1999.8 1984. type = "h".2 1983.3 Lake Tana rainfall 1988 ll 80 80 Lake Tana rainfall 1989 ll 60 mm 40 mm 1988. start = i.3 Lake Tana rainfall 1991 llllllllllllllllllllllllllllllllllllllllllllllllllllllllll 80 80 Lake Tana rainfall 1998 l 60 mm 40 mm 1991.6 2006.8 1989. end = i + 1)).4 1991. end = i + 1)). pch = ifelse(is.0 20 0 0 1989. col = "gray") points(xy.0 1999.4 1998.8 2000.6 1988. ymax)) title(main = paste("Lake Tana rainfall".6 1998.ppt. na. 1)) Lake Tana rainfall 1982 l Lake Tana rainfall 1983 l 80 l 60 80 mm 40 mm 1982.rm = T))) abline(h = ymax.2 . recycle = T).6 Time Annual total: 1257.0 20 40 60 1998.3 Time Annual total: 1422.2 2006.0 1988.0 20 0 0 1983.0 2006.6 1991.6 1989.0 20 40 60 1989. ylim = c(0.4 1989. i). "").4 1999.4 0 Time Annual total: 1683.0 0 1999.1 Time Annual total: 1595.6 1999.8 2007.4 1982.8 1992.4 2006.2 1988.2 1998.6 1982.0 1991.> > + + + + + + + + + + + > par(mfrow = c(4. start = i.ppt.6 Lake Tana rainfall 1999 l 80 80 Lake Tana rainfall 2006 60 mm 40 mm 20 20 40 60 19 2006. start = i.4 1983. 2)) for (i in yrs) { plot(window(tana. end = i + 1)).0 Time Annual total: 894. "l".coords(x = time(window(tana.numeric(window(tana. sub = paste("Annual total:".8 1990. sum(as.8 1983. end = i + 1).0 20 0 0 1998.2 1991. col = "red") grid() } par(mfrow = c(1.2 1989.na(window(tana.6 1983.0 Time Annual total: 1474. start = i. ylab = "mm".2 1982.2 1999.0 Time Annual total: 1314. y = ymax.0 20 40 60 1983.0 Time Annual total: 1429.

033 1999.ppt.na(tana. ... ylab = "mm". > sum(is.ppt.na. "na.ppt. + sub = "continuous record") Lake Tana rainfall 120 mm 0 20 40 60 80 100 2000 2002 Time continuous record 2004 2006 Q20 : What is the extent of the contiguous time series? Jump to A20 • 20 .997 > tail(cycle(tana. for some analyses complete series are needed.036 > head(cycle(tana..action")=Class 'omit' int [1:6578] 1 2 3 4 5 6 7 8 9 10 .027 1999.c)) [1] 1999.022 1999.c)) [1] 360 361 362 363 364 365 > sum(is.ppt.c <.ppt.ppt. > frequency(tana.ppt.992 2006.na(tana..9 0 0 . main = "Lake Tana rainfall".030 1999.Q19 : Does there seem to be a seasonal difference in rainfall? Is it consistent year-to-year? Jump to A19 • This series has missing values..c)) [1] 0 > plot(tana.ppt) > str(tana.995 2006.contiguous(tana.contiguous function finds the longest contiguous sequence of values: > str(tana.c)) [1] 2006..ppt.c) [1] 365 > head(time(tana.ppt.ppt)) [1] 127 > tana.c)) [1] 9 10 11 12 13 14 > tail(time(tana.ppt) Time-Series [1:9490] from 1981 to 2007: 0 0 0 0 . The na.984 2006.989 2006.c.986 2006.attr(*.c) Time-Series [1:2912] from 1999 to 2007: 0 7.025 1999.

so here deltat is fractional years: 1/12 = 0.3 Answers A1 : Among the questions might be: ˆ Is the the groundwater level increasing or decreasing over time? ˆ Is any trend consistent over the measurement period? ˆ We expect an annual cycle corresponding to the rainy and dry seasons. Return to Q4 • A5 : frequency is the number of measurements within each cycle. significant recharge 1975 – 1978. 3. 21 . here this is the month of the year. here the depth to groundwater. depth is increasing so the groundwater is further from the surface. but this is not seen in 1975–1978.Gap filling is discussed in §7. Severe draw-down in 1988. Return to Q3 • A4 : print shows the actual value of each measurements. with rapid recharge. 4. 2. 2. We have to know from the metadata what it represents. is this in fact observed? What is the lag of this cycle compared to the rainfall cycle? ˆ Is the amplitude of the cycle constant over the measurement period or does it change year-to-year? ˆ Are the trends and cycles the same for both wells? ˆ A practical issue: if measurements are missing from one of the time series. it is just a list of numbers. (3) some combination of these? Return to Q1 • A2 : No. Generally increasing trend over time since about 1983 and lasting till 2002. here there are 12 months in one year. time gives the fractional year.83 Return to Q5 • A6 : 1. can it be “reliably” estimated from (1) other observations in the same series. terms of a single cycle. cycle shows the position of each measurement in its cycle. (2) observations in the other series. But. that is. deltat is the interval between successive measurements in ¯. Clear yearly cycle. Return to Q2 • A3 : The measurements are a 360-element vector.

from January 1990 to January 1991 the groundwater depth increased by . recharge begins in September. the recharge does not begin until March 1988. which seems to be the beginning of 22 . 1990 (January through May) was much higher than the difference from 1992 vs. recharge) in the fall and winter. The third differences are the change in two-year differences. March to April) should be the same in each year. extraction) in the spring and summer (extraction) and negative (decreasing groundwater depth. For example. so that.e. In this case the second year’s differences were less than the first. for example the recharge in 1991 vs.e. for example. Return to Q10 • A11 : The first differences are the year-to-year differences in the same month. 3. the groundwater depth didn’t increase as much in the second year (January to January). whereas in the other years it begins in August. i. compared to the difference in the previous year (January 1990 to January 1991) is -0. This in fact is observed. the one-year lag can only begin in January 1991 (difference with January 1990). the groundwater is closest to the surface in April and then decreases rapidly until August. 2. i. Return to Q7 • A8 : The series gets shorter. the differences given are for the end date of each lag.e. i. Return to Q12 • A13 : No. Not only is the magnitude of extraction and recharge more. compared to the first. The amplitude of the annual cycle increased until about 1999 and seems to be stable since then. four times as much groundwater lowering than any other. There is one extreme extraction event. Note the small recharge in September 1988.509999999999991 . for example the difference from January 1991 to January 1992. Return to Q9 • A10 : If there is no variability in the recharge or extraction the monthly differences (for example. The annual cycle is clear: month-to-month differences are positive (increasing groundwater depth. This is not observed. Return to Q11 • A12 : 1.Return to Q6 • A7 : It is annual. Return to Q8 • A9 : Differences between months should reflect the annual cycle: negative in the fall and winter (recharge) and positive in the spring summer (extraction). 1991. whereas the difference in the next year (January 1991 to January 1992) was The second differences are the change in difference from one year to the next.

or perhaps measured in a different way. Return to Q16 • A17 : "" (empty string) means there is no data at all. Return to Q14 • A15 : ˆ Is there an annual cycle in rainfall amount? ˆ If so. so it is unlikely to be a data entry error. from 1 .1. how many “rainy seasons” are there? ˆ How variable is the rainfall year-to-year? ˆ Is there a trend over time in the rainfall amount. no day in the month (here. however it is only entered for the first day of each year (upper-left of the sheet). Return to Q18 • A19 : There is a definite seasonality: rains from about May to about August and the other months dry. which seems to be an inconsistent attempt to record the trace amount. as does 0. and "TR" means trace rainfall. . how long is the auto-correlation period? ˆ What are probable values for missing measurements. But also March 1988 would have to had extreme rainfall.0. so the precision is 0. perhaps because of failure of winter rains. .1. Return to Q13 • A14 : To explain this there would have to have been continued extraction in the winter of 1987–1988.e.the recharge (fall and winter rains). they are consistent month-to-month in this period. either overall or in a specific season? ˆ How consistent is rainfall day-to-day within the dry and rainy seasons? In other words. Rains can begin early (1993) or there can be some sporadic 23 . with this being corrected in March 1988. 31 JAN . DATE is the day of the month (all months). 29 – 31 February). 0 means no rainfall. i. Another possibility is that the records for winter 1987–1988 were incorrectly entered into the data file. However. and what is the uncertainty of these predictions? Return to Q15 • A16 : YEAR is the year of measurement. we do not have the rainfall data to see if this is possible. January does have a 0.01 value. also the minimum measurement for February was 0. but then this is interrupted by six months of extraction during the (wet) winter. Return to Q17 • A18 : Only one decimal place is given. DEC are the rainfall amounts for the given day of the named month..

less in 1990). Return to Q19 • A20 : From the 9th day of 1999 through the end of 2006.rains before the true start of the rainy season (1992. Return to Q20 • 24 . almost eight years.e. i.

.9 .8 34.. from 1 . year = as. Both of these are converted from time-series to numbers with the as.f$gw.90 34. the summary function (specified with the FUN argument) is applied to each year with the by function.f$year. 1st Qu..3 3. Recall the time function returns the time of observation (here. with the IND “index” argument being the grouping factor: > by(gw. 3.f) 'data. for each year separately. $ cycle: num 1 2 3 4 5 6 7 8 9 10 .1.. 29.frame(gw. time = time(gw)) > str(gw. + cycle = as. for the whole series.data.4 34.68 More useful is a summary grouped by some attribute of the time series...7 34. FUN = summary) 25 . Task 19 : Summarize the groundwater levels of the first well. 57.frame': 360 obs. one column being the time series and the other two factors giving the year and cycle. IND = gw. these are converted to year number with the floor function.frequency(series)..89 Median 41. We also keep the fractional time axis.numeric(floor(time(gw))). as field time. the cycle function returns the position in the cycle. of 4 variables: $ gw : Time-Series from 1975 to 2005: 34.63 Mean 3rd Qu. • The summary function gives the overall distribution: > summary(gw) Min. Now the year can be used as a grouping factor.numeric function. • The time-series data structure (one long vector with attributes) is not suitable for grouping by year or position in cycle..80 Max.numeric(cycle(gw)). Similarly.1 Analysis of a single time series Summaries Summary descriptive statistics and graphics show the overall behaviour of a time series.f <. as fractional years). typically cycle or position in the cycle.58 46. 41.1 Numerical summaries Task 18 : Summarize the groundwater levels of the first well... $ year : num 1975 1975 1975 1975 1975 . $ time : Time-Series from 1975 to 2005: 1975 1975 1975 1975 1975 .5 34. We create a data frame. > gw.

38.20 35.22 33. 31.f$year: 1979 Min.50 32.94 31.42 30.90 30.05 42.34 30.f$year: 1988 Min. Max. Max. Max. Median Mean 3rd Qu.81 32.f$year: 1984 Min. 30.22 35.74 40. 1st Qu.78 26 .73 45.73 34.39 36.f$year: 1985 Min.29 31. Max. 34. Max.61 31. Max.78 35. 1st Qu.69 36. Max.42 40.40 ---------------------------------------------------gw.84 35. Max.06 42.72 35.47 31. 39.99 ---------------------------------------------------gw. 1st Qu.43 41.76 34. 30.58 ---------------------------------------------------gw.09 41.35 38.74 33. 1st Qu.77 36. Median Mean 3rd Qu. Median Mean 3rd Qu.53 31.51 37. 39.92 32. Median Mean 3rd Qu.90 33.37 32.18 35.10 42. 1st Qu. Median Mean 3rd Qu.33 ---------------------------------------------------gw. 1st Qu. 30.92 32.45 38. Max.16 33.76 36. Median Mean 3rd Qu. Median Mean 3rd Qu. 1st Qu. Max. Median Mean 3rd Qu. 34.62 35.61 31.15 31. Median Mean 3rd Qu.28 39.20 ---------------------------------------------------gw.f$year: 1983 Min. Median Mean 3rd Qu.96 ---------------------------------------------------gw. Max. 1st Qu. 1st Qu. 1st Qu.30 31.f$year: 1977 Min.79 46.40 30.76 30.70 31.f$year: 1980 Min. Median Mean 3rd Qu.60 31.68 30.99 34.36 34.45 32.50 35. 33. Median Mean 3rd Qu.f$year: 1987 Min. 1st Qu.97 ---------------------------------------------------gw.07 42.f$year: 1981 Min. 1st Qu. 29. Max. 1st Qu. Max.56 31.71 38.36 ---------------------------------------------------gw. Median Mean 3rd Qu.04 42.53 ---------------------------------------------------gw.f$year: 1986 Min.16 ---------------------------------------------------gw.08 30.f$year: 1976 Min.16 40.86 ---------------------------------------------------gw.44 40.41 30.54 34.28 32.51 ---------------------------------------------------gw.f$year: 1975 Min. 30. 36. 32.53 31. 1st Qu. Max.43 40.f$year: 1982 Min.34 34.68 38.f$year: 1978 Min. Median Mean 3rd Qu.gw.04 ---------------------------------------------------gw.09 43.

28 42. 46. Median Mean 3rd Qu.23 ---------------------------------------------------gw.90 44.94 42.07 39.95 55.11 40.55 44. 1st Qu. Max. 45. 42.67 40.19 50.f$year: 1993 Min. 49.99 47.59 ---------------------------------------------------gw.62 46.33 47. 1st Qu.74 49.30 49.53 46. 41.49 52. Max. Max. 1st Qu.86 52.85 47.39 ---------------------------------------------------gw.66 42.66 43. Max. Max.16 42. Median Mean 3rd Qu.30 42. Max.78 41.52 44. Max. 1st Qu. Max.62 51. Median Mean 3rd Qu.00 45.31 45.20 47.96 45.72 ---------------------------------------------------gw. Max.42 ---------------------------------------------------gw.93 ---------------------------------------------------gw.57 43.46 49.84 47. 1st Qu. 1st Qu.06 52.87 47.f$year: 1994 Min. Median Mean 3rd Qu.17 49.f$year: 2002 Min.63 50.88 48.89 47.66 41. 27 .01 43.52 41.09 ---------------------------------------------------gw. Median Mean 3rd Qu. Median Mean 3rd Qu. Max.f$year: 1995 Min. Median Mean 3rd Qu.49 56. Median Mean 3rd Qu.f$year: 1996 Min.f$year: 1997 Min.18 45.f$year: 1989 Min.f$year: 1998 Min.86 ---------------------------------------------------gw.18 ---------------------------------------------------gw.80 ---------------------------------------------------gw. 39.78 48. Max. 44.99 45. Median Mean 3rd Qu.38 46.f$year: 1999 Min.f$year: 2000 Min.30 42.75 49.50 47. Max.24 44.95 45.54 48. 1st Qu. Median Mean 3rd Qu.33 44.62 43.66 43.82 41. Max.46 43. Max.49 ---------------------------------------------------gw. 43. Median Mean 3rd Qu. 1st Qu.78 ---------------------------------------------------gw. Median Mean 3rd Qu. 40. 39.26 46. 1st Qu.02 45. 1st Qu.85 49.f$year: 1990 Min.21 41. 45.f$year: 1991 Min. 1st Qu.02 44.20 45.f$year: 1992 Min. Median Mean 3rd Qu. 1st Qu.86 44. 42. 1st Qu.f$year: 2001 Min.---------------------------------------------------gw. 40.11 ---------------------------------------------------gw.81 40. Median Mean 3rd Qu. 1st Qu.

66 52. in fact anything that summarizes a numeric vector.f$gw.08 Median 30. IND = gw. 30. groundwater level) on the right-hand side (here.52.73 54. quantile etc. often grouped by cycle number or position in the cycle. Task 20 : Display boxplots of the Anatolia well levels.9)[["1978"]] [1] 30.65 57.839 3.76 Max.9 > by(gw.71 52. year): 28 .97 > by(gw. min (the shallowest).f$gw. median.42 54.54 54.f$year: 2004 Min.41 30. FUN = summary)[["1978"]] Min. 30. IND = gw.f$year: 2003 Min.97 > by(gw. probs = 0. 1st Qu. FUN = quantile. FUN = min)[["1978"]] [1] 29.2 Graphical summaries Boxplots.f$gw.90 30.f$year.76 53. reveal the average behaviour of time series.72 56.f$year.f$year. • This is produced by the boxplot function. Median Mean 3rd Qu.34 54.52 ---------------------------------------------------gw. 51. 50.47 54.f$year. Max.68 ---------------------------------------------------gw. FUN = max)[["1978"]] [1] 30. grouped by year.1.59 55. IND = gw. Max.f$gw. 29.02 Note that the function applied could be max (to get the deepest level). IND = gw. A single year can be extracted with the [[]] list extraction operator: > by(gw. with the formula operator ~ showing the dependence of the left-hand side of the formula (here.08 53..52 57. 1st Qu. Median Mean 3rd Qu. 1st Qu.21 55.20 53.34 Mean 3rd Qu.

50333 41.10417 47.49667 52.99417 54.78500 [29] 53.36917 42.85417 31. grouped by position in the cycle (month). which is computed with the aggregate function (see below.50333 40.> boxplot(gw.2.59083 > time(ann.21333 31.56833 45.62000 32. + ylab = "groundwater level (m)") > grid() Anatolia well 1 groundwater level (m) 30 35 40 45 50 55 1975 1977 1979 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 This gives a clear impression of the per-year variability. • We add a field to the data frame that is the difference of each observation from its annual mean.aggregate(gw. FUN = mean)) Time Series: Start = 1975 End = 2004 Frequency = 1 [1] 35.89667 [15] 40.94750 31.80750 36.66000 44.mean) 29 .71833 34. §3. Q21 : Is there a trend in the per-year amplitude of the annual cycle of groundwater levels? Jump to A21 • Task 21 : Display boxplots of the Anatolia well levels.f$year. main = "Anatolia well 1". nfrequency = 1.17750 [8] 31.53667 34.89000 30.27750 42.74917 54.51417 42.f$gw ~ gw.46167 47.1 for details of this function): > (ann.09167 43.28667 42.41083 38.08583 45.mean <.78417 [22] 46. after correcting for the overall trend.51583 49.

using the match function to find the position in the vector of annual means that corresponds to the year of the observation: > match(gw.f$cycle.as.f$in.f) 'data..yr ~ gw.. $ year : num 1975 1975 1975 1975 1975 . ylab = "Deviation from annual + main = "Anatolia groundwater well 1") 30 .477 -0.mean)) [1] [22] [43] [64] [85] [106] [127] [148] [169] [190] [211] [232] [253] [274] [295] [316] [337] [358] 1 2 4 6 8 9 11 13 15 16 18 20 22 23 25 27 29 30 1 2 4 6 8 9 11 13 15 16 18 20 22 23 25 27 29 30 1 2 4 6 8 9 11 13 15 16 18 20 22 23 25 27 29 30 1 3 4 6 8 10 11 13 15 17 18 20 22 24 25 27 29 1 3 4 6 8 10 11 13 15 17 18 20 22 24 25 27 29 1 3 4 6 8 10 11 13 15 17 18 20 22 24 25 27 29 1 3 5 6 8 10 12 13 15 17 19 20 22 24 26 27 29 1 3 5 6 8 10 12 13 15 17 19 20 22 24 26 27 29 1 3 5 6 8 10 12 13 15 17 19 20 22 24 26 27 29 1 3 5 7 8 10 12 14 15 17 19 21 22 24 26 28 29 1 3 5 7 8 10 12 14 15 17 19 21 22 24 26 28 29 1 3 5 7 8 10 12 14 15 17 19 21 22 24 26 28 29 2 3 5 7 9 10 12 14 16 17 19 21 23 24 26 28 30 2 3 5 7 9 10 12 14 16 17 19 21 23 24 26 28 30 2 3 5 7 9 10 12 14 16 17 19 21 23 24 26 28 30 2 4 5 7 9 11 12 14 16 18 19 21 23 25 26 28 30 2 4 5 7 9 11 12 14 16 18 19 21 23 25 26 28 30 2 4 5 7 9 11 12 14 16 18 19 21 23 25 26 28 30 2 4 6 7 9 11 13 14 16 18 20 21 23 25 27 28 30 2 4 6 7 9 11 13 14 16 18 20 21 23 25 27 28 30 2 4 6 7 9 11 13 14 16 18 20 21 23 25 27 28 30 > gw. $ time : Time-Series from 1975 to 2005: 1975 1975 1975 1975 1975 .f$in..yr <.4 34.5 34.f$year..ann. Now we can display the grouped boxplot: > boxplot(gw.. xlab = "Month".818 -0.8 34.... $ in..yr: num -0. + time(ann. of 5 variables: $ gw : Time-Series from 1975 to 2005: 34.mean[match(gw. time(ann.297 .frame': 360 obs.727 -0. $ cycle: num 1 2 3 4 5 6 7 8 9 10 .7 34.378 -0.f$year.Time Series: Start = 1975 End = 2004 Frequency = 1 [1] 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 [14] 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 [27] 2001 2002 2003 2004 We subtract the correct annual mean from each observation.mean))]) > str(gw.9 ..numeric(gw .

. For example. this is called smoothing or low-pass filtering. Task 22 : Convert the monthly time series of well levels from 1980 through 1985 into a quarterly series of mean well levels for the quarter. 3. + FUN = mean) > str(gw. The aggregate function changes the frequency of the series. This has arguments: 1.7 .2.. 2. A major use of filtering is to more easily detect an overall trend. or deviations from the overall trend that are of longer duration than one time-series interval. or median. nfrequency = 4. min.q <. 31 . 1986). the new number of observations per unit of time.Anatolia groundwater well 1 q q 4 q q q q Deviation from annual mean (m) 0 2 q q q −2 q q −4 1 2 3 4 5 6 Month 7 8 9 10 11 12 Q22 : Is there an annual cycle? Are all the months equally variable? Are there exceptions to the pattern? Jump to A22 • 3. the function to apply when aggregating. 1980. a quarterly summary of an annual time series would specify nfrequency=4.q) Time-Series [1:24] from 1980 to 1986: 32. FUN (by default sum). • The aggregation function here must could be mean.aggregate(window(gw. the default sum has no physical meaning. depending on objective.2 31.1 Aggregation Sometimes we want to view the series at coarser intervals than presented. Other reasonable choices.2 Smoothing Often we are interested in the behaviour of a time series with short-term variations removed. > gw.6 31. this must be a divisor of the original frequency. would be max. nfrequency (by default 1).7 33 32.

monthly".2. which is a vector of coefficients to provide the weights. . 1986). 2)) plot(gw. 2p + 1 is called the order of the filter. and the following p measurements).2 Smoothing by filtering A simple way to smooth the series is to apply a linear filter. t = p + 1 . quarterl type = "b") plot(window(gw. using the filter function. in reverse time order. type = "b") par(mfrow = c(1. quarterly Anatolia Well 1. n − p (1) The weights wj must sum to 1. main = "Anatolia Well 1. because it can not be computed at the p first and last measurements: p st = j =−p wj yt +j . ylab = "Depth to water table (m)". Note that filtering shortens the time series. main = "Anatolia Well 1. Task 23 : Filter the series with a symmetric seven-month filter that gives full weight to the measurement month. three-quarters weight to adjacent 32 .q. the 1 original measurement. . generally w−j = w+j (symmetry) but this is not required. By default this uses a moving average of values on both sides (before and after) each measurement. The moving average for a given measurement is a weighted sum of 2p + 1 measurements (the preceding p . 1)) Anatolia Well 1. ylab = "Depth to water table (m)".> > + > + > par(mfrow = c(1. The method requires a filter. monthly 40 q q q q q q q q q q 38 40 38 q Depth to water table (m) Depth to water table (m) q q q q qq q q q q q q q q q q q q q 36 q q q q q q q q q q q 34 34 36 q q q q q q 32 q q q q q q q q q qq q q q q q q q q q q q q q q q qq q q q q q q 32 q q qq q q q q q q q q q q q q 1980 1981 1982 1983 Time 1984 1985 1980 1981 1982 1983 Time 1984 1985 1986 Q23 : What are the differences between the monthly and quarterly series? How much resolution is lost? Jump to A23 • 3. 1980.

1875 0.ts(gw. also compute an annual filter (12 months equally weighted).2 <. lty = 2) lines(fgw.filter(gw. "1/4. + ylab = "groundwater depth (m)") > lines(fgw. half weight to months two removed. rep(1. "annual filter".1250 0. col = "red".1. • > > > > > > > > > > fgw. col = "blue". pos = 2) text(1995. 7−month filter groundwater depth (m) 30 1975 35 40 45 50 55 1980 1985 1990 Time 1995 2000 2005 Q24 : What is the effect of this filter? Jump to A24 • Task 24 : Repeat the seven-month filter with equal weights for each month. col = "blue".1875 0. main = "Anatolia Well 1.c(0.0625 0.75.1 filter". pos = 2) 33 . ylim = c(37.filter(gw. xlim = c(1990. 48). col = "magenta".3. pos = 2) text(1995. col = "red") Anatolia Well 1. • > k <. sides = 2. 12)/12) plot.75. and quarter weight to months three removed. rep(1.0625 > fgw <.2.25.5.filter(gw. k) > plot. ylab = "groundwater depth ( title(main = "Anatolia Well 1. sides = 2. col = "red") lines(fgw. 39.1250 0.1/2. three filters") lines(fgw. 0.1 filter". 7)/7) fgw. 0. 1995). 0. "1.k/sum(k)) [1] 0.months.25) > (k <. plot the three filtered series together for the period 1990–1995. 40. 0. 38.2500 0. sides = 2. 7-month filter".5. 1.3 <. col = "magenta") text(1995.ts(gw. 0.

annual filter". lwd = 1. three filters 48 groundwater depth (m) 40 42 44 46 1/4.1 filter 1.3.ts(gw.1/2. > plot. main = "Anatolia Well 1. + ylab = "groundwater depth (m)") > lines(fgw.1 filter 38 annual filter 1990 1991 1992 Time 1993 1994 1995 Q25 : What is the effect of giving equal weights to the measurements in the filter? Jump to A25 • Q26 : What is the effect of the annual filter? Jump to A26 • Task 25 : Plot the annual filter for the complete series.1.5) • 34 . col = "magenta".Anatolia Well 1.

col = "blue". col = "green") lines(lowess(gw. col = "red". col = "green". annual filter groundwater depth (m) 30 1975 35 40 45 50 55 1980 1985 1990 Time 1995 2000 2005 3. f = 1/3). the degree of smoothing is controlled by the size of the neighbourhood. which fits the data points locally. and two other values of the parameter • > > > > > + > + > + plot.3 Smoothing by local polynomial regression Another way to visualize the trend is to use the lowess “Local Polynomial Regression Fitting” method. "Smoothing parameter: 2/3 (default)".ts(gw. f = 1). 31. 33.Anatolia Well 1. "Smoothing parameter: 1/3". pos = 4) text(1990. col = "red") text(1990. pos = 4) 35 . pos = 4) text(1990. ylab = "groundwater depth (m)") lines(lowess(gw).2. "Smoothing parameter: 1". 35. These are weighted by their distance (in time) from the point to be smoothed. This results in a smooth curve. col = "blue") lines(lowess(gw. Task 26 : Display the time series and its smoothed series for the default smoothing parameter (2/3). using nearby (in time) points. main = "Anatolia well 1".

Anatolia well 1

groundwater depth (m)

35

40

45

50

55

Smoothing parameter: 2/3 (default) Smoothing parameter: 1 Smoothing parameter: 1/3

30 1975

1980

1985

1990 Time

1995

2000

2005

Q27 : What is the effect of the smoothing parameter? Which seems to give the most useful summary of this series? Jump to A27 • 3.3 Decomposition Many time series can be decomposed into three parts: 1. A trend; 2. A cycle after accounting for the trend; 3. A residual, also known as noise, after accounting for any trend and cycle. These correspond to three processes: 1. A long-term process that operates over the time spanned by the series; 2. A cyclic process that operates within each cycle; 3. A local process which causes variability between cycles. Each of these processes is of interest and should be explained by the analyst.

Task 27 : Decompose the groundwater level time series.

The workhorse function for decomposition is stl “Seasonal Decomposition of Time Series by Loess”, i.e. using the same trend removal as the lowess function used above. This has one required argument, s.windowstl, which 36

is the (odd) number of lags for the loess window for seasonal extraction; for series that are already defined to be cyclic (as here), this can be specified as s.windowstl="periodic", in which case the cycle is known from the attributes of the time series, extracted here with the frequency function:
> frequency(gw) [1] 12

Then, the mean of the cycle at each position is taken:
> tapply(gw, cycle(gw), mean) 1 2 3 4 5 6 7 41.01700 40.59800 40.25900 39.82733 40.38933 41.22433 42.07233 8 9 10 11 12 43.03767 43.30633 42.93767 42.33233 41.96433

This is subtracted from each value, leaving just the non-seasonal component. Here we show two years’ adjustments numerically, and the whole series’ adjustment graphically:
> head(gw, 2 * frequency(gw)) [1] 34.36 34.45 34.70 34.80 34.88 35.16 35.60 35.86 35.86 35.70 35.48 [12] 35.28 35.22 35.18 34.98 35.20 35.51 35.32 34.45 34.54 34.39 34.18 [23] 33.92 33.73 > head(gw - rep(tapply(gw, cycle(gw), mean), length(gw)/frequency(gw)), + 2 * frequency(gw)) [1] [7] [13] [19] > > > + > > -6.657000 -6.472333 -5.797000 -7.622333 -6.148000 -7.177667 -5.418000 -8.497667 -5.559000 -7.446333 -5.279000 -8.916333 -5.027333 -7.237667 -4.627333 -8.757667 -5.509333 -6.852333 -4.879333 -8.412333 -6.064333 -6.684333 -5.904333 -8.234333

par(mfrow = c(2, 1)) plot(gw, ylab = "depth to groundwater", main = "Original series") plot(gw - rep(tapply(gw, cycle(gw), mean), length(gw)/frequency(gw)), ylab = "difference from cycle mean", main = "Seasonally-corrected series") abline(h = 0, lty = 2) par(mfrow = c(2, 1))

37

Original series

depth to groundwater

30 1975

35

40

45

50

55

1980

1985

1990 Time

1995

2000

2005

Seasonally−corrected series
15 difference from cycle mean −10 1975 −5 0 5 10

1980

1985

1990 Time

1995

2000

2005

Q28 : What has changed in the numeric and graphical view of the time series, after adjustment for cycle means? Jump to A28 • This series, without the seasonal component, is then smoothed as follows. The stl function has another required argument, but with a default. This is t.windowstl, which is the span (in lags,not absolute time) of the loess window for trend extraction; this must also be odd. For the periodic series (s.windowstl="periodic"), the default is 1.5 times the cycle, rounded to the next odd integer, so here 12 ∗ 1.5 + 1 = 19, as is proven by the following example:
> gw.stl <- stl(gw, s.window = "periodic") > str(gw.stl) List of 8 $ time.series: mts [1:360, 1:3] -0.271 -0.75 -1.149 -1.635 -1.127 ... ..- attr(*, "dimnames")=List of 2 .. ..$ : NULL .. ..$ : chr [1:3] "seasonal" "trend" "remainder" ..- attr(*, "tsp")= num [1:3] 1975 2005 12

38

"class")= chr "stl" > tmp <. t..stl(gw.series[.attr(*.stl$time.. $ call : language stl(x = gw. which shows the original series and its decomposition on a single graph.. s.stl) 39 .gw.window = "periodic") $ win : Named num [1:3] 3601 19 13 .attr(*.window = "periodic"....attr(*.stl.attr(*..series[. s.. + "trend"]) [1] 0 > rm(tmp) Q29 : What is the structure of the decomposed series object? A29 • Jump to Task 28 : Display the decomposed series as a graph • The plot function specialises to plot.. "trend"] .window = 19) > unique(tmp$time.attr(*. "names")= chr [1:3] "s" "t" "l" $ deg : Named int [1:3] 0 1 1 .. > plot(gw.. "names")= chr [1:3] "s" "t" "l" $ inner : int 2 $ outer : int 0 . "class")= chr [1:2] "mts" "ts" $ weights : num [1:360] 1 1 1 1 1 1 1 1 1 1 . "names")= chr [1:3] "s" "t" "l" $ jump : Named num [1:3] 361 2 2 .

5 0. main = "Anatolia well 1. the components) on the same scale of a single graph. trend 55 Groundwater level (m) 30 35 40 45 50 1975 1980 1985 1990 Time 1995 2000 Another way to see the decomposition is with the ts. trend".5 −0.data 30 40 50 seasonal 30 35 40 45 50 55 trend remainder 1975 1980 1985 1990 1995 2000 2005 time The components of the decomposed series can be extracted.stl$time.5 1. for example to see just the trend: > plot(gw. "trend"].plot function. + ylab = "Groundwater level (m)") Anatolia well 1. this shows several time series (here.series[. thus visualizing the relative contribution of each component: 40 −2 2005 0 1 2 3 4 −1.5 .

with a two-year window for the seasonal component. average all Januarys). s. decomposition 10 20 30 40 50 Groundwater level (m) seasonal trend remainder 0 1975 1980 1985 1990 Time 1995 2000 2005 Q30 : What are the magnitudes of the three components? Which is contributing most to the observed changes over the time series? Jump to A30 • The decomposition with s. col = c("black".attributes(gw. s.stl <.plot(gw. main = "Anatolia well 1. "red"). decomposition". tmp[i]. For this.series.(i * 4). ylab = "Groundwater level (m)") tmp <. "blue".g. "red")[i]. "blue". This can not account for changes in cycle amplitude with time. 24 . col = c("black".> + + > > + > ts. Task 29 : Decompose the groundwater time series.stl) 41 .window = 2 * frequency(gw) + 1) > plot(gw.stl$time. • > gw.stl(gw. pos = 4) grid() Anatolia well 1.windowstl="periodic" determines the seasonal component with the average of each point in the cycle (e.series)$dimnames[[2]] for (i in 1:3) text(1995. as is observed here.stl$time.windowstl must be set to an odd number near to the cycle or some multiple of it (to average a few years).

3.window)))) so that for a 25-month seasonal window on a 12-month cycle of this example the trend window is the next higher odd number of (1. for s. by default this is: nextodd(ceiling((1. 21.windowstl="periodic" the smoothness parameter is one more than 1. see 3. Note: This is for the case when the period is explicitly given.5/25)) = 20.5/s. The smoothness of the trend depends on the analyst’s knowledge of the process.data 30 40 50 seasonal 30 35 40 45 50 55 trend remainder 1975 1980 1985 1990 1995 2000 2005 time Q31 : How does this decomposition differ from the pure periodic decomposition? Jump to A31 • The smoothness of the lowess fit is controlled with the t.5*period) / (1-(1.e. let’s see how it looks with a smoother trend. i.5 ∗ 12)/(1 − (1.windowstl argument. 42 −2 0 1 2 3 4 −2 −1 0 1 2 . The previous decomposition has a very rough trend.5 times the cycle length. for a finer trend decreased. For a smoother trend this should be increased.

window = 85) > plot(gw. rather. t. s. they are correlated. successive measurements are not independent. −2 0 2 4 6 −2 −1 0 1 2 43 .e.stl(gw.4 Serial autocorrelation In almost all time series.stl <. time periods before the given 6 The first-order summary is the expected value.Task 30 : Recompute and plot the decomposition of the time series with a smoother trend than the default. Since there is only one variable. i.stl) data 30 40 50 seasonal 35 40 45 50 55 trend remainder 1975 1980 1985 1990 1995 2000 2005 time Q32 : What is the difference between this smooth and the default decompositions? Which best represents the underlying process? Jump to A32 • 3.window = 25. • > gw. this is called autocorrelation and is an example of the second-order summary6 The first way to visualize this is to produce scatterplots of measurements compared to others with various lags. either constant mean or trend.

e. Task 31 : Display the auto-correlation scatter plots of groundwater levels for twelve lags (i. This is used also in the lag function. • > lag. up to one year of differences).e. which produces a lagged series with the same indices as the original series. main = "Anatolia well 1". lags = 12. i. cex = 0. 44 30 q qqq q qq qq q q q q q qq q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q qq q q q q qq q q q q q q q q q q q q q q q qq q q q q q q qq q q q qq qq q qq q q q q q q qq q q q q q q q q q q q q qq q q q qq q q q q q q qq q q q qq qq q q q q q q qq q q q q qq q qq qq qq q q q qq q q q q qq qq q q q q q q qq q qq q q qqqq q q q q q q q q qq q q q q qq q q q q q q q q q q qq q q q qq q q q q q q q q q q q q qq qq q q q q qq q q q qq q q qq qq q qqqqq qq qqq q q q q qq q qq qq qq q q q q qq q q q q qq q q q q q q q q q qq q q q q q q q q q q q q q q q qq q q qq q q q q q qqq q q q q q q q q qq q q qq q q q q q q q q q q q q q q q q q q q q q qq q q q q qq q q qq qq qq q q q q qq q qq qq q q qq q q q q q q q q q q q qq qq qq q q q q q q q q q q qq q q q qq q qq q qq qq qq q qq q q q qq q qq qq q q q q q q qq qq q q q qq q q q qq qq q qq qq q q q q q qq q q q q q qqq q qq q q q qq q q qq q q q q q q q q q qq qq q qq q q qq q q q q q q q q q q q qq qq q q q q qq q q qq qq qq q qq q q qq q qq q q q q qqq q q q qq q q q q q q q q qq q q q q q qq q qq qq q q q q q q q q q q q q q q q q gw gw gw q q q q q q q q qq q q qq qq q qq q q q q q q q q qq q q q q q q q qq q q q q q q qq qq q q qq qq q q q q q q q q q q q q qq q q q q q q q q q qqq qq q q q qq q q q qq q q q q q q q q q q q q q q 35 40 45 50 qq q q q qq q q 55 30 q q q q q q q q q q q q qq q q q q q q qq q q q q q q q q q q q q q q q qq qq q q qq q qq q q q q qq q qq q q q q qq q q q q q q q q q q q q q q qq qq q q q q q q q qq q q qq q q qqq q q q q q q q qq q qq q qqq q q qq qq qq q q q q q q q q q qq q q qq q q q qq q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q qqq qq qq qq q qqq q q q q q q q q qq q q q q qq q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q qq q qq q q qq q q q q q q q q q qq q q q q q q q qq q qq q q q q q q qq q qq q q q qq q q qq qq q q qq q q q q q q q q qq qq q qq q q q q q q q qq q q q q q q q qq q q q q q q q qq q qq q q q q q q q q q q q q q q qq q q q q qq q q q q qq q q q qq q q q q q qq q q q q q q q qq q q q q q q q q q q q qq q q q q q qq q q q q q q q q q q qq q q q q q q q q q q q q qq q q q q q q q qq q q q q q qq q q qq q q q q q q q qq q qq q q q q q q q q q q q q q qq q q q qq q q q q q q q q qq qq q q q q q q q q q qqq q q q q q qq q q q qq q q q q q q q q gw gw gw 35 40 45 50 q q q q 55 . lty = 1) Anatolia well 1 30 40 50 60 q q q q q q q q qq q q q q q q q q q q qq qq qq q q q qqq q qq qq qq qq q q q q q q q q qq q q qq q q q q q q q q q qq q qq q q q q qq qq q q q q q qq q qq q q qq q q qq q q qq q qq q q qq q q q q q qq qqq q q q q q q q qqq q q qq qq q qq q q q qq q q qq q qq q q q q q q q q qq q qq q q q q q q qq qq q q q q q q q q q q q qq q q q q q q qq q q q q q q q qq q q qq q q q q q qq qq q q q q qq q q q q q q q q q q 30 40 50 q q q qq q q q q q q q q q q q q q q qq q q q qq q q qq q q q qq q q q qq q q q q q q q q q q q qq q q q q q q q q 60 30 q q q q q qq q q q q q q q qq q q q q q q q q qq q qq q q q qqq q q q qq q qq q q q qq qq q q qq q q qq q q q q qq q q q q q q q q q q qq q q q q q q q q q q q q qq q qq q q q q q q q qq q q q q q q q qq q q q q q q q q q q q q q q qq q q q q q q q q q q q qq qq q q q qq qqq q q q q q q q q qq q q q qq q qq q qq q q q qq q qq qqq q q q q q q q q q q q q q q q q q q q q q q q q q q q qqq q q q q qq q q q qq q qq qq q q q q qq q q q q q q q q q q q q q qq q q q qq q q q q q q q q q q q qq q q q q q q q q q q q q q q q qq q q q q q q q q q q q q qq q q q q q q q q qqq q q qq q q q q q q q q q q q qq q 50 55 qq q q q q q q q q q q q q q qq q q q q q q q q qq q q q qq q q q qq q q q q q q qq q q q q q qq q q q qq q qq q qq qq qq q q q q qq q q qq q q q q q q q q q q q q q q q q q qqq q q q q qq q q q q qq q q q q qq q qq q q q q q q q qq q q q q q qq q q qq qq q q q q qq qq q q q q q qq q q q qq q q qq q q q qq q q q q qq qqq q q qqq q q q q q qq qq q q q qq q q q q q q q q q q q q q q qq q qq q q q qq q q q q q q qq qq q q q q qq q q q q q q qq q q qq q q q q q qq q q q q qq q q q q q q q qq q q q q qqq q q q q q q qq q q q q q q q qq q q qq q q q q q q q q q q q qq q q q q q q q q q q q q q qq qq q q qq q q q q q q qq q q q q qq q q q q q q q q q qq q q q q qq q q q q q q qq qq q q q qq qq q q q q qq q q qq q q qq q q q 45 gw gw 35 40 lag 1 q q qq q q q qq qq qq q q q q q q qq q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q qq q q q q q q q qq q qq q q q q q qq q q q q qqq q q q qq qq q q q q q q q q qq q q q q qq qq qq q q q q q q qqq q q q q qq q q q q q q q q q q q qq q q q q q q q q q q q q q q qq q q q q q q qq q q q q q q q q qq q q q q q qq q q q q q qq q qq q q q q q qq q q q qq q q q q lag 2 q q q qq q q qq q qq q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q qq q q q qq q q q q q q q q q q qq qq q q q q q q qq q q q q q q q q q q q q q q q q q q q qq q q q q qq q q q qq q q q q q q q q q qq q q qq qq qq q q qq q qq q q qq q q q q q q qq q q q qq q qq q q q q q q q q q q q q q q q q q q q qq q q q q q q q qq qq qqq q qq q q qq q q q q qq q q q q q qq qq qq q q q q qq q q q q gw lag 3 q q q qq q q qqq q q qq q q q q q q q q q q q qq q q q q q qq q q q q q q q lag 4 q q q q qq q q q q q q q q q q q q q q lag 5 q q q q q q q q q q q lag 6 q q q q q q q q q q q q q qq q q qq q qq q q q q q q q q q q q q q q q q qq q qq q q q q q q qq q q q q q q q q q qq q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q qq q q q q qq q q q q q q q q q q q q qq q q q qq q q q q q qq q q qq q q qqq q q q q q q q q q q q qq q qq q qq q q q qq qq q qq qq qq q qq q qq qq q q q q q qq q q qq q q qq q q qq q q q q q q q q qq q q qq q q q qq q q q q q q q qq q q qq q q q qq q q q q q q q qq q q q q qq q q q q q q q q q q q q q q q q q qq q q q q q q qq qq q q q q qq q q q q q q q q q q q q q q q qq q q q q q qq q q qq q q q qq q q q qq q q q qq q q q q q q q q q q q q q q q q 30 q q q qq q q qq q q qq q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q qq q q q q q q q qq q q q q q q q qq q q q q q q q q q q q q q qq q q qq q q qq q q q q qq q q q q q q qq q q q q q q q q q qq q q q q qq q q q q q q q q q qq q q q q q q q q q q q q q q qq qq q q qq q q q q q qq q qq q q q q q q q q q q q q q qq q q q q q q q q q q qqq q q q q q q q qq q q q qq q q q q q q q qq q q q q q q qq q q q q q q q qq q qq q qq q q q q q qqq q q qqq q q q q q q q q q q q q qq q qq qq q qq q q qq q q q q q q q q q q q q q qq q q q q q q q q q q q qq q q q qq q q qq q q q q q q q q q q q q q q q q q q qq q q qq q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q qq q q q q qq q qq q q q q q q q q q q q q q q q qq q q q q qq q q qq q q q qq q qq q q q q q q q q q q q q q q q q q q q q q q qq q q qq qq q q q q q qq q q q q q q qq q q q q q q q q q q qq q q q q q q q q qq qq q q q q qq q q q q q q qq q qq q q q q q qq q qq q q q q q q q q q q q q q q qq q q q q q q q q q q qq q q q q q qqq q q q q q q qq q qq qq q q q qq q q q q q q q q q q q q q q q qq q q q q q qq q qq q q q q q qq q q q q q q q qqq q qq q q q q qq q q q qq qq q q q q q q q q q q q q q q q q q q q q q q q 45 gw 50 55 gw 35 40 lag 7 q q q q qq q q q q q q q lag 8 q q q q q qq q q qq q q q qq q q q qq q q gw lag 9 q q q q q q q q q q q q q qq qq q qq q q q q q q q q q q q q q qq q q q q q qq q q q q qq q q q q q q q q q q q q qq q q q q q q q q q q q q q qq qq q q q q q q q q q q q q q q q q qq q qq qq q q q q q q q qq q q q q q q qq q q q qq q q q qq q q q q q q q q q q qq qqq q q q q q q q qq q q qq q qq q q q qq q q qq q q q q q q q q q qq q qqq q q q q q q q qq q qq q qq q q q qq q qq q q qq q qq q q qq q q qq q q q q q q q q q q q q q q qq lag 10 30 40 lag 11 50 60 lag 12 Note that positive values of the lags argument refer to lags before an observation.plot function produces this. + pch = 20. the series is not shifted. The lag.plot(gw.measurement.3.

39 2001 50.84 47.26 46.38 2001 50. • 45 .26 46.98 51.26 46.75 2000 50.07 47. 2001). Jump to A33 • By default if the time series is longer than 150 (as here).75 47. individual measurements are not labelled nor joined by lines.75 47.24 50. up to one year of differences) for 1990 through 1992.67 2001 > lag(window(gw.> window(gw.98 51.11 > lag(window(gw.11 Nov Dec 2000 52.84 47.73 48.42 Nov Dec 1999 48.11 Nov Dec 2000 51.07 47.73 48.39 2001 Q33 : Describe the evolution of auto-correlation as the lags increase. 2001).42 50. 2000.84 47.39 51.11 > lag(window(gw. For shorter series they are.71 52.84 47. 2) Jan Feb Mar Apr May Jun Jul Aug Sep Oct 1999 2000 47.98 51.75 47. 2001).24 50.67 50. 1) Jan Feb Mar Apr May Jun Jul Aug Sep Oct 1999 2000 47. 2001).38 52. 2000.42 2001 > lag(window(gw. Task 32 : Display the auto-correlation scatter plots of groundwater levels for twelve lags (i.67 Nov Dec 1999 48.73 48.e.73 48.11 Nov Dec 2000 52.38 52.26 46. 2000.39 51. -1) Jan Feb Mar Apr May Jun Jul Aug Sep Oct 2000 48.42 50. -2) Jan Feb Mar Apr May Jun Jul Aug Sep Oct 2000 48.98 51.07 2000 50.39 51.42 50.71 52.71 2001 51.24 50.67 50.84 47.26 46.38 52.71 52.71 52.67 50. 2000.73 48.98 51.24 50.38 52. 2001) Jan Feb Mar Apr May Jun Jul Aug Sep Oct 2000 48.07 47.24 50.75 47. 2000.07 47.

2500 1. start = 1990.906 0. start = 1990. plot = F)) Autocorrelations of series 'gw'.2500 0. start = 1990. + main = "Anatolia well 1") Anatolia well 1 38 45 40 42 44 21 32 22 9 34 10 23 11 25 12 29 26 13 17 14 15 27 16 5 2 3 1 28 7 31 33 8 19 20 46 21 32 33 22 9 34 10 23 11 24 35 29 26 13 17 14 15 27 28 16 5 2 1 3 7 31 8 19 20 38 40 42 21 32 33 22 8 9 34 10 23 11 24 25 12 7 31 20 44 46 40 41 42 43 44 window(gw.800 46 39 40 41 42 43 7 7 23 44 8 22 8 22 window(gw.0000 0.818 0.5000 1. start = 1990. lags = 12.5000 0.914 0.4167 1.807 0. start = 1990.1667 1.3333 0.854 0.3333 1.9167 1. end = 1993) window(gw. start = 1990. start = 1990.7500 1.5833 0. end = 1993) window(gw.891 0.988 0. By default acf produces a graph showing the correlation at each lag. start = 1990. start = 1990.898 0. start = 1990.6667 0.950 0.plot(window(gw. for the groundwater example this is 26. end = 1993) 8 19 9 22 23 11 30 18 12 29 26 17 28 16 5 2 3 1 27 15 14 13 7 10 23 11 18 12 29 26 17 27 28 16 5 2 3 4 1 15 14 13 7 10 23 11 6 24 25 6 25 24 6 18 25 24 12 26 17 28 27 15 16 5 2 3 4 1 14 13 39 4 lag 7 20 21 lag 8 20 21 lag 9 21 20 window(gw. end = 1993) 19 9 10 23 11 19 9 10 23 11 25 12 19 10 9 7 6 18 25 24 12 26 13 6 18 24 6 24 25 18 12 13 17 15 16 5 2 3 4 1 14 11 26 13 17 15 16 5 2 3 4 1 14 17 14 27 15 16 5 2 3 4 1 lag 10 38 40 lag 11 42 lag 12 44 46 Q34 : What additional information is provided by the labelled points and joining lines? Jump to A34 • Autocorrelation ranges from −1 (perfect negative correlation) through 0 (no correlation) through +1 (perfect positive correlation). The default number of lags to compute for a single series is 10 · log 10n . end = 1993) window(gw. by lag 0.970 0. start = 1990.1667 0. end = 1993) window(gw. end = 1993).000 0. where n is the lenth of the series. start = 1990. end = 1993) 19 36 24 35 6 30 18 25 12 6 30 18 6 30 29 18 13 14 26 17 27 15 28 16 5 1 2 3 39 4 4 4 lag 1 20 21 32 22 9 10 23 11 24 12 29 13 26 17 14 27 15 5 1 2 3 4 28 16 31 7 33 8 19 21 lag 2 20 32 22 9 10 23 11 7 8 19 21 20 lag 3 45 19 31 6 24 30 18 12 29 13 17 28 27 16 1 3 4 14 15 26 25 2 window(gw.903 0. > print(acf(gw. end = 1993) 22 8 45 39 40 5 5 41 42 25 30 6 18 43 44 22 8 . end = 1993) 8 9 7 19 10 22 8 19 9 22 window(gw.5833 0. end = 1993) 9 10 7 23 11 31 25 6 30 18 24 12 29 13 26 17 14 28 27 15 16 1 2 3 4 lag 4 45 20 21 20 lag 5 21 lag 6 20 21 40 41 42 43 44 window(gw.874 0. to see the actual values the result must be printed.4167 0. It is computed by acf “Autocorrelation” function.0833 0.834 0.0000 1.907 0.> lag.0833 1. end = 1993) window(gw.901 0. end = 1993) window(gw.8333 0.898 0.930 0.903 0. start = 1990.

lag.803 0.8333 1.0 0. Anatolia wel 47 .9167 2.2 0.803 0.0000 2.799 0.0833 0.797 0. main = "Autocorrelation.1.5 1.8 0. Task 33 : Display the autocorrelation of groundwater levels for five years.800 0.6 0.7500 1.0 Lag 1. groundwater levels.4 0. • > acf(gw. groundwater levels. Anatolia well 1 1.0 Q35 : Are successive observations correlated? Positively or negatively? How strong is the correlation? How does this change as the lag between observations increases? Jump to A35 • Clearly the autocorrelation holds for longer lags. groundwater levels.0 0.784 > acf(gw.0 ACF 0.6667 1. main = "Autocorrelation.max = 60. Anatolia well 1") Autocorrelation.5 2.

s.series (using the $ field extraction operator).stl$time. after trend and seasonal components have been removed.series[.0 ACF 0. Anatolia well 1 1. t.6 0.stl <. Task 34 : Display the autocorrelation of the remainder groundwater levels. • Again we use stl to decompose the series.stl(gw. groundwater levels.4 0. remainder from trend") abline(h = 0. col = "red") 48 . "remainder"] plot(gw. lty = 2. The remainder series is extracted from the result as field time. and saved as a separate object for convenience: > > > > > gw.window = 2 * frequency(gw). ylab = "Remainder (m)") title(main = "Anatolia well 1 level.0 0 0.2 0.window = 84) gw.Autocorrelation. using the smooth trend removal and a smoothing window of two years. column "remainder".gw.r <.8 1 2 Lag 3 4 5 Some of the autocorrelation can be explained by the trend and seasonal components.r.

4 0.398 0.Anatolia well 1 level.0000 0.397 > acf(gw.2 0.8 0.0 −0.056 1. main = "Autocorrelation of remainders.2 0.169 -0.0833 0.314 0. remainder from trend Remainder (m) −2 1975 0 2 4 6 1980 1985 1990 Time 1995 2000 2005 > print(acf(gw.1667 0.7500 0.5 1.357 0.659 1.4167 1.193 1.5000 0.2500 0.6667 0.3333 0.9167 -0.887 0.4167 0.231 1.058 -0.222 -0.370 0.5833 -0. Anatolia well 1") Autocorrelation of remainders. Anatolia well 1 1.5 2.468 0.0 The blue dashed lines show correlations that are not provably different from 49 .0 0. plot = F)) Autocorrelations of series 'gw.111 -0.r.098 1.271 2.147 1.0 ACF −0.2500 1.r.769 1.6 0.r'.385 0.270 0.0000 -0.9167 0.4 0.6667 -0.0000 1.5833 0.8333 0.000 0.8333 -0.559 1.0833 -0.0833 0.5000 1.345 0. by lag 0.324 0.0 Lag 1.3333 1.7500 -0.001 2.1667 0.

8333 0.6667 -0. and interpret in terms of the processes causing the groundwater level to vary.4167 1.151 0. For lag k.0000 -0.019 0.0833 0.081 0.2500 -0. the partial autocorrelation is the autocorrelation between zt and zt +k with the linear dependence of zt on zt +1 .159 0.5833 0. The partial and ordinary autocorrelations for lag 1 are the same.049 0.3333 1.029 0.021 0.0833 -0.061 0. .124 1.7500 0.242 1.7500 0. This gives the correlation between measurements and their lagged measurements that is not explained by the correlations with a shorter lag.064 2. Task 35 : Compute and display the partial autocorrelation of the groundwater levels.010 0.123 0.047 1.033 1.0833 -0. Note: The partial autocorrelation function is used to estimate the order of ARIMA models (§4.5 Partial autocorrelation A more subtle concept is partial autocorrelation of a time series. main = "Partial autocorrelation.5000 1. Jump to A36 • 3.988 0.4167 0.9167 -0.3333 0.5833 1.8333 0. Q36 : Describe the autocorrelation of the remainders.085 > pacf(gw.zero.127 0.053 0. plot = F)) Partial autocorrelations of series 'gw'.6667 0.128 0.2500 1.9167 -0.038 0.0000 -0.086 2.5000 0.014 1. • These are produced by the pacf “partial autocorrelation” function.1667 -0. zt +k−1 removed. Anatolia well 1") 50 .179 1.070 1.1667 -0.3). > print(pacf(gw. . by lag 0.

7500 -0.014 0.Partial autocorrelation.3333 1.6667 -0.4 0. > print(pacf(gw.091 -0.058 -0.0 Partial ACF −0.028 1.042 -0.2 0.5833 0.6 0.084 0.5 2.021 0. plot = F)) Partial autocorrelations of series 'gw.8333 0.1667 -0.9167 -0.0833 0. Q37 : What are the partial autocorrelations that are different from zero? Jump to A37 • Task 36 : Display the autocorrelation of the remainder groundwater levels.0000 -0.016 0.0 0.9167 -0.r'.046 -0.2500 -0.004 1.121 0.103 2.057 > pacf(gw.062 2.032 1.2500 -0.5833 1. by lag 0.6667 0.5 1.022 1.067 0.064 1. after trend and seasonal components have been removed. Anatolia well 1 1. • The required remainders were computed in the previous setion.031 -0.015 0.8 0. main = "Parial autocorrelation of remainders") 51 .031 -0.4167 1.r.0 Lag 1.8333 -0.0833 -0.5000 0.2 0.5000 1.3333 -0. the blue dashed lines show correlations that are not provably different from zero.0 As with the graph produced by acf.082 1. using the smooth trend removal and a smoothing window of two years.4167 -0.0833 0.887 0.1667 -0.043 1.0000 -0.r.025 1.7500 0.

ω = 12 is a monthly frequency.6 0. The frequency ω is defined as the number of divisions of one cycle.0 Lag 1. with trend removed and auto-covariance not dependent on position in the series) can be decomposed into sums of sines and cosines with increasing frequencies. 52 . importance of the component) at each frequency.8 0. By the Nyquist-Shannon sampling theorem. that any second-order stationary series (i. So for a one-year cycle (as in the groundwater levels). The periodogram also reveals the relative density (roughly.2 0.0 0. This is based on the theory..0 Q38 : What are the partial autocorrelations that are different from zero? How does this differ from the partial autocorrelations of the original series? What is the interpretation? Jump to A38 • 3. a function is completely determined by sampling at a rate of 1/2ω.g.5 1. In some cases we can guess these already (e. but for some series we do not know a priori. annual cycles for the groundwater wells). The theoretical decomposition of the covariance sequence γt into spectral densities f is: γt = 1 2π +1/2 −1/2 e2π iωt f (2π ωf )dωf where ωf is the frequency expressed as cycles per unit of time.e.5 2. so that if we have a time series with n samples per cycle we can estimate spectral densities for n/2 frequencies.4 0. each of varying amplitude or “power”.Parial autocorrelation of remainders Partial ACF 0. due to Fourier.6 Spectral analysis Another second-order summary is provided by spectral analysis.

. "class")= chr "tskernel" $ df : num 14. which give half-weight to end values. $ coh : NULL $ phase : NULL $ kernel :List of 2 .spectrum(gw. By default it removes any linear trend.. One period per cycle is annual.. so it is usually smoothed with socalled Daniell windows. the inverse of the frequency.1 0.used : int 360 $ orig. spans = c(5... The window width is specified with the spans optional argument.0417 . "name")= chr "mDaniell(2.$ m : int 5 .43 5. which relates the density to frequency.. . . .$ coef: num [1:6] 0. The raw spectrum is too noisy to interpret. optional values are trial-and-error.e. since the function is symmetric. + ω/2.This can be inverted to give the density at each frequency: ∞ f (ω) = γ0 1 + 2 i ρt cos(ωt) where γ0 is the overall covariance. 7)) > grid() > str(s) List of 16 $ freq : num [1:180] 0. the frequency is 12 and so the decomposition is from 0 .0333 0.0667 0. until the main features of the periodogram are revealed.. 6 periods per cycle. In the groundwater example.62 .attr(*.n : int 360 53 . i.1333 0.125 0.. Task 37 : Compute and display a smoothed periodogram of the groundwater levels.92 6.attr(*.1562 0. and it presents the spectrum on a logarithmic scale.3 $ bandwidth: num 0.0833 0. .0726 $ n.5 7.. . • > s <. 6 periods is bi-monthly. . The spectrum is scaled by 1/ω. so that the spectral density is computed over the range −ω/2 . it tapers the series at its ends.1667 . The spectral density is estimated by the periodogram.. The periodogram at a given frequency ω is defined as the squared correlation between the time series X and the sine/cosine waves at that frequency: 2 1 I(ω) = n e t −iωt Xt The spectrum function computes and graphs the power spectrum.28 6. on the default logarithmic scale. $ spec : num [1:180] 7...1667 0. it is displayed for 0 . + ω/2.3)" ..

3133122 0. the inverse of the declared frequency.$ $ $ $ $ $ $ - series : chr "x" snames : NULL method : chr "Smoothed Periodogram" taper : num 0.e.4911276 [7] 3. 360/2 = 180). “2” is half a cycle.4268515 5. so the finest period is ω/2. “3” is one-third of a cycle.2767232 6. here 12/2 = 6.6206158 Series: x Smoothed Periodogram 5e+00 spectrum 5e−03 0 5e−02 5e−01 1 2 3 4 5 6 frequency bandwidth = 0. n = 12) [1] 7. For example “1” is a full period (one cycle). 360 observations). So in this example with ω = 12. half-year).6236966 4.3260429 2. The blue vertical line (upper right corner) gives the 95% confidence interval that the reported density is different from zero. The x-axis of the spectrum shows the period.9218806 6. the spectral density at x = 1 is for one cycle (12 months. The resolution of the decomposition is determined by the length of the time series (here. and this is estimated by the number of total cycles in the series (here. "class")= chr "spec" > head(s$spec.4972852 7. these are the harmonics..4309815 1.9278068 0.1 pad : num 0 detrend : logi TRUE demean : logi FALSE attr(*.0726 Note that the spectrum is displayed on logarithmic scale in the plot. the resulting spectral decomposition is half this length (here. etc. i. each period divided into total cycles: > frequency(gw) [1] 12 > length(gw)/frequency(gw) 54 . one year) and the density at x = 2 is for a half-cycle (6 months.8019706 1. 360/12 = 30 annual cycles).

2258021 0.46666667 0.30000000 0.2987287 0.90000000 0..sort(s$spec.5936610 0..50000000 0.8599072 0.2668042 0.66666667 0.06666667 0.13333333 0.76666667 0.13333333 0.36666667 0.10000000 0.93333333 1.06666667 2.3993403 0.9218806 2.1789095 > ss$ix[hi] [1] 1 2 3 4 5 6 7 30 29 31 8 28 32 9 27 33 10 26 11 34 12 25 [23] 13 24 35 23 22 14 21 15 20 19 16 18 17 60 61 59 36 > sort(s$freq[ss$ix[hi]]) [1] [7] [13] [19] [25] [31] [37] 0.15 > which(hi) [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 [23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 > ss$x[hi] [1] [7] [13] [19] [25] [31] [37] 7.ss$x > 0.80000000 1.06666667 0.5 7.23333333 0.53333333 0.20000000 0.63333333 0.1557876 6.4911276 2.86666667 0.9278068 0.3620958 0.16666667 0. > ss <. $ ix: int [1:180] 1 2 3 4 5 6 7 30 29 31 .96666667 1.1699968 7.26666667 0.03333333 1.70000000 0.36666667 0.20000000 55 .43333333 0.60000000 0.6526631 1.00000000 0.40000000 0.8008863 1.6311885 1.40000000 0.4268515 2.43 5.3133122 0.70000000 0.33333333 0.83333333 0.3351769 0.[1] 30 > head(s$freq.00000000 1.03333333 0.93333333 0.10000000 0.1528798 0.6236966 2.56666667 0. > hi <.90000000 1.1987844 0.26666667 0.2600752 0.62 .5100103 0.33333333 0.2767232 2.5479999 0.20000000 0.50000000 0.86666667 1.60000000 0.76666667 0.3055965 0.return = T) > str(ss) List of 2 $ x : num [1:180] 7.3260429 2.73333333 0.03333333 0.16666667 0.96666667 0.8019706 0.56666667 0.73333333 0. and grouping nearby peaks.3602574 0..46666667 0.63333333 0.2248440 0.92 6.03333333 0.6206158 0.2041092 5.4309815 1.66666667 0.16666667 0.30000000 0.53333333 0.96666667 0.23333333 0. n = length(gw)/frequency(gw)) [1] [7] [13] [19] [25] 0. using the sort function with the optional index.4972852 3.return argument to save the positions of each sorted element in the original vector. decreasing = T.13333333 0.28 6.1681654 6.80000000 1.10000000 2.3815691 0..2024376 4. index.43333333 0.9948927 0.83333333 1.00000000 Q39 : At what frequencies are the largest periodic components of the Fourier decomposition? What is the interpretation? Jump to A39 • We find the largest components by sorting the spectrum.

1)) Anatolia groundwater level 5e+00 Anatolia groundwater level 10 Anatolia groundwater level 6 5e−01 spectrum (dB) 0 1 2 3 4 5 6 spectrum spectrum 4 5e−02 0 1 2 3 4 5 6 5e−03 0 −20 0 −15 2 −10 −5 0 5 1 2 3 4 5 6 frequency annual cycle.spectrum(gw. cycle and remainder. and decibel. 7). smoothed frequency annual cycle. on three scales: linear. indeed we have already (§3. after removing trend and annual cycles. Task 39 : Compute and display the periodogram of the groundwater levels. 7). this harmonic is not provably different from zero. The raw density is shown by specifying the optional log="no" argument. 7). log = "yes". near one (annual). sub = "annual cycle. main = "Periodogram. The log scale display for the spectral density is the default in order to more easily show the lower-power components.gw <.The list of indices is in units of inverse frequency. • > > + > > + > > + > > par(mfrow = c(1. sub = "annual cycle.3) decomposed the time series into trend.r. Task 38 : Compute and display the periodogram of the groundwater levels. main = "Anatolia groundwater level". i. spans = c(5. • > sp. 7). 3)) spectrum(gw. smoothed") > grid() 7 10 · log 10(I(ω)) 56 . log = "no". smoothed The annual cycle is well-known. main = "Anatolia groundwater level". we can see three peaks: near zero (corresponding to no cycles. sub = "annual cycle. smoothed") grid() spectrum(gw. spans = c(5. log = "dB". smoothed") grid() spectrum(gw. and centred on two (6-month). residu + sub = "annual cycle.e. smoothed") grid() par(mfrow = c(1. span = c(5. main = "Anatolia groundwater level". the mean). specifying log="dB" argument shows the spectrum as decibels7 as is conventional in signal processing. spans = c(5. By removing the dominant cycle we may get a sharper view of other frequencies. smoothed frequency annual cycle. log = "dB". log.

r. log = "dB". 1)) 57 . span = c(5.Periodogram. smoothed".gw$freq[which. 2)) spectrum(gw. 1/3 etc. type = "h") main = "Periodogram. There are also no peaks at the harmonics (1/2. years).e.spectrum(gw. grid() s <. this has been removed with the annual cycle. 7).max(s$spec)] = "Periodogram. Zooming in on the first two frequencies. 2).max(s$spec[16:30]) + 15]) [1] 15. smoothed". 7). main sub = "annual cycle. log = "no".max(s$spec[16:30]) + 15] [1] 0. sub = "annual cycle.max(s$spec)]) [1] 51.gw$freq[which.max(s$spec[16:30]) [1] 8 > sp.2333333 > frequency(gw)/(s$freq[which. residuals". smoothed Notice that there is no peak corresponding to one year. xlim = c(0. i. residuals spectrum (dB) −25 0 −20 −15 −10 −5 0 1 2 3 4 5 6 frequency annual cycle.7666667 > frequency(gw)/(s$freq[which. with two different views: > > + > > + > > par(mfrow = c(1.r. one and one-half year.65217 > par(mfrow = c(1. span = c(5. residuals" 2)) [1] 0. xlim = c(0.42857 > which. grid() sp.

5 1. so the lower temporal resolution does not much affect interpretation.5 spectrum (dB) 0.0 0.5 2. Return to Q24 • A25 : The peaks and valleys are further smoothed.Periodogram. residuals Periodogram. Obvious outliers from the overall pattern are the deep levels in one December – March period. In 2002 the amplitude was highest. and some month-to-month irregularities are removed. Return to Q23 • A24 : The peaks and valleys are less extreme.0 Q40 : What are the periods of the highest spectral densities? How can these be explained physically? Jump to A40 • 3. Return 58 . smoothed 1. this shows the year-to-year variation.5 0. residuals 1. smoothed 1. month-to-month irregularities are removed.0 frequency annual cycle.0 −25 0.5 2. Return to Q21 • A22 : There is a definite annual cycle: September has the deepest levels (at the end of the extractive period) and April the shallowest (at the end of the winter rains).0 spectrum 1. less than 2 m.0 0.5 1. Return to Q22 • A23 : The monthly series has thrice the points and thus more detail. Return to Q25 • A26 : The annual cycle is removed.0 frequency annual cycle. Winter months are a bit more variable than summer. It then increases but there are year-to-year differences.7 Answers A21 : Until 1981 the amplitude is small. however the pattern remains clear and the high/low values for each cycle are not much different.0 −20 −15 −10 −5 0 0.

the remainder ranges from ≈ −2 . Return to Q29 • A30 : The average seasonal component has amplitude ≈ ±1. this reflects the annual cycles. Return to Q28 • A29 : The decomposed series has class stl and consists of three named time series: “seasonal”. decreasing to 1/3 adjusts more closely to trends that go over a few years. The smooth trend seems to represent a long-term change in groundwater. also the lines show an evolution of this over time: 59 . here the amplitude increases until about 2000 and then stabilizes. Return to Q32 • A33 : The auto-correlations gets weaker for the first six or seven months. but prior to 1980 they are amplified. “trend”. The rough trend has noise that does not seem to be due to a long-term process. Return to Q31 • A32 : The smoother decomposition has a smoother trend (of course). later it is deeper (positive values). increasing the parameter to 1 removes almost all variation and results in almost a straight line. . 4. after an initial decrease. because the cycles are more closely fit. for example the initial overall decrease in depth for the first three years. For most of the series the seasonal fluctuations have been mostly removed. The parameter value of 1/3 seems most useful here. but all except 1988 are within a more restricted range. .to Q26 • A27 : The default parameter (2/3) shows the overall trend (increasing groundwater) depth) slightly adjusted for local phenonomena. the seasonal component is the same. ≈ ±1. so averaging the cycle over all years amplifies these early small fluctuations.5m depth. the trend ranges over 25m depth and generally increases. The amplitude of the remainder has been reduced.5m. This is because in that period there was not much seasonal fluctuation. The cycles themselves may have different shapes: note the “shoulder” in the decreasing level in 1983-1990. the mean is zero and the numbers represent deviations from the cycle at each position in it. so the remainders are larger and their serial auto-correlation extends over a longer span. and “remainder”. The remainders show strong auto-correlation within each cycle. organized as a matrix of class mts (“multiple time series”) with three columns (one for each series). Thus the trend is by far the largest componet. which is absent from later yeard. the seasonal and remainer are similar orders of magnitude. Return to Q30 • A31 : The seasonal component can change with time. but then strengthens.5m. Thus at the earlier years the groundwater is shallower (negative values). and the unusual extraction in 1988. Return to Q27 • A28 : Numerically. Return to Q33 • A34 : The detailed scatterplot shows which measurements are more or less correlated to the lagged measurement. probably due to increasing extraction for agriculture.

The negative autocorrelation between subsequent years means that relatively wet (dry) years tend to be followed by relatively dry (wet) remainders.e.23 months per cycle.e.e. i. because in a wet (or dry) year the level can not fluctuate rapidly and so tends to stay relatively high (or low). decreasing from 0. The remainders have only a one-month autocorrelation.i. A much smaller peak is at 23/30 = ¯ cycles per year.e. or about 4 years 3 months. the change in level must be smooth. i. which is reflected in the lack of correlation in the year to year remainders (lag 12). after accounting for trend and cycle. Once this is accounted for.1).3. 15. Physically. The removal of the trend has taken away most of the autocorrelation due to the continuous nature of groundwater level change. Return to Q40 • 60 .e. i. These seem to be 0. extraction and recharge at the same seasons each year). i.9878 at lag 1 to 0.e.7965 at two years (24 months).2415. after which the autocorrelation is increasingly negative. i. and thus could be modelling by a first-order autoregressive process (§4. they are then not different from zero (no correlation) until the 16th month. Return to Q37 • A38 : The only partial autocorrelation provably different from zero is the first lag (one month). the other lags have no autocorrelation.51 dB) at the six-month frequency. Return to Q38 • A39 : At frequency 1 (one year) there is a large component (-9 dB). a strong annual cycle. Return to Q35 • A36 : The remainders are positively autocorrelated within one year (up to 11 months). There is also a much weaker component (-16.4 A40 : The highest spectral density is at 7/30 = 0. 51. There is positive autocorrelation within one year.76 artefacts of the particular data series.e. the negative partial autocorrelation appears again at 12 and 13 months. 12 and 24 months). or about 1 year 3 months.7 months. Return to Q39 • ¯ cycles per year. this reflects the fact that groundwater level can not fluctuate rapidly month-to-month. Return to Q34 • A35 : Groundwater levels are highly positively correlated. The remainder represents the local effects after accounting for trend and cycle. Return to Q36 • A37 : Lag 2 has a substantial negative partial autocorrelation: -0. The correlation increases locally at cycle lengths (one and two years. Then months 5 – 9 have slightly positive partial autocorrelations. Removal of the cycle has taken away any autocorrelation due to seasonality (i. whether the correlation is systematically greater or less.

Use the model to simulate similar time series (§8). Use the model to predict into the future. periodic decom Well 1. cycle. The residual is by definition noise in this decomposition. representing the long-term and cyclic behaviour of the series. t.5 0.series.window = 25.series. this would have a mean. amplitude. 2. There are several aims of modelling a time series: 1. periodic decomposition 1.3) that a time series can be decomposed into a trend. with different smoothness: > plot(stl(gw. + main = "Anatolia well 1. and residual using the stl function.5 1980 1985 1990 1995 2000 2005 Time > plot(stl(gw. We begin with an examination of some model forms. s=25. or into the past. The twelve monthly temperatures could then be recovered from these three parameters. 3.1 Modelling by decomposition We have already seen (§3. s. t=85 decomposition") 61 . main = "Well 1. A simple example is modelling monthly temperature as a sine wave.4 Modelling a time series A model of a time series is a mathematical formulation that summarizes the series. Understand the underlying process(es) that produced it.window = "periodic")$time. Recall the two decompositions of the groundwater data. s.5 −0. and phase (the period would be defined as one year). The problem with this approach is that the trend removal is empirical: its smoothness must be adjusted by the analyst. at missing data points.5 seasonal trend remainder −2 1975 0 1 2 3 4 30 35 40 45 50 55 −1. the other two components are a model.window = 85)$time. 4. This formulation has a model form and several parameters.

4. There is no theoretical way to determine these. We now examine some models that do allow prediction. using predictors and responses from a dataframe. The decomposition gives insight into the processes underlying the time series. Task 40 : Build a linear model of the trend in groundwater level. regression is more typically used to model a trend over time. Linear models can also be used for interpolation (gap filling) and extrapolation (prediction).Anatolia well 1. the span (number of lags) of the loess window for seasonal extraction. and evaluate its success and suitability. cyclic components are better-handled by decomposition (§4.window. This may include interactions between cycle and position. as revealed by the decomposition? Jump to A41 • However. (2) t. However. because there is no model of the underlying process for the trend. the second specifies a (1) s.2 Modelling by linear regression The standard techniques of linear modelling can be applied to the time series. t=85 decomposition 2 seasonal trend remainder −2 1975 0 2 4 6 35 40 45 50 55 −2 −1 0 1 1980 1985 1990 1995 2000 2005 Time The first decomposition makes no assumptions other than the annual cycle. 62 .f must be used instead of the raw time series. thus the dataframe gw. s=25. considering each observation as a function of time and position in the cycle. Q41 : What could be the processes that control groundwater level in this well.1).window. there is no way to extrapolate (predict). the span for trend extraction. • The lm function computes linear models.

xlab = "Year") > title("Anatolia well 1.> m. col = "red") Anatolia well 1.00 <2e-16 *** --Signif.f$time).gw). Adjusted R-squared: 0.numeric(gw.f) > summary(m.6562 Task 41 : Plot the time series. p-value: < 2.f$gw.4857 Max 6. • > plot(gw.f) Residuals: Min 1Q Median -4.521 on 358 degrees of freedom Multiple R-squared: 0. Error t value Pr(>|t|) (Intercept) -1. with the fits from the linear model superimposed. + ylab = "Groundwater level (m)".053e+01 -51. data = gw.2e-16 3Q 1.130e-01 1.8445 -0.887. linear trend Groundwater level (m) 30 1975 35 40 45 50 55 1980 1985 1990 Year 1995 2000 2005 Q42 : How much of the variation in groundwater level is explained by this model? How well does it appear to explain the trend? What is the average annual increase in depth? Jump to A42 • 63 .1 ' ' 1 Residual standard error: 2.01 '*' 0.576e+03 3.' 0. data = gw.534e-02 53. type = "l". codes: 0 '***' 0.001 '**' 0.gw) Call: lm(formula = gw ~ time. y = fitted(m. linear trend") > lines(x = as.f$time.9134 -1. col = "darkgreen".3193 Coefficients: Estimate Std.gw <.lm(gw ~ time.05 '.64 <2e-16 *** time 8.8867 F-statistic: 2809 on 1 and 358 DF. gw.

gw.6461 Max 7.e.gw.gw)["time"] time 0.f) > summary.3) Call: lm(formula = gw ~ I(time^3) + time. most of the problem is at low fitted values.4775 64 .3 <.f) Residuals: Min 1Q Median -4.8866509 > coefficients(m. i.Note that the adjusted R 2 and slope can be extracted from the model object: > summary(m. > plot(m.7362 -1.squared [1] 0. The anova function compares two models. data = gw. so perhaps fitting a cubic trend might give a better model. There is also a large discrepency near 40 m.gw. > m.gw)$adj. which = 1) • Residuals vs Fitted 8 159 q q 158 q 157 q q q q q q q q q q qq qq q q q q qq q qq q q q q q q q q q q q qq q q qq q q q q q q q q q q q q q q q q q 6 q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q 4 q q q q q q q q q q q qq q q qq q q qq q q q qq qq q q q q q qq q q q q q q qq q qq q q q q qq q q q q q qq q q q q q qq q q q q q q q q q q q q q q qq q q q q q q q q q q q q qq q q qq q q q q q q q q q q q q q q q q q q q q q q q qq q q q qq q q q q q q q q q q q q qq q q q q q q q q q qq qq q q q q q q q q q q q q qq q q q q q q qq q q q q qq qq q q q q q qq q q q q q q qq q q q q q qq q q q q q q q q q qq q qq q q q q q qq q q q q q qq q q q q q qq q q q q q q q qq q q q qq qq q q q q q q Residuals −6 30 −4 −2 0 2 35 40 Fitted values lm(gw ~ time) 45 50 The diagnostic plot shows that this model violates one assumption of linear modelling: there be pattern to the residuals with respect to fitted values.9403 -0. data = gw. which corresponds to the anomalous year 1988.lm(m.1648 3Q 1. The trend seems to have two inflection points (around 1980 and 1985). fitted values. the beginning of the time series.lm(gw ~ I(time^3) + time.8130209 Task 42 : Display a diagnostic plot of residuals vs.r.

25 36.424e-09 *** --Signif.758e+00 -5.845 1.gw) Analysis of Variance Table Model 1: gw ~ I(time^3) + time Model 2: gw ~ time Res. + col = "red") Anatolia well 1.746 1.163e-07 6.8969 F-statistic: 1563 on 2 and 357 DF.42e-09 *** time -2. y = fitted(m.05 '.001 '**' 0.917e-06 3. cubic trend") > lines(x = as. codes: 0 '***' 0.1 2 358 2274. Error t value Pr(>|t|) (Intercept) 2.numeric(gw. gw. col = "darkgreen". p-value: < 2.' 0.Coefficients: Estimate Std.001 '**' 0.f$gw.403 on 357 degrees of freedom Multiple R-squared: 0.1 ' ' 1 > plot(gw.01 '*' 0. codes: 0 '***' 0.f$time. xlab = "Year") > title("Anatolia well 1. type = "l".746 3. + ylab = "Groundwater level (m)".985e+03 5.gw.05 '.f$time).3).3. Adjusted R-squared: 0.2e-16 > anova(m.8975.' 0.197e+01 3. m.1 ' ' 1 Residual standard error: 2.Df RSS Df Sum of Sq F Pr(>F) 1 357 2062.4 -1 -212.864e+04 4.96e-08 *** I(time^3) 1.01 '*' 0.14e-08 *** --Signif.062 3.gw. cubic trend Groundwater level (m) 30 1975 35 40 45 50 55 1980 1985 1990 Year 1995 2000 2005 Q43 : Is the cubic trend model better than the linear trend? A43 • Jump to 65 .

0153 59.4744 -57. data = subset(gw.lm(gw ~ time.9157 F-statistic: 3511 on 1 and 322 DF.gw.f$year > + col = "red") Anatolia well 1.f$year > + 1977)) > summary(m.1978) Call: lm(formula = gw ~ time.4399 -0. y = fitted(m.1978).0702 Coefficients: Estimate Std.2e-16 > plot(gw.26 <2e-16 *** --Signif. The subset function is used to limit the time series in the dataframe: > m. gw.05 '.1 ' ' 1 Residual standard error: 2. type = "l". "Year") 1978") 1977]). gw. col = + ylab = "Groundwater level (m)".f$time[gw. p-value: < 2. linear trend since 1978 • 3Q 1.gw. linear trend since > lines(x = as.f$gw. gw.f. Adjusted R-squared: 0. codes: 0 '***' 0.0422 -1.gw.f.3550 Max 7. Groundwater level (m) 30 1975 35 40 45 50 55 1980 1985 1990 Year 1995 2000 2005 66 . data = subset(gw.Clearly the series to about 1978 is different from that after.f$time.01 '*' 0.3310 "darkgreen". xlab = > title("Anatolia well 1. perhaps the extraction did not begin until then? Task 43 : Model the trend since 1978.9068 0.' 0.001 '**' 0.147 on 322 degrees of freedom Multiple R-squared: 0.916.86 <2e-16 *** time 0. Error t value Pr(>|t|) (Intercept) -1763.3608 30.numeric(gw.f$year > 1977)) Residuals: Min 1Q Median -5.1978 <.

1978)["time"] time 0..gw.2050) num [1:46. > gw.8130209 > coefficients(m.r.2050 <.. The confidence interval of the fit is returned if the interval argument is provided. . + level = 0.95) > str(gw. graph this with its 95% confidence interval of prediction.r. .9067699 Q44 : Does this model fit better than the linear model for the entire time series? Is the average annual change different? Which would you use in extrapolating into the future? Jump to A44 • Task 45 : Predict the groundwater level in January 2010. this time with a sequence of times at which to predict.squared [1] 0.9) fit lwr upr 1 59.gw.8866509 > summary(m. • The generic predict method specialized to the predict.squared [1] 0. • Again we use the predict. "dimnames")=List of 2 .$ : chr [1:46] "1" "2" "3" "4" .$ : chr [1:3] "fit" "lwr" "upr" 67 .frame(time = 2005:2050). + interval = "prediction". interval = "prediction".7 55. data.lm function when applied to a linear model object..gw..9157392 > coefficients(m.5 57. 1:3] 54.1978)$adj.1978. Compare the model since 1978 with the model for the whole • > summary(m.4 58.gw)$adj.lm(m. data. level = 0.6 56.predict.82443 Task 46 : Predict the groundwater level from 2005 to 2055.attr(*.3 .1978.Task 44 : series.frame(time = 2010)..24676 55.gw)["time"] time 0.lm method.gw..66909 62. the "prediction" option returns the upper and lower bounds of the prediction at the specified confidence level: > predict(m.

Recall that this component can be identified by decomposition (§3. gw. the seasonal component of a series is not interesting.1 Modelling a decomposed series If we are interested in modelling an overall trend. "lwr"]. xlim = c(1978.f$gw)). col = "red".> + + + > > > > + > > > > > plot(gw. lty = 2) lines(2005:2050. "extrapolation") Anatolia well 1. "fit"]) lines(2005:2050.numeric(gw.3). • We first add a field to the time series dataframe and then use it for modelling.2050[. lty = 2) lines(x = as. lty = 2) text(1990. 68 . ceiling(max(gw.2050[. 60.1978). col = "blue") text(2030. linear trend since 1978") grid() abline(v = 2005. The residual noise quantifies the uncertainty of the parametric fit.2050[. "upr"])))) title("Anatolia well 1.2050[. linear trend since 1978 100 Groundwater level (m) 60 70 80 90 fit extrapolation 30 1980 40 50 1990 2000 2010 Year 2020 2030 2040 2050 Q45 : Do you expect the groundwater level to be below 80 m by 2040? What factors determine how long the modelled trend may continue in the future? Jump to A45 • 4. gw. col = "darkgreen".gw. ylab = "Groundwater level (m)".f$time. 60.f$year > 1977]).f$time[gw. gw.2. "upr"]. xlab = "Year". col = "blue") lines(2005:2050. ylim = c(floor(min(gw. col = "red". if this is removed the smoothed trend and residual components can be used as the basis of trend modelling. type = "l". gw. "fit". 2050). y = fitted(m.f$gw. Task 47 : Fit a linear trend to the non-seasonal component of the groundwater level.

.f$time. xlab = "Year") title("Anatolia well 1.9693 -0. codes: 0 '***' 0. data = subset(gw. p-value: < 2. ylab = "Groundwater level (m)".1106 Coefficients: Estimate Std.1 ' ' 1 Residual standard error: 1.7 34.9538 -0.f$gw. $ year : num 1975 1975 1975 1975 1975 . + gw..5 34. col = "blue".. s. $ in.f$nonseas.nonseas). gw. type = "l".01 '*' 0.gw.757e+03 2.9389. pos = 4.f$year > 1977)) > summary(m.Again.70 <2e-16 *** time 9.' 0.series[.gw.284e-02 70.5 34..001 '**' 0.gw.stl <. pos = 4.f$year > 1977]).802 on 322 degrees of freedom Multiple R-squared: 0. > m. Error t value Pr(>|t|) (Intercept) -1. data = subset(gw. col = "red") text(1995.05 '. of 6 variables: $ gw : Time-Series from 1975 to 2005: 34.stl$time.727 -0. Adjusted R-squared: 0. non-seasonal component") lines(x = as.297 . 38.gw. gw. the trend since 1978 is modelled..036e-01 1..477 -0. 34.4 34.f$year > 1977)) Residuals: Min 1Q Median -3.8 34..stl(gw.9 . > gw. col = "darkgreen".. y = gw. col = "red"..36 <2e-16 *** --Signif.2621 69 .f$nonseas <.yr : num -0.f..numeric(gw.8 35. 36.numeric(gw.558e+01 -68. + "remainder"] > str(gw. col = "blue") lines(x = as.frame': 360 obs.f$time). col = "darkgreen".378 -0.f) 'data. "non-seasonal component") text(1995.stl$time.f$time[gw.8 35.2e-16 > + > > > + > > > plot(gw. pos = 4.8762 Max 8. $ nonseas: Time-Series from 1975 to 2005: 34.818 -0.nonseas <.window = 2 * frequency(gw) + 1) > gw. "time series") text(1995.9387 F-statistic: 4950 on 1 and 322 DF.6 .lm(nonseas ~ time. "linear trend") 3Q 0. linear trend since 1978. y = fitted(m.f.3 35.series[. $ time : Time-Series from 1975 to 2005: 1975 1975 1975 1975 1975 . "trend"] + gw..nonseas) Call: lm(formula = nonseas ~ time.. $ cycle : num 1 2 3 4 5 6 7 8 9 10 .

summary(m.02299186 > tmp/summary(m.1978)["time"] time 0. Consider (1) the amount of variability explained.1978)$adj.squared .squared) [1] 0.r.nonseas)["time"] time 0.9067699 Q46 : By how much does the proportion of variability explained by the model change when the seasonal component is removed before modelling the trend? Jump to A46 • Q47 : By how much does the slope of the trend change when the seasonal component is removed before modelling the trend? Jump to A47 • 70 .squared [1] 0.gw.02510743 > coefficients(m.9035588 > coefficients(m. linear trend since 1978.1978)$adj.gw.r.gw.r. • > (tmp <.Anatolia well 1.summary(m.gw.gw.nonseas)$adj. (2) the slope of the trend. non−seasonal component Groundwater level (m) 40 45 50 55 time series non−seasonal component 35 linear trend 30 1975 1980 1985 1990 Year 1995 2000 2005 Task 48 : Compare this linear model (with the seasonal component removed) to the linear model of the series since 1978 computed in the previous subsection.

In this case the trend is quite strong so this is not an issue. further the apparent trend may not be real.4. this is a requirement for using ordinary least squares (OLS) to fit a linear model. In the trend analysis of the previous section (§4.gw.lm function. Hirsch et al.2 Non-parametric tests for trend A non-parametric test is one that does not assume any underlying distribution. but we will visualize the regression diagnostics with the plot. the confidence intervals computed for the slope may not be accurate. • There are various formal tests. > par(mfrow = c(1. One approach is to use a robust regression [2] to estimate the trend its confidence limits. 2)) > plot. [8] present the Mann-Kendall test for trend. Task 49 : Check that the residuals of the trend analysis are IID. which = 1:2) > par(mfrow = c(1. 1)) Residuals vs Fitted 159 q q 158 q 157 q Normal Q−Q 159 q q 158 q 157 q q q 8 Residuals 4 q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q qq q q q q qq q q q q q q q q q q q q q qq q q qq q q q q q q q qq q q q q q q q q q q q q q q q q q q qqq q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq qq q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q qq q Standardized residuals 6 q 2 4 −2 q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q qq 0 2 −4 −2 30 35 40 45 50 55 0 −3 −2 −1 0 1 2 3 Fitted values Theoretical Quantiles Q48 : Do the residuals appear to be IIND? Jump to A48 • Since the residuals do not meet the criteria for OLS.2.1) we assumed that the residuals (after accounting for seasonality and trend) was independently and identically normally-distributed (IIND).lm(m. • 71 . using the whichplot.nonseas. Another approach is to use a non-parametric test. Still. Task 50 : Check for a trend in the groundwater levels time series since 1978.2. we discuss how to detect trend in a series where OLS models are not justified. which has been included in the Kendall package written by Ian McLeod (co-author of [7]). fitted values) and 2 (normal quantile-quantile plot of residuals).lm argument to select graphs 1 (residuals vs.

f$year > 1977)$nonseas) > MannKendall(gw. The SeasonalMannKendall function does the same under the alternative hypothesisis that for one or more months the subsample is not distributed identically. In the present case the series is clearly seasonal.22e-16 > gw. 1978.1978) tau = 0.frequency(ts) + n <. 1 ≤ k < j ≤ n where n is the number of years.892.f. 12. estimate the slope B by their median. . 2-sided pvalue =< 2.These should give similar results.c(d. • > MKslope <.window(gw. gw.function(ts) { + f <. main="individual slope estimates".ts[i + (k-1)*f])/(j-k)).nonseas.e. we remove the seasonality).1978) tau = 0. > require(Kendall) > gw. i = 1 . The differences (xij − xik ) are for the twelve months indexed by i and the years indexed by k < j and then normalized by the number of years between each pair of values. 2-sided pvalue =< 2. .ts(subset(gw. the slope is estimated as zero (no trend). Note that if the number of positive and negative differences are equal.1978 <.The MannKendall function of the Kendall package computes this test for a time series. 2005) > SeasonalMannKendall(gw. resulting in the probability that a τ -value this large could have occurred by chance in a series with no true trend. xlab="slope") + print(summary(d)) 72 . also displaying a histogram of the individual slope estimates.NULL + for (j in n:2) + for (k in (j-1):1) + for (i in 1:f) + d <. So each dijk is a slope between two years for a given month. Then. so we can either compute the seasonal test for the series or the non-seasonal test for the decomposed series (i.length(ts)/f + d <.1978 <.22e-16 The τ -value is a test statistic that is then compared to a theoretical value. Q49 : What is the probability that there is no trend in this series? Jump to A49 • Hirsch et al.862. Task 51 : Write a function to compute this non-parametric slope. (ts[i + (j-1)*f] .. [8] also propose a non-parametric estimator of slope: compute all differences between the same month in all pairs of years: dijk = xij − xik j−k .nonseas. + hist(d.

2.9132 1. 7. Speed River. 73 .9050 Mean 3rd Qu.3). 1st Qu.3 A more difficult example The trend is not always so obvious. January 1972 through January 1977.905 Median 0.1978)) Min. Ontario.9700 0.omit to account for the possibility of missing values in the time series. Task 52 : Estimate the non-parametric slope with this function and compare it to the parametric (OLS) estimate. in some time series. • The appropriate slope for comparison is the linear model of the decomposed series (with seasonality removed): > print(MKslope(gw.nonseas)["time"] time 0. this will be used in the following example (§4. Guelph. 0. a monthly time series of phosphorous (P) concentrations in mg l-1.1130 Max. 4.omit(d))) + } Note the use of na.gw. -6.+ return(median(na.9035588 individual slope estimates 2500 Frequency 0 500 1000 1500 2000 −5 0 slope 5 These are very close in this case.2.6877 [1] 0. the Kendall package includes the sample dataset GuelphP.1400 > coefficients(m. For example. and the deviations from IIND residuals much stronger. and hence missing differences.

0 qq q qq qq q q q 1972 1973 1974 1975 Time 1976 1977 1978 Q50 : Describe this time series qualitatively (in words). 0. 2-sided pvalue =1.002667 [1] -0. ylab = "P concentration.275000 • NA's 20 74 .2 q qq q qq q qq qq qq 0. > SeasonalMannKendall(GuelphP) tau = -0.8 q q 0. if present.042000 -0.090540 -0. mg l−1 0.MKslope(GuelphP)) Min. + main = "Speed River water quality") Speed River water quality q • 1. Does there seem to be a linear trend? Jump to A50 • Task 54 : Test for a trend and. type = "b". > data(GuelphP) > plot(GuelphP.135100 -0.0 q q P concentration. Median Mean 3rd Qu.Task 53 : Load this dataset and plot as a time series.562.b <. -1. estimate its slope.4 q 0.758e-07 > print(guelphP. 1st Qu. mg l-1".6 q q q q q q q q q q q q q q q q q q qqq q q qqq q q q q q q q q q q q q q 0.05633333 Max.056330 -0.

which is especially useful for forecasting. the magnitude of αl is the degree of autocorrelation with the 75 .5 slope 0. relative to the white noise. The order p of the process controls the number of previous values of Yt considered. [3] developed an approach to time series analysis known as ARIMA (“autoregressive (AR) integrated (I) moving averages (MA)”).0 −0. s < t .individual slope estimates 100 Frequency 0 20 40 60 80 −1. The strength of the correlation. The AR process is defined such that each Yt in the sequence is computed from some set of previous values Ys . what is its slope? Is this slope meaningful? Jump to A51 • 4. plus white noise Zt . We first examine the “AR” and “MA” aspects. This white noise is independently and identicall distributed (IID) at each time step. and then add the “I”.3. the series is centred by subtracting the overall mean µ and considering the differences: p (Yt − µ) = l=1 αl (Yt −l − µ) + Zt (2) where the Zt form a white noise (purely random) sequence {Zt }. 4.1 Autoregressive (AR) models The simplest model form is the autoregressive (AR) model.3 Autoregressive integrated moving-average (ARIMA) models Box et al.0 Q51 : Is there a trend? If so. To simplify computations. gives the continuity of the series. Here the values in the series are correlated to some number of immediately preceding values.

This indicates an AR(1) model. lags = 1. lag 1 scatterplot" Anatolia well 1 remainders. Note that the mean of the remainders should be zero. examine the autocorrelation as a scatterplot.lth previous value. by analogy with the light spectrum) smoothed out by the autocorrelation. lag 1 scatterplot q 6 q q q 4 q q q q q q qq q q qq q q q qq q q qq q q q q q q q qq q q q q q q qq q q q q q q q q q q q qq q qq qq q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q qq qq q q q q q q q q q q q q q q q q q q q q q q q q q q qqq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q qq q q q q q q q q q qq q q q q q q qq q q q q q q q qq q q q q q q qq qq q q qq q q q q q q q q q q q qq qq q q q q q q q q q q q q q q q q q qq q q q q q q q q q qq q q q q q q q q q q q q qqqq qq q q q q q q q qq q q q q q q q q q q q q q q q gw. this was already done in §3.r.plot function: > lag.06420781 Task 55 : Estimate the autocorrelation parameter for the remainders for the groundwater levels of well 1. after the first order was accounted for. main = "Anatolia well 1 remainders.r) [1] -0. if we’ve accounted for the trend and cycles. The new (Yt − µ) are computed from previous values of (Yt −l − µ) up to the order. The low-frequency (“red”) random variations are preserved. plus some new IID white noise {Zt }.5 we saw that the remainders for the groundwater levels of well 1 had no partial autocorrelations. in this case it is very close: > mean(gw. because the white noise represented by the sequence {Zt } has the high-frequency variations (“blue”.r −2 0 2 −2 0 2 lag 1 4 6 76 . AR(1) models The simplest model is the AR(1) model: (Yt − µ) = α1 (Yt −1 − µ) + Zt (3) This has the same form as a linear regression.plot(gw.4. In §3. This series is sometimes called a red noise process. and the single parameter α1 can be computed in the same way. • First. using the lag.

gw.1 0. xlab = "Lag 1". here we need a shift in order to relate the previous month’s level to the current month’s level.05 '.2370 -0.7884.r. We saw in §3.2e-16 > abline(m. The shift is effected by subscripting. gw.0304 3Q 0.887310 0.2)] .1)] .424 <2e-16 *** --Signif.1)) [1] 0.r. We must first produce a time-series offset by one month.r) .4 that the lag function lags the time series but does not shift it.lag(gw.r.r. using the [] operator.lag1 <.1) summary(m.9367 -0.0 ~ gw.r. We compute this relation first with the standard lm method.r.8879416 77 .2109 Max 1.lag1) > (alpha. p-value: < 2.1.0 <.r.mean(gw.lm(gw.029559 0.r[2:(length(gw.lag1) Call: lm(formula = gw. note we must relate an observation to the preceding observation (time flows in one direction!).5253 Coefficients: Estimate Std.r. 1)[1:(length(gw.95 gw. lag-1 series") m.1 <.' 0.0 ~ gw.5593 on 356 degrees of freedom Multiple R-squared: 0. subtracting in each case the mean: > gw.r) > gw.01 '*' 0.1 <. ylab = "Original") title(main = "Anatolia well 1.1) Residuals: Min 1Q Median -5.r) > > > > plot(gw.r. remainders.0.r.1 ' ' 1 Residual standard error: 0.001855 0.001 '**' 0.mean(gw.063 0. Adjusted R-squared: 0.r.7878 F-statistic: 1327 on 1 and 356 DF. series vs.024360 36.Clearly a linear relation between the series of remainders and its first lagged series is justified. We first construct the two series.r.r) .0 ~ gw. codes: 0 '***' 0.cor(gw. Error t value Pr(>|t|) (Intercept) 0.

remainders. ] [1] 0. §8. 1)[1:(length(gw) + 2)] .Anatolia well 1.max = 1. because of the inherent continuity within the annual cycle: > cor(gw[2:(length(gw) . lag(gw. lag−1 series q 6 q q q 4 q q q q q q q q q q q q q q q q qq q qq qq q qq q qq q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q qq q q q q q q q q q q q q qqq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q qq q q q q q q q q q q q qq q q q q q q qq q q q q q q qq q q q q q q q q qq q q q q q q q q q qq q qq q q q Original 0 2 q −2 −2 0 2 Lag 1 4 6 Q52 : How much of the variation in the remainder of groundwater level (after accounting for trend and seasonality) is accounted for by knowledge of the remainder at the previous lag? Jump to A52 • Note that the comparable figure for the uncorrected series is much higher.1)] .1]: 2 2 σZ = (1 − α2 )σY (4) 2 where σY is the variance of the time series. For sampled time series of length n.r. This is computed as [20. .mean(gw)) [1] 0. plot = F)$acf[2. the series variance σY 78 . That is. 3.4. the noise is reduced from the observed noise in the series by the autocorrelation – that much of the noise is accounted for by the model. the innovation variance is the variance of the white noise of Eq.mean(gw). This illustrates the “red shift” 2 mentioned above.8879 may be due to how the two computations deal with the ends of the series.8868833 Note: The slight difference between this estimate 0.8869 and the estimate directly from linear correlation 0. Finally.9934396 The correlation coefficient should be the first autocorrelation calculated with the acf “autocorrelation” function. lag. > acf(gw. as shown in §3.3. series vs.

. AR(3) . this is known as the first-order Markov process. 1 [.] -0. For the example series of remainders of groundwater level.2 is estimated from the sample variance sY .1^2) * var(gw.max = 5.15111996 > pacf(gw.04679133 [4.] 79 . it is possible that the current state is also influenced by earlier states. we saw from the partial autocorrelations that this is a reasonable assumption.03321400 [5. .] 0.r) .ar. lag.] 0.88688330 -0.02809817 -0.1 <.02513076 [1. 1 [. plot = F)$acf .1] 0.2) * + (1 .] 0.24150277 [3. models In an AR(1) model the entire prior behaviour of a series is given by the previous value and the one autocorrelation coefficient.r) par(mfrow = c(1. However. lag. 1)) pacf(gw.08180894 -0.r) [1] 1.3118009 We will return to this example and simulate an AR(1) series with this these fitted parameters in §8.alpha.max = 5. this is the case for the original series of groundwater levels: > > > > > par(mfrow = c(1.] [2. plot = F)$acf . AR(2).] [5. other than the immediately preceding one.98782910 [2.] [4. the true correlation α is estimated ˆ and the noise must be corrected for bias: as α 2 sZ = n−1 2 ˆ 2 )sY (1 − α n−2 (5) In the current case: > var(gw. .1)/(length(gw.469715 > (var.r) .r)) [1] 0.] -0.(length(gw.02235775 -0.1] [1.] [3.r. . 2)) pacf(gw) pacf(gw.

it is also stored with the model object. but also solves them for all orders from AR(1). The red-noise variance is then estimated as: 2 ˆ2 sZ (2) = (1 − α 2) n−1 2 2 ( 1 − r1 )sY n−2 (6) where r1 is the sample correlation at lag 1. the parameters are fit simultaneously. 80 . These generalize to any higher order. which relate the parameters to the sample autocorrelations.0 Series gw. the default "yule-walker" is used.5 2.5 2.0 −0. Task 56 : Fit AR(n) models to the remainder series. The red noise variance is printed with the solution. • Here we show the optional method argument.6 Partial ACF 0. until the higher-order fit is not better. AR(2) .6 0.4 0. For the AR(2) model these are: r1 r2 ˆ1 + α ˆ2 r1 = α ˆ1 r2 + α ˆ2 = α which can be solved as simultaneous equations.2 0. as the degree of the AR model increases.8 Partial ACF 0.0 Lag 1. as judged by the AIC.0 0.8 0.2 0.5 1. this is the case but not quite at the level of significance.0 Once the preceding lag is taken into account (high positive correlation.5 1. The ar function not only solves these equations. . the most common method is with the Yule-Walker equations.2 0. . Thus the original white noise is reduced yet further.r 0. Even for the remainders.Series gw 1.0 0. high continuity) we see that the second lag is negatively correlated (lack of continuity). Q53 : What is the physical interpretation of this result? Jump to A53 • For higher-order AR models.0 Lag 1.4 0.

pred [1] 0. > (ar.r. 81 .9594 -0.3146 Q54 : What order of AR model is selected by the ar function? How well is the remainder series modelled by an AR(2) process? How much improvement is this over the AR(1) process? Jump to A54 • Note: An AR(1) model is always stationary.r.max = 1)) Call: ar(x = gw.r$ar [1] 0. but higher-order models may not be.max = 1) Coefficients: 1 0.1 <.gw. method = "yule-walker")) Call: ar(x = gw.r <.1$ar [1] 0.r. §8. order.r.8868833 sigma^2 estimated as 0.3133392 > ar.g. The ar function reports non-stationarity in the fit. using the order.> (ar. we re-fit the AR(1) series also.gw. method = "yule-walker") Coefficients: 1 2 0.gw.3.2]) for details.ar(gw.r. We will return to this example and simulate an AR(2) series with these fitted parameters in §8.3133 > ar.0818 Order selected 2 sigma^2 estimated as 0.ar(gw. Wilks [20. See texts (e.gw.max argument to force ar to only fit this order.r.r$var.95943829 -0.gw.08180894 For comparison. The parameters must jointly satisfy some constraints.8869 Order selected 1 > ar. order.

The theory behind ARMA models is that a time series. 4. MA models are often used to model apparent trends. An overall mean level µ .e. white noise. can be considered as the contribution of four components: 1. we re-fit the AR(2) model of the well level residuals (§4. conventionally represented as {Zt }. where values in the series are correlated to some number of immediately preceding values. a random component with zero mean and some constant variance. 0): 82 . but with no correlation between successive values of this noise.3. 4. These are ARMA models with an additional element: the degree of differencing applied to the series before ARMA analysis.4.3. ARMIMA models are conventionally specified with three components (p. The time series results from random noise. this can be subtracted from the series. p : the AR order. Thus. plus the white noise for the current time: q Yt = j =1 βj Zt −j + Zt (7) The βj weight the relative contributions of the previous values. d: the degree of differencing. q: the MA order. so that the observed time series is the result of these two types of random processes. not a true trend. To illustrate. This requires at least two arguments: the series and the order. d. written as a sequence of values over time {Yt }.3. 3. 2. the expected value and covariance do not depend on the position in the series.4 ARIMA models The “I” in ARIMA stands for “integrated”. where values in the series are some linear combination of earlier values of white noise. 4. An autoregressive (AR) component. Differencing is applied so that the series is second-order stationary. which (if any βj = 0) can “drift” into an apparent trend. which in fact is the result of the stochastic process. ARIMA models are fit with the arima function. 2. which are: 1. leaving a series centred on zero. A moving average (MA). 0.2 Moving average (MA) models The MA process is simply a linear filter of some previous white noise. q). i.3.3 ARMA models Autoregressive moving-average (ARMA) models combine the AR and MA explained above.1) with arima. The order is (2. and 3.

aic = 606.r$ar [1] 0. The values Y are indexed by the cycle number η and the position in the cycle τ . for a model fit by arima in field coef: > ar. as well as the order l. i.gw.0) fit give the same coefficients as the AR(2) Jump to A55 • The coefficients for model fit by ar are in field ar. both are modelled together.3078: Q55 : fit? Does the ARIMA(2.τ −l − µτ ) + Zη. 83 .τ . model the series.gw.5 Periodic autoregressive (PAR) models To fit an AR model.2341 log likelihood = -299.τ − µτ ) = l=1 αl.96236062 -0. With periodic autoregressive (PAR) models. If there is an overall trend it must be removed before modelling. 0. αl.0.r.95943829 -0.τ . order = c(2.e. Clearly a cyclic series is not first-order stationary. Yη.0851 s. Putting these together.r. 0))) Call: arima(x = gw. i. just the cyclic component.3.0524 0. the white noise depends on the position in the cycle as well: Zη.arima(gw. 0)) Coefficients: ar1 ar2 0.r <.τ (Yη.0925 0.e. These models are widely-used for modelling monthly rainfall series or streamflows. 0. we had to establish stationarity. 4.08180894 > arima.0525 intercept -0. yet it seems somehow unnatural to remove the cycle.r$coef ar1 ar2 intercept 0. order = c(2. The autoregressive parameter is also indexed by the position in the cycle. The PAR process is defined like the AR process.09246881 We will examine model fitting in detail just below. and add the cycle back in. Equation 3 is replaced with: p (Yη. with with a fluctuating average µτ instead of an overall average µ .e.gw.τ . 0.τ (8) Note: Note that a PAR model can not have any trend.95 sigma^2 estimated as 0.9624 -0.08513407 -0.48. They are appropriate when the periodic component is much larger than the white noise. Finally.> (arima.

1978 <. which.2 a linear trend was established for that period: > coef(m.9067699 We subtract the fits from this model from each observation (using the fitted extractor function on the linear model)..9 29.fitted(m.1978 .These are fit with the arima function. d.gw..5568 -0. 2005) > str(gw.. "names")= chr [1:324] "37" "38" "39" "40" . lty = 2) 84 . > gw. ylab = "Deviations from linear trend.2813 -0. m". 1978..5 30.3 30. "names")= chr [1:324] "37" "38" "39" "40" .0457 -0.gw.1199 -0. Task 57 : Fit a PAR model to the de-trended time series of groundwater levels of Anatolia well 1 since 1978.1978) > str(gw.gw. + main = "Anatolia well 1") > grid() > abline(h = 0.5924 .1 29.0 <.4 30.1978)) Named num [1:324] 30.3607555 time 0.. We also need to extract the time-series window. > plot(gw. q).1978. by specifying the optional seasonal argument.attr(*. to get the de-trended series.window(gw..0) Time-Series [1:324] from 1978 to 2005: 0.. using window. > frequency(gw.1978) (Intercept) -1763. In §4..5 .0. .1978. like the ARIMA order. > gw. • Recall that the behaviour before 1978 was qualitatively different than that since.2 30.gw. ..3 30.4 30. we suspect that extraction began in 1978.attr(*.1978) [1] 12 > str(fitted(m.9 .1978) Time-Series [1:324] from 1978 to 2005: 30.1978. is a list of the three components (p.

0. 0))) Call: arima(x = gw.Anatolia well 1 Deviations from linear trend. a highamplitude cycle tends to be preceded and followed by a similar amplitude Note: The frequency can be specified as part of the seasonal argument.5293 log likelihood = -354.1142 -0. 0)) Coefficients: ar1 ar2 1. as given by the frequency function.1978. seasonal = c(0. seasonal = c(1. aic = 719. 0) corresponds to the same cycle each year.0.gw.0492 0. 0.gw.4272 0.arima(gw. 0))) Call: arima(x = gw. 0. 0).1978. Higher-order AR represent autocorrelation of cycles year-to-year.5159: 85 . 0)) Coefficients: ar1 ar2 1. 0.0492 intercept -0.27 sigma^2 estimated as 0. 0.0640 sar1 0.arima(gw. 0).63. aic = 763.1978 <. 0. A seasonal order of (0. 0. order = c(2.5983: > (par.3306 -0.0026 0.4585 s. order = c(2.e. m −4 −2 0 2 4 6 1980 1985 1990 Time 1995 2000 2005 We now fit a PAR model. 0.0. 0).1978.0639 0. 0. order = c(2. + seasonal = c(1. + seasonal = c(0. e.1978. with AR(2) for the non-seasonal part (as revealed by our previous analysis) and different AR orders for the seasonal part.3326 log likelihood = -377. but defaults to the known frequency of the series.g.0.1978 <.0593 intercept -0. 0).32 sigma^2 estimated as 0.0046 0.0. > (par. order = c(2.66.e.2404 s. 0. 0.

1978 <.0571 intercept -0.3604 0.0657 -0. so modelling an observed time series with ARIMA models is intended to fit a good empirical model that can be used for this purpose.0562 sar2 0. then the model is ready to use for process interpretation or forecasting.1978. 0. 0).84 sigma^2 estimated as 0.2144 0.> (par. order = c(2. Task 58 : Plot the groundwater time series and evaluate its stationarity.4 Modelling with ARIMA models ARIMA models were developed primarily for forecasting. The fit of the models. There are two indications that a series is not stationary: ˆ A time series that appears to have different overall levels or degrees of autocorrelation in different sections of the series. + seasonal = c(2. 4. Diagnostic checking of model suitability. Modelling in their approach has three stages: 1. Interpretation in terms of underlying processes is not straightforward. • 86 . also zoom in on a three-year window to see the fine structure.4923: log likelihood = -347.arima(gw. 0.92. 3. 0). 0. These stages are iterated until the model is deemed “suitable”. 2.gw. Q56 : How much does modelling the seasonal component improve the fit? Which degree of autoregression among seasons is indicated? Jump to A56 • 4.0.8162 aic = 707.0618 0. Model identification. lower is better. 0)) Coefficients: ar1 ar2 1.0. 0. seasonal = c(2. order = c(2.1 Checking for stationarity The first step in model identification is to determine if any differencing is needed.1691 s. is given by the AIC. 0. Parameter estimation. ˆ A correlogram (ACF) that does not decay to zero.1978.0635 sar1 0.4. accounting for number of parameters. 0))) Call: arima(x = gw.0118 0.e.

delta-1 plot(diff(window(gw. 1990. 2)) 87 . • > > > + > par(mfrow = c(1. non-constant expected value?) Does the variance appear to be constant? Jump to A57 • A trend can be removed with one difference. main = "Anatolia well 1". ylab = "Groundwater level (m)") par(mfrow = c(1. 1994).e. 1994)). ylab = "Groundwater level (m). delta-1") par(mfrow = c(1. 2)) Anatolia well 1 46 Anatolia well 1 55 Groundwater level (m) 50 Groundwater level (m) 1975 1980 1985 1990 Time 1995 2000 2005 45 40 35 30 39 1990 40 41 42 43 44 45 1991 1992 Time 1993 1994 Q57 : Does there appear to be a trend and/or cycle (i. ylab = "Groundwater level (m)") plot(window(gw. main = "Anatolia well 1". main = "Anatolia well 1". 2)) plot(gw. 1990. Task 59 : Plot the first difference of the groundwater time series and evaluate its stationarity. main = "Anatolia well 1". also zoom in on a three-year window to see the fine structure. ylab = "Groundwater level (m).> > > + > par(mfrow = c(1. 2)) plot(diff(gw).

2)) Anatolia well 1 2.5 0.5 1991 1992 Time 1993 1994 Task 61 : Repeat. • 88 . ylab = "Groundwater level (m).5 1. main = "Anatolia well 1". delta-2") par(mfrow = c(1. 1994))). main = "Anatolia well 1". • > > > + > par(mfrow = c(1. ylab = "Groundwater level (m). delta−2 1975 1980 1985 1990 Time 1995 2000 2005 0 −1. d plot(diff(diff(window(gw.5 −1.e.0 −5 −0. with the third differences. 1990. non-constant expected value?) Does the variance appear to be constant? Jump to A58 • We difference the series once more: Task 60 : Plot the second difference of the groundwater time series and evaluate its stationarity. delta−1 Groundwater level (m).0 1.0 Anatolia well 1 Groundwater level (m). delta−2 5 Groundwater level (m). delta−1 1975 1980 1985 1990 Time 1995 2000 2005 0 −2 −6 −4 −1 1990 0 1 2 1991 1992 Time 1993 1994 Q58 : Does there appear to be a trend and/or cycle (i.0 0.Anatolia well 1 Anatolia well 1 2 Groundwater level (m). 2)) plot(diff(diff(gw)).

1990.> > + > + > par(mfrow = c(1. delta−3 Groundwater level (m). 1994)))). main = "Anatolia well 1". ylab = "Groundwater level (m). 1)) 89 . delta-3") plot(diff(diff(diff(window(gw. • > > > > > > par(mfrow = c(2. 2)) plot(diff(diff(diff(gw))). Task 62 : Plot the autocorrelation functions for the original time series and the first three differences. main = "Anatolia well 1". 2)) Anatolia well 1 Anatolia well 1 15 Groundwater level (m). delta-3") par(mfrow = c(1. delta−3 1975 1980 1985 1990 Time 1995 2000 2005 10 5 −5 −3 −2 −1 0 0 1 2 1991 1992 Time 1993 1994 There seems to be little change between the second and third differences. ylab = "Groundwater level (m). Another way to look at the stationary is with the autocorrelation function plotted by acf. 2)) acf(gw) acf(diff(gw)) acf(diff(diff(gw))) acf(diff(diff(diff(gw)))) par(mfrow = c(1.

If the process is AR(p ).2 Determining the AR degress The next step in model identification is to examine the autocorrelations and partial autocorrelations using acf and pacf.5 0.8 0.5 2.0 Lag 1.0 1. So. We can see this nicely for the groundwater data since 1978. two differencing operations seem to result in a more or less second-order stationary time series.0 −0.6 ACF 0.5 1.2 0.4 0. Jump to A59 • In conclusion.5 2.4.4 ACF 0.2 0.2 ACF 0. > pacf(gw.6 0.5 1.8 ACF 0.5 1.0 Lag 1. the partial autocorrelation is zero at lag ≥ p + 1.0 0. the sample partial autocorrelation plot is examined to identify the order: find the lag where the partial autocorrelations for all higher lags are not significantly different from zero.0 Lag 1.5 0.0 Series diff(diff(gw)) 1.Series gw 1.4 0.0 Lag 1.1978) 90 .5 1.0 0.0 0.0 −0.0 1. 4.4 0.0 Series diff(diff(diff(gw))) 0.2 0. respectively.0 0.0 −0.2 0.0 −0.0 Series diff(gw) 0.0 Q59 : Describe the behaviour of the ACF with increasing differencing.0 0.6 0.0 0.8 0.5 2.5 2.

1978)) 0.0 0. 2.5 1.3 −0.5 2.0 Lag 1. However.1978)) pacf(diff(diff(gw.6 0. 9. 5 .1 0. we evaluate the PACF of the differenced series: > > > > par(mfrow = c(1.1978))) 91 . although the absolute correlation coefficients decrease with increasing differencing.5 2. since we’ve already determined that two differences are needed for stationarity.Series gw.0 Here the significant partial correlations are at 1.2 −0. 12 and 13 months (“lag” on this plot refers to the 12-month cycle).0 Partial ACF −0. 1)) Series diff(gw.4 0.0 Lag 1. .5 1.0 0.3 Partial ACF 0. > acf(diff(diff(gw.2 0.1 Series diff(diff(gw.1978) 0.1978 1.1 −0.8 0.4 0.0 0. .5 1. 2)) pacf(diff(gw.1 0.0 −0.1978))) par(mfrow = c(1.0 These also show partial autocorrelation to 13 months.0 Lag 1.2 −0.2 0.5 2.2 Partial ACF 0. So we could try to fit an AR(13) model.

1978.8971 0. the tsdiag function produces three diagnostic plots for ARIMA models: 1. and q (the MA order).2 0.5805 0. 0))) Call: arima(x = gw.1978. aic = 708. We can see the effect of a poor model fit by under-fitting the groundwater levels with an AR model: 92 .arima(gw.8617 0. order = c(13. > (m.0834 ar11 -0. 2. ARIMA models may also declare a periodic (also called seasonal) component.4 0.7707 0.0 0.ar <. 3. as explained above. d (the degree of differencing). the Ljung-Box statistic for the null hypothesis of independence in the time series of residuals.8 0. these are conventionally known as p (the AR order).1978)) 1.5 1.0835 ar3 -0.0715 ar6 -0.8759 0.e.0801 ar10 -0.8893 0.0 0.0719 ar9 -0.6 0.8569 s.0820 ar13 -0.76 The third step is model checking.8682 0.0 Lag 1. This must be supplied with three parameters. with the same parameters. 0)) Coefficients: ar1 -0.0708 0. 2. order = c(13.4767: log likelihood = -340.38.8651 0.5 2.0798 ar5 -0.0830 ar4 -0. autocorrelation function (ACF) of residuals (should have no significant autocorrelation).0839 ar12 -0.0 The second step is to estimate the model parameters.0556 ar8 -0.Series diff(diff(gw.2637 0.0828 sigma^2 estimated as 0. which specify the model type. 2.0553 ar7 -0.e. using the arima function.0 ACF −0. Task 63 : Calibrate an ARIMA model for the groundwater level.2 0.9231 s. 0.0816 • ar2 -0.9057 0. 0. standardized residuals (should show no pattern with time).

0 0.0 ACF −0.0 Lag 1.2 0.1978. At a more appropriate order the diagnostics are satisfactory: > m. and the low p-values of the Ljung-Box statistic after lag 4.5 2. 0)) > tsdiag(m. 6.2 0.0 0.ar) Standardized Residuals 6 −8 −6 −4 −2 0 2 4 1980 1985 1990 Time 1995 2000 2005 ACF of Residuals 1. this means that we can not reject the null hypothesis of serial dependence at these lags.5 1.> m.ar <. 2. 2.4 q q q q q q q 2 4 lag 6 8 10 Notice the significant autocorrelation of the residuals at 4. c(13.0 p values for Ljung−Box statistic 1. c(3.ar <.8 0.ar) 93 .4 0.0 0.arima(gw.6 0.8 q 0. 12 and 13 months.2 0.1978. 0)) > tsdiag(m.arima(gw.0 q 0.6 q p value 0.

0 0.ahead = 12 * (2013 .6 0. • For ARIMA models..8 . here months: > p. m".2 0.6 0.ar <. > plot(p.3 52.69 1.Arima function.0 0. lty = 2) 94 . ARIMA models can be used to predict.2 0..4 0. Task 64 : Predict groundwater levels from 2005 through 2012.9 53. this returns both the predictions and their standard errors. The argument n.0 Lag 1.05 1. using the AR(13) model just fit.. + main = "Anatolia well 1") > lines(p.5 2. $ se : Time-Series [1:96] from 2005 to 2013: 0. ylim = c(35.2005)) > str(p.8 0.ahead gives the number of prediction points.0 0.8 2 4 lag 6 8 10 These diagnostics look very good.5 Predicting with ARIMA models Finally.ar) List of 2 $ pred: Time-Series [1:96] from 2005 to 2013: 54 53. the generic predict method specializes to the predict.32 1. ylab = "Predicted groundwater level.ar$se..5 1.1 53.0 q q q q q q q q q q p value 0. col = "red".Standardized Residuals −10 −8 −6 −4 −2 0 2 1980 1985 1990 Time 1995 2000 2005 ACF of Residuals 1.ar$pred + p. 4.0 ACF 0.54 1.ar.predict(m. 75).ar$pred.0 p values for Ljung−Box statistic 1.4 0. n.72 .

due to recharge in the winter (rains) and extraction in the summer.9. how far ahead would you be willing to use this predicted series? Jump to A60 • 4.6 Answers A41 : (1) There is a long-term trend.1. the explanation for this is unclear. since the quantity of far exceeds annual rainfall. to a slightly shallower level until 1980 and then steadily to a deeper level. which we see in (2) there is a seasonal cycle in groundwater level. The increasing fluctuation may be caused by more extraction but is also balanced by more rainfall. The overall trend is fairly well-explained by the line. which may reflect rainfall differences. here 0. The slope of the trend (groundwater level vs. 0. lty = 2) > grid() Anatolia well 1 Predicted groundwater level.89. The trend is most likely due to increased extraction for irrigation. col = "red".2005.813 Return to Q42 • A43 : Yes.> lines(p.ar$se. (3) The remainder is of similar magnitude to the annual cycle (±2m) and is strongly auto-correlated. the ANOVA 95 .p. m 40 50 60 70 2006 2008 Time 2010 2012 Q60 : What happens to the prediction as the time forward from the known series increases? What happens to the confidence intervals? As a groundwater manager.ar$pred . time interval) varies a bit over the 1980. The average annual increase is given by the slope coefficient. Return to Q41 • A42 : The proportion of variation explained is given by the adjusted R 2 . the proportion of variation explained has increased to 0. The very large remainder in 1988 was discussed in §2. however the magnitude of the fluctuation appears to increase from about 1983–1995 and has since stabilized.

21 mm less drawdown per year. at a certain point the groundwater level will be too deep to pump economically and will stabilize at that level. the fit without the seasonal component predicts 3.shows a highly significant improvement in fit. about 1 in 107 . 96 . That is. There seems to be a seasonal component.5% compared to the fit to the original time series. 44.907 for the shorter series. very strong underprediction). Return to Q44 • A45 : The trend can not continue forever – for example. and 52 m and low (negative) at 34. when the process was different. except for some spikes in 1975. Some observations are missing. 46. The cubic adjusts somewhat to the initial part of the series.89 for the longer series..92 vs. 48. This is a high degree of continuity. instead there seems to be a discontinuity in 1974. 0.056 (mg l-1) yr-1. There is certaintly a trend. this is the year 1988 anomaly. estimated as -0. Also. fitted values: high (positive) residuals around fitted values of 32. 0. Return to Q51 • A52 : Proportionally.e. Return to Q47 • A48 : There are definite discrepencies. Return to Q46 • A47 : The slope of the trend is almost identical. Firsty. is due to sampling variation is very small. Return to Q45 • A46 : The variance explained increases 2. going two steps forward with a single AR(1) model would over-state the continuity. The steeper slope ignores the earliest part of the series.813 for the whole series to 0. increasing from 0. from high to low concentrations. Return to Q52 • A53 : After correcting for immediate continuity (since this is ground water. future rainfall patterns and future demand for irrigation water may change. the highest residuals are far too positive (i. a bit better. Return to Q43 • A44 : The proportion of variation explained for the shorter series is 0. to be expected) the lag-2 level or remainder introduces a negative correction. Return to Q49 • A50 : The series from 1972 – 1974 has large fluctuations and high concentrations. There is no linear trend. 48 m. there is clear periodicity in the residuals vs. even after accounting for trend and cycle. Return to Q50 • A51 : The probability that the observed trend. Return to Q48 • A49 : Effectively zero.788 of the variance is explained. after 1974 these are both much lower. Second. because there is less overall variability. and so seems to better reflect current conditions. The average annual increase has changed considerably.

compared to an AR(2) model. further the expected values seem to follow an annual cycle.e. compare to 0. Return to Q58 • A59 : The ACF of the original series does not decay to zero. the ACF shows clear cyclic behaviour.9594 and -0. which was removed from the time series fit with the non-seasonal model. So. there are slight differences. Both are indications that this is not a first-order stationary series. So. The prediction does not seem very useful to the manager. This is largely removed by the second difference and completely by the third. for the ARIMA(2. Most of the variance reduction was from the original series to the AR(1) series.0851 Return to Q55 • A56 : The standard error of the best seasonal ARIMA fit is 0. from 0.0) fit these are computed as 0. this indicates a trend. Return to Q56 • A57 : There seems to be a clear trend to deeper levels since 1982. Once this is removed by the first difference. but there still seems to be an annual cycle.4923. Return to Q54 • A55 : No. Return to Q53 • A54 : An AR(2) series is selected. this is not first-order stationary.9624 and -0.0818. this is somewhat higher. Return to Q59 • A60 : The prediction becomes more uniform the further out we predict. but they rapidly become much larger than the annual cycle. The variance increases over time.e. So. thus the lag-2 component can be modelled. The residual variance decreases slightly. The variance increases over time. the seasonal model also includes the seasonal component. the fit is worse. i. the cyclic behaviour is damped and approaches the overall linear trend. Return to Q57 • A58 : There is no apparent trend. The upper and lower confidence limits become wider and also are damped. this is not second-order stationary.3133.3146 to 0. it seem more logical to use the average behaviour of the cycles in the known series and add it to the trend. i. Return to Q60 • 97 .3133 for the non-seasonal model. The AR(2) fit gives the two coefficients as 0.0. But. this is not second-order stationary. the AR(2) series is only a small improvement.

This is included in the Kendall package as a sample dataset GuelphP.. has a new sewage treatment plant improved water quality?)..1 A known intervention We return to the example of the phosphorous (P) concentrations briefly presented in §4. Guelph. is there a new source of pollutants?).33 NA 0. These interventions may be one-time (e.g. Ontario.365 0.g.35 0. Determining the effect of a known intervention (e. January 1972 through January 1977.g. "title")= chr "Phosphorous Data. ˆ Any of these may be multiple. y = 1.g. + main = "Speed River water quality") > abline(v = 1974 + 2/12. known as transfer functions: ˆ An immediate. reaching a new equilibrium over time. lwd = 2) > text(x = 1974 + 2/12.g..2. mg l-1". which has two main uses: 1. showing the known intervention.1" > plot(GuelphP. type = "b". 655] that in February 1974 a new P treatment was introduced in the sewage treatment plant. one day’s streamflow is affected). Each of these may affect the series in several ways. rainfall. 5. a monthly time series of phosphorous (P) concentrations in mg l-1.. Ch.Speed River. ˆ An immediate. Identifying probable unknown interventions (e. It is known [7.1972.. + pos = 4) 98 . permanent effect (e. 2. a pollutant is at a different level and maintains that level).g. p. Hipel and McLeod [7. col = "red"..19 0. then relaxing to the original condition.1-1977. 19] is an excellent explanation of intervention analysis. Task 65 : Load this dataset and plot as a time series.51 0. "known intervention".47 0. ˆ An immediate effect. temperature) but may also be influenced by human activities.Guelph. single effect (e. ˆ An asymptotic effect. streamflow with controlled releases from dams).attr(*. ylab = "P concentration. col = "red".3.5 Intervention analysis Time series may arise completely or mostly from natural processes (e.8 . Speed River.65 0. • > require(Kendall) > data(GuelphP) > str(GuelphP) Time-Series [1:72] from 1972 to 1978: 0.g.. damming a river) or pulsed (e. ˆ Any of these may be delayed.

460 0.157 0.400 0.200 0.190 0.070 0.058 0. start = NULL. medians.140 0.1 <.4 q 0.091 0.300 [23] 0. medians.0 qq q qq qq q q q 1972 1973 1974 1975 Time 1976 1977 1978 Note the use of the abline function to add a line to the graph.079 99 .028 0. mg l−1 0.825 1.6 q q q q q q q q q q q q q q q q q q qqq q q qqq q q q q q q q q q q q q q 0.150 0.071 0. or variances.omit(as.079 0.0 q q known intervention P concentration.055 0.590 0.vector(window(GuelphP.116 0.2 q qq q qq q qq qq qq 0.action") [1] 6 19 25 attr(. Q61 : Describe the effect of the new P treatment. Task 66 : Split the series at February 1974 and compare the means.385 0."na.120 0.650 0.8 q q 0.omit(as.140 0.900 [12] 0.065 attr(.Speed River water quality q 1.108 0.vector and remove the NA’s with na. and there are missing values.240 0.050 0.220 0. or variances.104 0.495 1.106 0.047 0.330 0.140 0.056 0. specifying the v argument to specify a vertical line at the indicated date.510 0.130 0."class") [1] "omit" > (guelph.360 0.na.042 0.094 0.na.107 0.120 0.121 0.086 0.065 0.169 0.080 0.295 0.110 0. + end = 1974 + 1/12)))) [1] 0.150 0.120 0.190 0.058 0.270 0.000 0.vector(window(GuelphP.omit: > (guelph.470 0.180 0. + start = 1974 + 2/12)))) [1] [12] [23] [34] 0.066 0.350 0. • Since these statistics do not involve time series.065 0.120 0.2 <.365 0.079 0.100 0.110 0. we convert to ordinary vectors with as.097 0. end = NULL. Jump to A61 • How can we reliably quantify this difference? One obvious way is to split the series and compare the means.

"class") [1] "omit" > mean(guelph.[45] 0.1) [1] 0.1) [1] 0.365 > median(guelph. Return to Q62 • 100 .4430435 > mean(guelph.2) [1] 0.1159556 > median(guelph.action") [1] 15 attr(."na. and the month-to-month variability is much less.2) [1] 0. So we need to build a time-series model that includes the intervention.106 > sd(guelph.2) [1] 0. and especially standard deviation (variability).1) [1] 0.2839124 > sd(guelph. Return to Q61 • A62 : The later series has a much lower mean. because the observations are not independent – they are clearly serially and seasonally correlated. Jump to A62 • We would like to state that these are significant differences (not due to chance) but we can’t use a t -test.2 Answers A61 : The mean P concentration decreases dramatically. median.114 attr(. 5.07789123 Q62 : Describe the difference in the two sub-series.

5 34. and plot it.8 34. main = "Anatolia well 2") Anatolia well 2 Groundwater level (m) 5 1975 10 15 20 1980 1985 1990 Time 1995 2000 2005 Task 68 : Create a “multiple time series” object covering the common time period of the two wells.intersect(gw.union pads non-overlapping portions of one or more series with NA’s. if they are in a common object..ts(scan("anatolia_alibe. these produce an object of class mts “multiple time series”. 1:2] 34. + frequency = 12) > plot(gw. "dimnames")=List of 2 . with one column vector for each original series.4 34. In the present case both have the same period so there is no difference.intersect limits the multivariate series to their common times. convert to a time-series object.txt").7 34.ts function can plot several series together.2. Two (or more) time series can be bound together with the ts. whereas ts.6 Comparing two time series Task 67 : Read the second well’s time series in R. • > gw.ts. These differ in that ts.$ : NULL 101 . > gw2 <. .intersect and ts.2) > class(gw2) [1] "mts" "ts" > str(gw2) mts [1:360. • The plot.2 <. ylab = "Groundwater level (m)".attr(*.. gw.union functions..9 . start = 1975.

80 Max. :57.87 1975 34.type = "single".type = "multiple".$ : chr [1:2] "gw" "gw. main = "Anatolia wells 1 and 2". :29.attr(*.79 2004 55.55 Median :13.. : 5.2 1975 34. col = "red") > lines(lowess(gw.2 Min. + ylab = "Groundwater depth (m)") > lines(lowess(gw. "class")= chr [1:2] "mts" "ts" Task 69 : Plot the two wells’ time series on the same graph.45 13.36 13.74 Mean :13. groundwater level) to the same scale: > plot(gw2. Nov Dec gw gw.:34. f = 1/3).93 > summary(gw2) gw Min...:46.attr(*. plot. :22.2.63 Mean :41.2" . plot.:16.:11. main = "Anatolia.90 Median :41. • > plot(gw2.67 2004 54.58 3rd Qu. This has the effect of stretching or compressing the response (here.19 Max.34 1st Qu. "tsp")= num [1:3] 1975 2005 12 .55 15..83 15.68 gw.85 3rd Qu.90 1st Qu. f = 1/3). two wells") 102 . col = "red") Anatolia wells 1 and 2 Groundwater depth (m) 10 1975 20 30 40 50 1980 1985 1990 Time 1995 2000 2005 Jan Feb .47 Another way to see the two series is each on their own panel.

e.max which by default is the integer nearest 10 · log 10N/m. gw.5833 0.Anatolia.ccf(gw. Note that the highest correlation between two series might not be at lag 0 (same time).0833 0.864 0. almost two years.710 0.0000 0.3333 -0.8333 -1. so the cross-correlation is between xt +k of the first series and yt of the second.1667 -1. two wells gw gw.824 0.1667 -0.874 0.693 0. by lag -1.802 0.841 0. So a positive lag has the first series ahead of the second.853 0.4167 0.2500 0. i.5833 0. • The ccf function has an argument lag.6667 -1.2500 -0. one series may lag ahead or behind the other (for example.4167 -0.831 0.5000 0. By convention the first series named is moved ahead of the second when computing.811 0.5000 -1.827 0.0000 -0.9167 -0.6667 0. here 23.3333 -1.2)) Autocorrelations of series 'X'.745 0.716 0.776 0.7500 103 .6667 -0.790 0.709 0.8333 -0. stream flow at different distances from the source).819 0.724 -1.1667 0. Task 70 : Compute and display the cross-correlation between the two wells.712 0.7500 -0.7500 -1.2500 0.759 0.4167 -1.5833 -1.3333 0.703 0.2 5 1975 10 15 20 30 40 50 1980 1985 1990 Time 1995 2000 2005 Q63 : Is there a systematic difference between the wells? Do they appear to have the same temporal pattern? Jump to A63 • The obvious question is how well are the two series correlated? Function ccf computes the cross-correlation of two univariate series at a series of lags. > (cc <.0833 -1.733 0.815 -0.681 0. This is computed in both directions.5000 -0.0833 0. a negative lag the second is ahead of the first.

here in years.0. well 2") gw & gw.709 0.67 -1.. > str(cc) List of 6 $ acf : $ type : $ n.6667 0. the same month and year? What is the correlation at one month positive and negative lag? Jump to A64 • Q65 : Why is this graph not symmetric about zero-lag? Jump to A65 • Task 71 : Find the highest correlation and its lag.used: $ lag : $ series: $ snames: .863 1.8 −1 0 Lag 1 Q64 : What is the correlation at zero lag. 1.693 0. num [1:45.863 1.873 1. The structure (using str) shows that the acf field of the object returned by ccf has the correlation. and the lag field has the lag. To get the lag in months.3333 0...881 1.2500 0.863 1.0 0. Anatolia well 1 vs. chr "correlation" int 360 num [1:45. multiply by the cycle length (i.879 0. • The max function finds the maximum in a vector.809 0.4 0.885 0.823 0.2 ACF 0.839 > plot(cc.max identifies its index.853 0.e. 1. 12).890 1. i.860 0. 1] 0.83 -1.856 1. and which.58 -1.703 0.8333 0.e..762 0.6 0.852 1.771 0.71 .9167 0.860 1.1667 0. main = "Cross-correlation.786 0.5 .855 1.75 -1.2 0.4167 0.7500 0.681 0.888 1. chr "X" chr "gw & gw.5000 0.0000 0. 1] -1.5833 0.attr(*.2" "class")= chr "acf" 104 .8333 0.0833 0.

Both have annual cycles but the trend towards deeper levels is much more pronounced in the first well. lagging it positively) is not the same as shifting it behind.864.8896892 > cc$acf[i]^2 [1] 0. Return to Q65 • A66 : The highest correlation is 0. Return to Q63 • A64 : At lag 0 the correlation is 0. Think for example of correlating January rainfall of one station with March in another (first station lagged +2) or correlating it with November (lag -2). how closely correlated are the two series? Jump to A67 • 6.which.853 Return to Q64 • A65 : Shifting one series ahead of the other (i.7915468 > cc$lag[i] [1] 0.> max(abs(cc$acf)) [1] 0. The timing of rapid extractions and recharges is different for the two wells. with the first well ahead by one month it is 0.max(abs(cc$acf))) [1] 27 > cc$acf[i] [1] 0.8896892 > (i <. The second well appears to have more rapid fluctuations.89 at lag 4.e. there is no reason to think these will have the same correlation.1 Answers A63 : The second well is much closer to the surface than the first.3333333 > 12 * cc$lag[i] [1] 4 Q66 : What is the highest correlation and its lag? Which series leads? Jump to A66 • Q67 : Looking at the graphs and the numeric results.874. for the first well behind by one month 0. the first well’s records are moved 105 .

792. Return to Q67 • 106 . the coefficient of variation (proportion of variation explained. R 2 ) is only 0. the highest correlation is not even 0.9. Return to Q66 • A67 : Although the two wells are in the same region with similar climate and land use.forward to match the second well’s records.

If there is a trend the estimation of and from deviation from trend.4]. Rainfall can also be estimated. a simple interpolation from nearby values will usually give satisfactory results. This assumes the year is not overall wetter or drier. the model can be used to predict at a gap. Estimatation from other cycles For a cyclic series (e. Estimation from the autocorrelation structure If the series is autocorrelated. or even to extend the series. a (multivariate) regression equation can be developed for the time series as a function of the other series.). or one weather station in a group).g. 107 . groundwater depths) and if the gap is not long (e. either global or (more commonly) some local function such as a local polynomial or spline. For example. these are both linear filters. another way to interpolate from nearby values is to fit a function to the series. dewpoint.g. The interpolation can be linear from the two nearest values. Interpolation If the series is expected to be continuous (no abrupt changes. Functional interpolation Again if the series is expected to be continuous and if the gap is not long. missing June rainfall for one year in a 30-year series could be estimated as the mean of the other 29 June rainfalls. A typical example is a missing daily weather observation (temperature.2). e. without any gaps. or as a weighted average of nearby values.g. We first list several approaches to gap filling and their areas of application. Several of these are then illustrated with examples. which may not be a reasonable assumption. Estimation from a model If the time series is modelled either by structural decomposition or ARIMA. etc. a gap can be estimated from the observations with which it should be correlated. one gauging station among a network. §19.7 Gap filling Many uses of time series require complete series. as revealed by the correlogram and autocorrelation analysis. This function can then be evaluated at the missing value’s time. An example is a periodic function fit to an annual cycle. In practice many series contain such gaps because of mechanical failure or observer error. an example is the daily rainfall records from Lake Tana (§2. annual) a missing value may be estimated by some function (typically the mean) of values at the same position in the cycle. a single missing well depth). Estimation from other series If the series with missing values is well-correlated to one or more other series (e. Gap filling is reviewed by [15. but its autocorrelation structure is usually much weaker.g.

we simulate gaps by removing known records from a series.90 34. 29.62 46. • 108 .89 Median 41. we can then assess the success of the method by comparing the reconstructed series to the known series.90 34. size = 5)) [1] 268 102 182 206 > sort(ix) [1] 12 102 182 206 268 12 Third.58 46.80 Max.68 NA's 5 Median 41. using the sample function.70 Mean 3rd Qu. initially the same as the known series: > gwg <. replacing them with the missing value constant NA: > summary(gwg) Min.89 > gwg[ix] <. 1st Qu. before and after deletion? Jump to A68 • Task 73 : Plot the simulated series. 57. 57. we use set..1 Simulating short gaps in a time series To illustrate gap filling.gw Second.4 34.. So your results match these. Recall. pick five separate positions to delete. this is in time series gw: > str(gw) • Time-Series [1:360] from 1975 to 2005: 34. set up the series with simulated gaps.5 34. 1st Qu.seed(44) > (ix <. 41.9 . 29.92 Max.NA > summary(gwg) Min. Task 72 : Remove five months at random from the first Anatolia well. showing the dates with missing observations.8 34. First.63 Mean 3rd Qu.7.7 34. > set.68 Q68 : What is different in the summary of the simulated time series.seed to set the random number generator to a known starting position. 41. delete the values at these positions.sample(length(gw).

action = "na.window = 25. + na. s. Clearly.omit or na.window = 85)) > geterrmessage() [1] "Error in na. I know that this will produce an error. so it is no longer periodic.stl <.window = 85.3. • We use the best decomposition from §3. and show the error message with the geterrmessage function: > try(gwg. .fail.window = 85.action. i. . a fact we know! The usual R approach to missing values is to specify a na.stl(gwg. t. stl correctly reports this.e. t. to avoid a fatal error I thus enclose the expression in a call to the try function. lwd = 1) grid() Groundwater depth. and residuals.window = 25. with gaps".exclude.stl(gwg. with a smoother trend and two-year window on the seasonal amplitude.exclude")) > geterrmessage() [1] Error in stl(gwg.> + > > > plot(gwg.default(as. t. col = "gray") abline(v = time(gw)[ix].stl <. ylab = "Groundwater depth (m)" sub = "Dates with missing values shown as red bars") abline(h = min(gw). s. main = "Groundwater depth.action = "na. 109 . the gaps have to be filled before analyzing the series. with gaps Groundwater depth (m) 30 1975 35 40 45 50 55 1980 1985 1990 1995 2000 2005 Time Dates with missing values shown as red bars Task 74 : Try to decompose the time series with gaps into a trend.ts(x)) : missing values in object\n" The error message tells us that the stl function failed because there are missing values .exclude") : series is not periodic or has less than two periods The missing observations have been taken out of the series. s. such as na.window = 25. . col = "red". na. in this case neither help: > try(gwg. seasonal component.

and the [] indexing operator. Task 75 : Predict the ground water depths at the dates with missing values.250 1983.917 $y [1] 43. 7.385 39.417 1990.7.86 33. > gw.1) that we know the positions in the original time-series object gw for which we deleted the values to make series gwg: > print(ix) [1] 268 102 182 206 > gw[ix] [1] 42. the approx function implements this.620 35. using linear interpolation. • Recall (§7.083 1992.linear) List of 2 $ x: num [1:5] 1997 1983 1990 1992 1976 $ y: num [1:5] 43.linear <.2 Gap filling by interpolation In the previous example we are missing single observations.1 Linear interpolation The simplest interpolation is as the average of neighbouring (non-missing) values. xout = time(gw)[ix]) > str(gw.350 Compare these to the known values: 110 .083 1975.6 35.083 1992.fill.525 34.fill.417 1990.approx(x = time(gwg).5 34.3 > print(gw.fill.665 41.linear) $x [1] 1997.89 41.083 1975. y = gwg.28 > gwg[ix] [1] NA NA NA NA NA 12 The dates of these are found with the time function on the original series. specifying series to be interpolated as the x and y values.46 35.86 39.2. using the positions of the missing observations as indices: > time(gw)[ix] [1] 1997.917 We now call the aspline function.7 41. The first attempt to fill the gaps is to interpolate from neighbouring values.4 39. and the missing times as the points to be interpolated (argument xoutapprox).250 1983.

col = "red".fill.linear$y.083 1992.86 39. col = "red".fill.linear$y [1] 43.225 The maximum difference is 0.66 m.2 Spline interpolation Jump to A69 • The linear interpolation only takes into account neighbouring values.665 41.fill.89 41. -0. sub = "reconstructed values: red. gw.gw. true values: green Q69 : How well did the linear interpolation fill the gaps? 7.665 -0. gw.fill.fill.917 > gw. -0.083 1975.> time(gw)[ix] [1] 1997. • > + > + > + > plot(gwg.350 > gw[ix] [1] 42.417 1990. One example is a spline.46 35. 111 .fill.linear$y.28 > summary((gw[ix] .86 33.620 35.linear$x. Task 76 : Plot the reconstructed points on the time series with missing values.525 34.250 1983. gw[ix].385 39. cex = 2) points(gw.linear$x. 1st Qu. pch = 20) points(time(gw)[ix].160 Mean 3rd Qu.linear$y)) Min. pch = 20) Gap−filled time series Groundwater depth (m) 45 50 55 q q q q q q 40 q q q 35 q q q q q 30 1975 1980 1985 1990 1995 2000 2005 Time reconstructed values: red.239 -0. a higher-order local curve fits to a neighbourhood. col = "darkgreen".2. 0.525 Median -0.070 Max. main = "Gap-filled time series". tr ylab = "Groundwater depth (m)") points(gw. along with the original series.

• > plot(gwg.37511 39.041830 Max. computed by three methods (linear.29984 Compare these to each other and the linear interpolator: > summary(gw.gw.aspline$y.spline$y) Min.05661 3rd Qu. (2) for a six-month window centred on March 1997. > require(akima) Task 77 : Predict the ground water depths at the dates with missing values. The base R stats has a spline function that is similar but less sophisticated.which is a smooth function that passes through known points and preserves several derivatives.linear$y) Min.302700 -0.46 35.spline <.021490 -0. gw.fill.22234 34. col = "red". and the missing times as the points to be interpolated (argument xoutaspline).spline(x = time(gwg). based on methods developed by Akima [1]. Median Mean -0.gw. specifying series to be interpolated as the x and y values.063630 3rd Qu.fill.linear$y) Min.69033 35. 0.fill.fill. -0.013770 > summary(gw.05172 -0.fill. xout = time(gw)[ix]))$y [1] 43. A nice implementation of 1D splines is the aspline function of the akima “ Linear or cubic spline interpolation for irregular gridded data” package.spline$y .aspline <. Median Mean -0. • We now call the aspline function.07033 Max. 1st Qu. we will compare the Akima and default splines.33328 39.28 > (gw. type = "l".gw.fill.63377 35.05016 -0.76171 41. using spline interpolation.007018 3rd Qu.86 39. 0.fill.17678 34. y = gwg.32851 > (gw.fill. Akima spline) along with the original series: (1) for the whole series. 1st Qu. 0.fill. 0.aspline$y .056550 Median Mean 0.89 41. usually two. + ylab = "Groundwater depth (m)") > points(gw.045570 > summary(gw. 1st Qu.fill.66710 41.34820 -0.86 33. 0.009886 -0.aspline(x = time(gwg). > gw[ix] [1] 42. 0.aspline$y .aspline$x. y = gwg.002099 Max.028670 -0.094610 -0. xout = time(gw)[ix]))$y [1] 43. main = "Gap-filled time series".09671 Task 78 : Plot the reconstructed points on the time series with missing values. default spline. 112 .

5. col = "darkgreen". pch = 20) text(1997. 35. gw. "default spline".fill. ylab = "Groundwater depth (m)") points(gw. cex = 2) points(time(gw)[ix]. 43.fill.+ > + > + > + > + > + > > > > > > cex = 2) points(gw.fill. type = "b". pch = 20) points(time(gw)[ix].5.fill. cex = 2) points(time(gw)[ix].fill.linear$x.fill. gw. col = "blue".fill.spline$x. col = "blue".linear$x.aspline$y.fill. gw[ix].fill. gw. pch = 20) points(gw.spline$y. col = "brown". 32.aspline$y. gw.spline$y.5). col = "brown". col = "brown". gw[ix].5. col = "dark green". pos = 2) Gap−filled time series Groundwater depth (m) 45 50 55 q q q q q q q q q 40 q q q 35 q q q q q q 30 linear default spline Akima spline true value 1975 1980 1985 1990 Time 1995 2000 2005 > + > + > + > + > + > + > + > > > plot(window(gwg. cex = 2) points(gw.spline$x.fill.fill.fill.spline$y. cex = 2) points(gw. cex = 2) points(gw. pch = 20) text(2000. pos = 2) text(2000.fill.linear$x. col = "brown". pch = 20) points(gw.fill. col = "darkgreen". gw. gw. start = 1997. main = "Gap-filled time series". gw. col = "brown".fill. gw[ix]. col = "darkgreen". cex = 2) points(gw. col = "blue". gw. pos = 2) 113 . pos = 2) text(2000. "linear".linear$y. "true value".linear$x. gw. 31. "linear".linear$y. 45). "Akima spline". pos = 2) text(2000. col = "darkgreen". col = "red". col = "red".fill. pch = 20) points(gw. end = 1997. pch = 20) points(gw.spline$x.aspline$x.aspline$x.aspline$x. col = "blue".aspline$y.fill.fill.fill. col = "brown". col = "red". gw[ix]. ylim = c(42. col = "red".fill. gw.fill.linear$y.spline$x.fill. gw. cex = 2) points(gw.linear$y.spline$y. pch = 20) points(time(gw)[ix]. 34. col = "blue".

7. pos = 2) > text(1997. no systematic problem but the result of carelessness or occasional problems with observations.4 1997.r)) [1] 5 > gwg. then fill in their values: > gwg.3.r <.0 q q q q q q q q q q 43.0 42.. First.5. "true value".na(gwg.fill. col = "blue".3 Simulating longer gaps in time series The linear and spline interpolators had little problem with single gaps in a smooth series.gw.8 34.r[ix] <. "default spline".7 34.1 1997. copy the series with gaps.5.0 q q q Groundwater depth (m) 44.3 1997.0 1997.r) • Time-Series [1:360] from 1975 to 2005: 34. pos = 2) > text(1997.5 1997.9 .0 linear default spline Akima spline true value 1997. What happens with longer gaps? 7.5.4 34.5 34.5.gwg > sum(is. 42.> text(1997. + pos = 2) Gap−filled time series 45. 42. > sum(is. 42.na(gwg.2.aspline$y > str(gwg.1 Non-systematic gaps These are like the short gaps. col = "dark green". "Akima spline"..2 Time Q70 : How different are the three interpolators? Jump to A70 • Task 79 : Make a full time series with the reconstructed values.r)) [1] 0 7. 114 . col = "red".

083 1997.917 1992.667 1986.167 1998.083 1998.417 1986.68 NA's 72 > plot(gwg.167 1987.89 Mean 3rd Qu.500 1978.667 2002.083 1975.90 35.833 23 102 165 274 338 27 33 37 41 42 51 52 56 63 109 113 120 127 135 140 142 144 149 168 173 177 182 183 190 206 208 211 278 284 285 286 295 305 309 312 313 359 > gwg[ix] <. • Again we use set.500 1976.250 1985.750 2001.750 1975. 57.917 1978.667 1979.333 1987.417 1981.917 2002. we also create a sorted version of the missing-observations vector.167 1982.833 1998. 41.083 1993.917 1987.sample(length(gw). ylab = "Groundwater depth (m)".333 1990.833 1979.500 2001.167 1990.000 1977.917 1988.333 1992.333 1981.667 2004.000 1986.250 1999.000 2003.750 1976.583 2001.seed(44) > (six <.750 1984.583 1988.167 1995.000 1980.250 1977. For convenience in plotting.88 Max.Task 80 : Remove one-fifth of the months at random from the first Anatolia well.500 2000.833 1975.17 Median 41.500 1984.750 1988.667 1992. main = "72 missing values") > abline(v = time(gw)[six].333 1988. size = length(gw)/5))) [1] 1 4 8 12 19 21 [17] 70 79 81 88 93 101 [33] 151 152 154 158 159 161 [49] 218 219 251 258 268 271 [65] 321 322 324 331 334 335 > time(gw)[six] [1] [8] [15] [22] [29] [36] [43] [50] [57] [64] [71] 1975.NA > summary(gwg) Min.333 1986.667 1993. . 29.750 2000. > gwg <.250 1997.167 1979.gw > set. col = "red". using the sort function.083 1990. 1st Qu.083 2000.667 1980.417 1998.750 1996.333 2002.seed so your results will be the same.sort(ix <.583 1978.500 1997.66 46.250 1982.667 1984.167 1983.917 1976.583 1989.667 2001.500 1987.500 1988.583 1983.750 1989. lwd = 1) > grid() 115 .

Median Mean 3rd Qu.fill. 0.linear$x.aspline$y . col = "darkgreen".84030 Max. pch = 20) 116 . 1st Qu. for approx we must specify the rule to be used at the end of the series (since in this case the first observation was missing).fill.linear$y) Min. xout = time(gw)[six]. + rule = 2) > gw.fill.031430 Max.gw[six]) Min. gw. col = "brown".linear$y. -5. y = gwg. col = "brown".linear <.fill.linear$y .77200 -0.aspline$y. type = "l".20900 3rd Qu.1508 > summary(gw.aspline$x.linear$y. 1st Qu. pch = 20) points(gw. gw[ix]. gw.fill.fill. to get a value we specify ruleapprox=2: > gw. gw. 0. cex = 2) points(gw.02997 -0.003634 -0.gw.approx(x = time(gwg).fill.72 missing values Groundwater depth (m) 30 1975 35 40 45 50 55 1980 1985 1990 Time 1995 2000 2005 Task 81 : Fill these with linear interpolation and Akima splines.7950 > summary(gw.06574 Max. cex = 2) points(time(gw)[ix]. gw[ix]. compare the results.1470 -0.625700 -0.1750 -0. gw. 0. cex = 2) points(gw.579800 plot(gwg.aspline$y.gw[six]) Min.fill.077060 -0. col = "red". the default ruleapprox=1 returns NA. • Again we use approx and aspline. Median Mean -5.fill. 0.aspline$y . y = gwg.011060 > + > + > + > + > + > > 3rd Qu. main = "Gap-filled time series".fill. 1st Qu.aspline(x = time(gwg).fill. pch = 20) points(time(gw)[ix].fill. 0. ylab = "Groundwater depth (m)") points(gw.linear$x. col = "red".aspline <.1980 0.fill. col = "darkgreen".fill.0050 -0.14990 -0. Median Mean -0. xout = time(gw)[six]) > summary(gw.aspline$x.

linear$x.fill.fill. col = "brown".linear$y.fill.numeric(gw). "Akima spline". col = "brown".aspline$x. pos = 2) text(1989. col = "red".aspline$y. pos = 2) Gap−filled time series q q q q q q q q q q q qq q q q q q q q q q q q q q q q 55 q q q Groundwater depth (m) q q q q qq q q q q q qqq q q qq q q q qq q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q qq q q q q qq q qq q q q q q q q q q q q q qq q q q q q q q qq qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q 35 40 45 50 30 linear Akima spline true value 1975 1980 1985 1990 Time 1995 2000 2005 The period from mid-1987 to mid-1988 has. pch = 20) text(1989. type = "l". so we plot these in detail: > + + > + > > + > + > + > + > > > plot(window(gwg. 43. 31. "true value".linear$x. pch = 20) points(gw.fill. cex = 2) points(gw. col = "red". lty = 1) points(gw. col = "red". type = "b". pos = 2) 117 . gw. gw. start = 1987. "linear". lty = 1) points(gw. pos = 2) text(1989. by chance.> text(2000.fill. gw. type = "b") points(time(gw)[ix]. col = "brown". 32. sub = "True values in dark green". pch = 20.aspline$x. pos = 2) > text(2000. pos = 2) > text(2000. col = "dark green". ylab = "Groundwater depth (m ylim = c(39. col = "dark green".fill. cex = 2. col = "red".fill. 47)) points(x = time(gw). col = "darkgreen". cex = 2. "true value". gw[ix].5. main = "Gap-filled time series". col = "darkgreen".5. 46.aspline$y. type = "b". col = "brown". lty = 2. 44. 34.linear$y.fill. y = as. many missing values. "Akima spline". "linear". gw. end = 1989).

Gap−filled time series

q
q

q
q

46

q

linear

q

q

Groundwater depth (m)

44

q
q q

q
q

q q
q q

Akima spline

q

42

q q q
q q q q

true value

q q
q q q

q

q q
q q q

q

q

q q
q q q q q

q
q q

40

q q q q

q q
q q q q

q
q q

q q
q q q

q

q q

1987.0

1987.5

1988.0 Time True values in dark green

1988.5

1989.0

Q71 : How well did the interpolators fill the longer gaps? Jump to A71 • 7.3.2 Systematic gaps These are when observations are not made for some block of time, perhaps because of a broken instrument or budget problems. The most interesting case is when an entire cycle is missing.

Task 82 : Simulate a missing year in the groundwater depth records.

We know the years of the record, extracted from the full date with the floor function:
> unique(floor(time(gw))) [1] 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 [14] 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 [27] 2001 2002 2003 2004

Select a “typical” year, 1997; remove it; to make the display easier to use just consider the series since 1990, using the window function. We use the which function to find the array indices for 1997.
> gww <- window(gw, start = 1990) > gwg <- gww > (six <- sort(which(floor(time(gwg)) == 1997))) [1] 85 86 87 88 89 90 91 92 93 94 95 96 > gwg[six] <- NA > summary(gwg)

118

Min. 1st Qu. 39.07 43.92

Median 47.21

Mean 3rd Qu. 47.78 51.71

Max. 57.68

NA's 12

> plot(gwg, main = "Missing year 1997", ylab = "Groundwater depth (m)")

Task 83 : Interpolate the missing values with linear interpolation and Akima splines. •
> gw.fill.linear <- approx(x = time(gwg), y = gwg, xout = time(gww)[six], + rule = 2) > gw.fill.aspline <- aspline(x = time(gwg), y = gwg, xout = time(gww)[six]) > summary(gw.fill.linear$y - gww[six]) Min. 1st Qu. -2.0560 -0.9367 Median 0.6662 Mean 3rd Qu. 0.3758 1.6390 Max. 2.7060

> summary(gw.fill.aspline$y - gww[six]) Min. 1st Qu. -1.5410 -0.4932 Median 0.8016 Mean 3rd Qu. 0.5986 1.5850 Max. 2.7190

> summary(gw.fill.aspline$y - gw.fill.linear$y) Min. 1st Qu. -0.114700 -0.008196 > + + > + > + > + > + > + > > > > Median 0.261100 Mean 0.222800 3rd Qu. 0.443500 Max. 0.515500

plot(window(gwg, start = 1996, end = 1999), main = "Gap-filled time series", type = "l", ylab = "Groundwater depth (m)", ylim = c(43, 49)) points(gw.fill.aspline$x, gw.fill.aspline$y, col = "red", cex = 2, lty = 1, type = "b") points(gw.fill.aspline$x, gw.fill.aspline$y, col = "red", pch = 20) points(gw.fill.linear$x, gw.fill.linear$y, col = "brown", cex = 2, lty = 1, type = "b") points(gw.fill.linear$x, gw.fill.linear$y, col = "brown", pch = 20) points(time(gww)[six], gww[six], col = "darkgreen", cex = 2, lty = 1, type = "b") points(time(gww)[six], gww[six], col = "darkgreen", pch = 20) text(1999, 45, "linear", col = "brown", pos = 2) text(1999, 44, "Akima spline", col = "red", pos = 2) text(1999, 43, "true value", col = "dark green", pos = 2)

119

Gap−filled time series
49 48

Groundwater depth (m)

47

q
q

q
q q q q q

q
q q q q

46

44

q qq qqqqqq qq qqq qqq q qqq q qq q q q q
q q q q q q q q q q q q q q q q q q q q q

45

linear

Akima spline

q
q

q
q

43

q
q

true value 1997.5 Time 1998.0 1998.5 1999.0

1996.0

1996.5

1997.0

Q72 : Are these satisfactory interpolators? Another method is clearly called for. 7.4 Estimation from other series

Jump to A72 •

For irregular series like rainfall or stream levels, where there are rapid changes in short times, and often periods with zeroes (rainfall) or low values (baseflow of streams), gap-filling with interpolation will not work. One simple method is to estimate from correlated time series, that do have some observations in common, and have observations at the same times as the missing values. This is complicated by the fact that correlations over the whole series may not be consistent. For example, correlations of rainfall records in a dry season may be very good (everything is low or zero most days) but poor in a wet season (even if it rains on the same days, the amounts may be considerably different). To examine this method, we use two additional rainfall stations in the Lake Tana basin, following the procedures from §2.2.

Task 84 : Read the CSV files for stations WeteAbay and Dangila into R objects and examine their structures. •
> tana.2 <- read.csv("Tana_Daily_WeteAbay.csv", skip = 1, + header = T, colClasses = c(rep("integer", 2), rep("character", + 12)), blank.lines.skip = T, na.strings = c("N.A", + "NA", " ")) > str(tana.2)

120

'data.frame': $ Year: int $ Date: int $ Jan : chr $ Feb : chr $ Mar : chr $ Apr : chr $ May : chr $ Jun : chr $ Jul : chr $ Aug : chr $ Sep : chr $ Oct : chr $ Nov : chr $ Dec : chr

217 obs. of 14 variables: 2000 NA NA NA NA NA NA NA NA NA ... 1 2 3 4 5 6 7 8 9 10 ... "0" "0" "0" "0" ... "0" "0" "0" "0" ... "0" "0" "0" "0" ... "0" "0" "0" "12.9" ... "0" "0" "0" "12.6" ... "0" "12.6" "10.4" "10.3" ... "9" "0.3" "8.4" "46.6" ... "8.4" "4.4" "6.6" "6.4" ... "14" "0" "10.9" "1.4" ... "10.1" "27.8" "0" "20.2" ... "0" "0" "0" "0" ... "0" "0" "0" "0" ...

> sum(is.na(tana.2[, 3:14])) [1] 0 > tana.3 <- read.csv("Tana_Daily_Dangila.csv", skip = 1, + header = T, colClasses = c(rep("integer", 2), rep("character", + 12)), blank.lines.skip = T, na.strings = c("N.A", + "NA", " ")) > str(tana.3)

'data.frame': $ Year: int $ Date: int $ Jan : chr $ Feb : chr $ Mar : chr $ Apr : chr $ May : chr $ Jun : chr $ Jul : chr $ Aug : chr $ Sep : chr $ Oct : chr $ Nov : chr $ Dec : chr

620 obs. of 14 variables: 1987 NA NA NA NA NA NA NA NA NA ... 1 2 3 4 5 6 7 8 9 10 ... "0" "0" "0" "0" ... "0" "0" "0" "0" ... NA NA NA NA ... NA NA NA NA ... NA NA NA NA ... NA NA NA "0" ... "3.4" "7.8" "8.6" "13.3" ... "44.6" "22.3" "3.6" "22" ... "5.4" "1" "37.9" "2.2" ... "4.8" "2.2" "8.3" "4.5" ... "0" "0" "0" "0" ... "0" "0" "0" "0" ...

> sum(is.na(tana.3[, 3:14])) [1] 383

Task 85 : Set the trace values and any measurements below 0.1 to zero. •
> > + + > + require(car) for (i in 3:14) tana.2[, i] } for (i in 3:14) tana.3[, i] { <- recode(tana.2[, i], "c('TR','tr','0.01')='0'") { <- recode(tana.3[, i], "c('TR','tr','0.01')='0'")

121

2[.rm = TRUE) [1] 0 Task 86 : Organize the daily values as one long vector of values.2.2[tana. 3:14]) == "0.1). 28.row + month.3[. 30.row:(yr.3[tana. 31.col]) } } str(tana.rm = TRUE) [1] 0 > sum(c(tana. 30.2.ppt <. 31) tana.01".first. 122 .3[.+ } > sum(c(tana. na. tana. 3:14].col] .3$Year)) [1] 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 [14] 2000 2001 2002 2003 2004 2005 2006 > tana.c(0. 31. tana. na. tana. 31. tana.2[. 3:14]) == "tr".2.first. 3:14]..ppt <.3[.c(tana. 31.NULL for (yr. "FEB"] NULL > + > > + + + + + + > month. 3:14].first.2[yr. 31.days <.2$DATE == 29.ppt) chr [1:2555] "0" "0" "0" "0" "0" "0" "0" "0" . month. 3:14]) == "0". length = (2006 2000 + 1))) { for (month. tana. by = 32. as required for time series analysis.3$DATE == 29. "FEB"] NULL > tana..rm = TRUE) [1] 0 > sum(c(tana.2[.col in 3:14) { tana.days[month. na. 3:14].rm = TRUE) [1] 0 > sum(c(tana.2$Year)) [1] 2000 2001 2002 2003 2004 2005 2006 > sort(unique(tana.3[. 30. • Which years do we have? > sort(unique(tana$YEAR)) [1] 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 [14] 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 > sort(unique(tana. 31.row in seq(from = 1. 0.ppt. na. 3:14]) == "TR". 30.2.2[.

.2. by = 32. the frequency argument specifies a cycle of 365 days and the start argument specifies the beginning of each series: > tana.na(tana.first.ppt <.3. > length(tana.ppt <. Task 88 : Plot the two time series. frequency = 365) > str(tana..ppt.ppt. sub = "Missing dates with red bars") abline(h = 60.col in 3:14) { tana.row.2.c(tana.row + month.. yr. "").first.> length(tana. ylab = "mm".ppt).ppt) Time-Series [1:7300] from 1987 to 2007: 0 0 0 0 . • Again. y = 60. and the original station for comparison • > + > > + > plot(tana. recycle = T).3.3.ppt)/365 [1] 7 > > + + + + + + > tana.NULL for (yr.ppt <. > tana.col] .3. col = "red") grid() 123 . month.ppt.3.col]) } } str(tana. the ts function is used to convert the series. "l". frequency = 365) > str(tana.3[yr..row:(yr.ts(tana.2...days[month.row in seq(from = 1.ppt) chr [1:7300] "0" "0" "0" "0" "0" "0" "0" "0" .ppt).first.ppt)/365 [1] 20 > rm(month. main = "Lake Tana rainfall. col = "gray") points(xy.ts(tana.first.3. Station 1".ppt) Time-Series [1:2555] from 2000 to 2007: 0 0 0 0 .ppt. length = (2006 1987 + 1))) { for (month.ppt <. tana.days.3. start = 1987.2.coords(x = time(tana. start = 2000.3. pch = ifelse(is.col) Check that this is an integral number of years: Task 87 : Convert this to a time series with the appropriate metadata.1). month.

coords(x = time(tana. "l".2. "l".ppt. ""). Station 2 60 ll l ll l ll l ll ll l ll l ll mm 0 20 40 2000 2001 2002 2003 2004 2005 2006 2007 Time Missing dates with red bars > + > > + > plot(tana. Station 1 mm 60 100 l ll l l llllllll l l 0 1980 20 1985 1990 1995 Time Missing dates with red bars 2000 2005 > + > > + > plot(tana. col = "gray") points(xy. we use the is.na(tana. ylab = "mm". y = 60. recycle = T).ppt).na(tana. main = "Lake Tana rainfall.coords(x = time(tana. sub = "Missing dates with red bars") abline(h = 60. Station 3 60 80 llllllll l l ll ll l ll lllllllllllllllllllll l llllllllllllllllllllllll mm 0 20 40 1990 1995 Time Missing dates with red bars 2000 2005 Q73 : Do the three stations cover the same time period? Do they have the same missing dates? Does every date have at least one observation (from at least one of the stations)? Jump to A73 • To answer this.na logical function to determine whether an observation is a missing value. and the which function to identify these 124 . ylab = "mm". ""). recycle = T).ppt).3. y = 60.3.ppt). pch = ifelse(is. sub = "Missing dates with red bars") abline(h = 60.2. Station 2".3. main = "Lake Tana rainfall. pch = ifelse(is.2. col = "gray") points(xy. Station 3".Lake Tana rainfall.ppt. col = "red") grid() Lake Tana rainfall. col = "red") grid() Lake Tana rainfall.ppt).

315 2006.575 2006.miss.074 2006.ppt) > str(t3) 125 .236 2006.244 2006.2.2.miss.819 2006.ppt)[miss.077 2006.ppt))) [1] [14] [27] [40] [53] 2216 2279 2367 2430 2493 2217 2280 2368 2431 2494 2218 2306 2369 2432 2520 2219 2307 2370 2433 2521 2220 2308 2371 2459 2522 2221 2309 2397 2460 2523 2247 2310 2398 2461 2524 2248 2336 2399 2462 2550 2249 2337 2400 2463 2551 2275 2338 2401 2489 2552 2276 2339 2402 2490 2553 2277 2340 2428 2491 2554 2278 2341 2429 2492 2555 > (time.positions in the series.1.573 2006.2.na(tana.miss. time.miss.068 2006.3] miss.740 2006.na(tana.321 2006.658 2006.3.3.734 2006.488 2006.153 2006.3) Task 89 : Find the common period of the three time series.intersect(time.time(tana.2.time(tana.989 2006.233 2006.825 2006.822 2006.564 2006.567 2006.miss.1] The intersect set operator gives the sets where two time series share the same missing dates.2)) [1] 0 > length(miss.2]) [1] [8] [15] [22] [29] [36] [43] [50] [57] [64] > > > > 2006.miss. tana.1.405 2006. using the ts.910 2006. extracted by the time function.230 2006.238 2006.2 <.241 2006.ppt)) time.742 2006..400 2006.intersect(time.miss.649 2006.3)) [1] 96 > length(miss.ppt)[miss. + time. and plot them on one graph.318 2006. tana.745 2006.827 2006.408 2006.082 2006.3.660 2006. > (miss.time(tana.663 2006.which(is.159 2006.2.490 2006.ppt.12 <.miss. miss. • This is the same procedures as in §6.485 2006.23 <. i.1. time. time.which(is.992 miss.079 2006.intersect(time.403 2006. time.which(is.907 2006.na(tana.1 <.912 2006.ppt)) time.652 2006.ppt)[miss.intersect function to create a “multiple time series” object of class mts: > t3 <.156 2006. > length(miss.2.737 2006.miss.071 2006.miss.3)) [1] 64 > rm(miss.397 2006.493 2006.986 2006.e.3.411 2006.901 2006.570 2006.482 2006.816 2006.3 <.ts. time. miss.1 <.326 2006.997 2006.2 <.3 <.miss.904 2006.995 2006.ppt.984 2006.830 2006. We then use these indices to print the times.323 2006.intersect(tana.578 2006.13 <.miss.655 2006.1.

ppt 0 10 20 30 40 50 60 0 20 40 60 0 20 40 60 80 100 2000 2001 2002 2003 2004 2005 2006 2007 Time Task 90 : Compare the stations as 2-D scatterplots. Return to Q68 • A69 : Fairly well. However. for March 1997 it missed the high-water stand substantially.2. "tsp")= num [1:3] 2000 2007 365 . 1:3] "0" "0" "0" "0" "0" "0" "0" . main = "Lake Tana rainfall.chr [1:2555.ppt tana.. especially when the missing value was in a linear section of the graph (steady increase or decrease in depth). 126 .3. 3 stations 120 tana.ppt" . 3 stations") Lake Tana rainfall. "class")= chr [1:2] "mts" "ts" > class(t3) [1] "mts" "ts" > plot(t3.3.attr(*.ppt" "tana. Return to Q69 • A70 : The three interpolators are very similar for the gaps in linear sections of the graph. however for the extremum of March 1997 there are substantial differences.$ : NULL . The pairs method plots all columns of a dataframe against each other: 7. The median and mean change slightly because of the deleted values.$ : chr [1:3] "tana.. this is because that point happens to be a local extremum and not in a linear section of the curve.attr(*.attr(*.5 Answers • A68 : After deletion there are five NA (missing) values. .2.ppt" "tana. "dimnames")=List of 2 .ppt tana...

both spline interpolators are much closer (about 30 cm) to the true value. Return to Q70 • A71 : For gaps in the linear sections of the graph. with longer gaps around extrema. both interpolators performed reasonably well. The first and second stations have no missing dates in common. But the strong seasonal effect is completely missing. However. Return to Q71 • A72 : The interpolators reproduce the overall trend in groundwater depth during the missing year (slightly decreasing). but they start in 1981 (Bahir Dar). both underestimate the extremes. Return to Q73 • 127 . whereas the third station has some of the same missing dates as the other two stations. Return to Q72 • A73 : The stations all end in December 2006. Akima splines performed better than linear interpolation. the spline has a very attenuated seasonality as well. as in early 1988. 1987 (Dangila). and 2000 (Wete Abay).

1$ar). main = "Remainders 1989 -.3.gw. 1)) 128 . Task 91 : Simulate three realizations of fifteen years of groundwater level remainders with the AR(1) model from §4. after accounting for trend and seasonality.gw. For example. lty = 2) } plot(window(gw.2004". 1)) for (i in 1:3) { plot(arima.r. lty = 2) par(mfrow = c(1.gen = rnorm. the simulation uses random-number generation from various probability distributions. these are then used as inputs to decision-theoretical models.sim function also has an optional sd argument to specify the white noise. In this case we specify only an AR component. ylab = "modelled") abline(h = 0. The arima. allows direct evaluation of the probability of an extreme event such as prolonged drought. the coefficients are in field ar of that object. 1989. • Recall that the fitted AR(1) model is in object ar.r.1 AR models In §4. 1989 + 15). to see how well they reproduce the series.1. ARMA and ARIMA models.gw.r. The arima. according to a calibrated model. rand. This requires a model (§4) with a stochastic (random) component.sim function simulates AR.3. main = paste("Simulated AR(1) process". 8. n = 12 * 15. Simulation of climatic time series is a key element of synthetic weather generation [19]. and the red-noise (residual) variance in field var. simulating hundreds of years of rainfall.1. Simulations are used to generate many probable “states of nature”.1$var.r. i). sd = sqrt(ar. We can simulate the process with these models.sim(model = list(ar = ar.pred: > > + + + + + > + > > par(mfrow = c(4. and compare with the actual series. ylab = "actual") abline(h = 0. MA.1 we constructed AR(1) and AR(2) models of the groundwater levels.8 Simulation To simulate a time series is to generate a series with defined statistical characteristics.pred)). R is particularly strong in this area.

Simulated AR(1) process 1 3 modelled −3 −2 −1 0 1 2 0 50 Time 100 150 Simulated AR(1) process 2 modelled −2 −1 0 1 2 3 0 50 Time 100 150 Simulated AR(1) process 3 3 modelled −2 −1 0 1 2 0 50 Time 100 150 Remainders 1989 −− 2004 actual −2 −1 0 1 2 3 1990 1995 Time 2000 Q74 : How well do the simulations reproduce the structure of the actual series? Jump to A74 • 129 .

main = "Remainders 1989 -. The fitted AR(2) model is in object ar. 1989.gen = rnorm.sim(model = list(ar = ar. 1989 + 15). n = 12 * 15. i).pred)). ylab = "actual") abline(h = 0.2004". > > + + + + + > + > > • par(mfrow = c(4.r$ar).r$var.r. lty = 2) } plot(window(gw.Task 92 : Repeat for the fitted AR(2) model.gw. ylab = "modelled") abline(h = 0.gw.gw. sd = sqrt(ar. 1)) 130 .r. main = paste("Simulated AR(2) process". 1)) for (i in 1:3) { plot(arima. rand. lty = 2) par(mfrow = c(1.

Simulated AR(2) process 1 modelled −3 −2 −1 0 1 2 0 50 Time 100 150 Simulated AR(2) process 2 3 modelled −3 −2 −1 0 1 2 0 50 Time 100 150 Simulated AR(2) process 3 modelled −2 −1 0 1 2 0 50 Time 100 150 Remainders 1989 −− 2004 actual −2 −1 0 1 2 3 1990 1995 Time 2000 Q75 : How well do the AR(2) simulations reproduce the structure of the actual series? Jump to A75 • 131 .

Q75 • Return to 132 .8.2 Answers A74 : The AR(1) simulations seem somewhat noisier than that actual series. The scale seems correct. Return to Q74 • A75 : The AR(2) simulations seem to match the actual series better.

URL http: //www.1007/978-0-387-88698-5.at/~leisch/Sweave. 1996. TU Wein. Journal of Computational and Graphical Statistics. Time series analysis: forecasting. 1 A [10] F Leisch. 16 [7] Keith W. Techniques of trend analysis for monthly water quality data. 1. 1982. Jenkins. A new method of interpolation and smooth curve fitting based on local procedures. Englewood Cliffs. 2 [12] Andrew V. 1 [13] R Development Core Team.ci. 5(3):299–314. 72 [9] R Ihaka and R Gentleman. Journal of the ACM. John Wiley & Sons. International Institute for Geo-information Science & Earth Observation (ITC). 1 [6] John Fox. Inc. R: A language and environment for statistical computing. New York. 71. URL http:// www.References [1] Hiroshi Akima. Austria. James M Slack. 2002. 1989. 1 133 . Diggle. Vienna. Oxford. 1993. 3. 2002. Time series: a biostatistical introduction. 3rd edition. R Foundation for Statistical Computing. Introduction to the R Project for Statistical Computing for use at ITC. 71 [3] George E. ISBN 9780444892706. New York. Alternative methods of regression. Metcalfe and Paul S. Ian McLeod. 1 [5] Peter J. Prentice-Hall.P. An R and S-Plus Companion to Applied Regression. 3rd edition.ca/faculty/aim/1994Book/. Sage Publications. 2006. Number 45 in Developments in Water Science. R News. DOI: 10. and control.stats.org. ISBN 3-900051-070. and Gregory C.R-project. 1 [14] D G Rossiter. R: A language for data analysis and graphics.ac. 98 [8] Robert M. NJ. Sweave User’s Manual. Use R! Springer. 2. 17(4):589–602. part I: Mixing R and L TEX. 112 [2] David Birkes and Yadolah Dodge.nl/personal/rossiter/teach/R/RIntro_ITC. Introductory Time Series with R. John Wiley & Sons.R-project. 2009. and Richard A Smith.itc. Reinsel.org/doc/Rnews/. 1994..uwo. USA. 2009.pdf. 18(1):107–121.6 edition. 1994. Statistics and data analysis in geology. 2 [11] F Leisch. Hirsch.tuwien. Box.1 edition. 71. 1970. Enschede (NL). Gwilym M. 75 [4] JC Davis. Thousand Oaks. Hipel and A. Sweave. Elsevier. 2004. URL http://www. Oxford Statistical Science Series. 1. URL http://CRAN. CA. URL http://www. December 2002.P. Oxford Science Publications. Vienna (A). Time series modelling of water resources and environmental systems. Water Resources Research. 5. Cowpertwait. 2(3):28–31.

McGraw-Hill. 23(3): 329–357. 1993. 12 [18] WN Venables and BD Ripley. Springer. 1. Analysis and modeling of hydrologic time series. The weather generation game: a review of stochastic weather models. 2002. R Data Import/Export. Academic Press. Handbook of hydrology. 1 [19] D S Wilks and R L Wilby. Modern applied statistics with S. 1. Time series analysis and its applications : with R examples. version 2. 2006. fourth edition. New York. 2009. New York. 1 [17] R Development Core Team. Salas. Progress in Physical Geography.1–19. Maidment.r-project. New York. In David R. The R Foundation for Statistical Computing. 78. International Geophysics Series 59. New York. Statistical methods in the atmospheric sciences. Stoffer. Wilks. 107 [16] Robert H. Vienna. Shumway and David S.72.[15] Jose D.9. 128 [20] Daniel S. 81 134 .0 (2009-04-17) edition. editor. pages 19.org/doc/manuals/R-data. SpringerVerlag.pdf. 1995. URL http://cran. 1999. 2nd edition.

55. 103. 31. 125 NA constant. 53 spline. 98 lag. 124 Kendall package. 71 plot. 4 quantile. 8 end.ts. 116 attributes. 112 anova. 115 sort. 67. 12 filter. 15. 18. 18 predict. 31 mts class. 116 s. 15 start. 36. 125 is. 108 na. 35. 5. 48 ~ operator.Arima. 99 acf. 36 MannKendall (package:Kendall). 32 fitted. 112 stack. 3 SeasonalMannKendall (package:Kendall). 80. 104 mean. 39 plot. 50. 94 predict. 77. 109 na. 108. 39 plot. 44. 36–38. 25.table. 41.seed. 109 na.sim. 4. 30 max. 72 match. 17 car package. 4 file.lm. 28 by. 101 points.plot. 4 print. 71–73. 112. 46.exclude. 83 arima. 84 floor. 78. 81. 99 aspline (package:akima). 62.window function argument. 28. 104 cycle. 118 frequency. 42 sample.na. 3 boxplot.contiguous. 110 $ operator. 82–84. 5. 11.stl. 31 median. 116 ar. 77 lowess. 7.ts. 98 intersect. 28 read. 94 predict. 110. 29. 5 diff. 16 ccf. 16 require. 2. 85 geterrmessage.numeric. 101. 28 [] operator.action. 48. 4 stats package. 90 aggregate.csv. 31 akima package.vector. 14 read. 73. 126 plot. 90 pairs. 11. 92 arima. 64 approx. 13 recode. 77 lag. 5. 44. 28 abline. 112 stl. 37. 73. 51. 20 na. 109 pacf. 89.omit. 28. 38. 6. 110. 25 deltat. 28. 72 set. 115 spectrum. 25 as. 61. 109 135 . 25 c.show. 76 lm. 18. 109 GuelphP dataset. 67 print. 99. 18. 13. 128 as.Index of R Concepts [[]] operator. 14.lm. 108 scan. 16 rule function argument. 31 min.

18. 3. 10. 5. 38. 118. 14. 66 sum.plot. 6. 104 which. 25. 110. 17. 118 xout function argument. 99 which. 15 v graphics argument.str. 25 t. 71 which.window function argument.intersect. 109 ts. 125 try.union. 3. 10 window. 125 ts. 10. 11. 124 which function argument. 101.min. 101 tsdiag.max. 4. 104 subset. 42 time. 123 ts. 84. 7. 112 136 . 92 unique. 40 ts. 110. 31 summary.

Sign up to vote on this title
UsefulNot useful

Master Your Semester with Scribd & The New York Times

Special offer for students: Only $4.99/month.

Master Your Semester with a Special Offer from Scribd & The New York Times

Cancel anytime.