Attribution Non-Commercial (BY-NC)


1 Trends

We all know that the temperature varies in a cyclic manner over the year. In addition, there are considerable variations from year to year. It may come as a surprise, but there exists no complete theory for the year-to-year variations, nor any way of telling whether this winter will be warmer than last year's. In science, there is a belief that this variability will never be fully explained, because the weather system is chaotic.

This note considers yearly mean temperatures, as found at www.rimfrost.no, and the focus will be on statistics rather than climate science.

Starting with the temperature record from Trondheim, the yearly means (or averages) are shown in Figure 1. First of all, we observe considerable gaps (missing data) in the series. The graph also shows that the year-to-year variations appear to occur without any obvious regularity. Finally, there seems to be some slow variation in the mean temperature; in particular, the most recent data appear to be somewhat higher than the earlier recordings. This slow variation is called a trend.

Figure 1: Time series of yearly mean temperatures in Trondheim (temperature in °C against year, 1800 to 2000). Data are recorded at different locations within the city and may therefore have some small systematic offsets.

The temperature measurements are called a time series. For the yearly means, the time-step is 1 year, and we write the measurements as $\{X_i\}$, where $X_i$ is the measured value at time $t_i$.

In a trend analysis $X_i$ is expressed as a sum of two parts,

$$X_i = T_i + r_i, \tag{1}$$

where $T_i$ is the trend and the remainder, $r_i = X_i - T_i$, is called the residual. We prefer that the trend is slowly varying, whereas the residual should vary from point to point with

Figure 2: Data, trend and residuals for yearly mean temperatures from Blindern, Oslo (three panels: yearly mean temperatures with trend curve, trend, and residuals; temperature in °C against year, 1800 to 2000). The trend curve is computed by means of the Hodrick-Prescott filter (described below).

there is no unique or best way of making this division. Without going too far into the theory, let us say that the trend curve is acceptable if it looks reasonable. Clearly, such a curve can never be scientifically "true". An example of a trend curve and the residuals for the temperatures from Blindern, Oslo, is shown in Fig. 2. The trend curve is slowly varying, and the residuals spread out evenly around the trend. This is exactly what we want from a good trend curve.

One can say that a good trend curve is what you would draw by hand. The statistical tradition has been to fit trend curves that are straight lines or polynomials of degree 2 or 3. Attempting to fit polynomials of higher order is seldom a good idea. The trend curve in Fig. 2 is not a polynomial, but it is possible to make nice trend curves using 3rd-order polynomials glued together (spline curves).

The trend curve in Fig. 2 is produced using a very simple principle. We are looking for the trend $T$ in the same points as we have the data. Since the trend curve should be centred in the middle of the data, it is reasonable to require that

$$\mathrm{MSE} = \sum_{i=1}^{N} (X_i - T_i)^2 \tag{2}$$

is small, but not so small that the curve becomes too irregular. Obviously, $\mathrm{MSE} = 0$ implies that $T_i = X_i$, which is useless.

It is simple to see that what is called the second difference,

$$D_i = T_{i-1} - 2T_i + T_{i+1}, \tag{3}$$

measures how straight the trend curve is around $t_i$. If $T_{i-1}$, $T_i$, and $T_{i+1}$ lie on a straight line, $D_i = 0$. Thus, the quantity

$$\mathrm{DEV} = \sum_{i=2}^{N-1} D_i^2 \tag{4}$$

measures how straight the full curve is: if $\mathrm{DEV} = 0$, all points lie on a straight line. It is reasonable to try to make both MSE and DEV small, but since these requirements are in conflict, one instead minimizes the sum

$$\mathrm{HP} = \mathrm{MSE} + \lambda\,\mathrm{DEV}, \tag{5}$$

where $\lambda \ge 0$ is a parameter. For $\lambda = 0$ the minimum is $T_i = X_i$, while an increasing $\lambda$ straightens the trend curve. In the limit when $\lambda \to \infty$, the curve approaches the least-squares linear regression line. This is illustrated in Fig. 3. By varying $\lambda$, we choose how straight the curve is, but it is not always easy to say what is best.

These trend curves, which seem to cover what we need, are called Hodrick-Prescott curves, and the algorithm the Hodrick-Prescott filter, named after the people who introduced the method to economists in the 1990s (E. C. Prescott received the Nobel Prize in Economics for 2004 together with the Norwegian Finn Kydland). Nevertheless, the method is much older, dating back at least to the 1920s. We shall not discuss how to choose $\lambda$, but rather rely on the subjective impression of the result.

So far, we have disregarded that it is not quite straightforward to compute the trend curve by minimizing HP in Eqn. 5. This amounts to solving a linear system of equations for $\{T_i\}$. When the number of data points is large, up to 200 here, this requires a computer with appropriate software. According to the Internet, there exist free add-ins for Microsoft Excel™ (Matlab™ has been used here).
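As a minimal sketch of this computation, written in Python with NumPy rather than the Matlab used for the figures (the function name `hp_trend` is hypothetical): collect the second differences in a matrix $K$ whose rows are $(1, -2, 1)$, so that the straightness term is $\lambda \|KT\|^2$. Setting the gradient of $\sum_i (X_i - T_i)^2 + \lambda \|KT\|^2$ to zero gives the linear system $(I + \lambda K^{\mathsf T} K)\,T = X$. The sketch assumes a complete series without gaps and uses a dense solver, which should be adequate for a few hundred points.

```python
import numpy as np


def hp_trend(x, lam):
    """Trend T minimizing sum((x - T)**2) + lam * sum(second differences of T squared).

    x   : complete series (no missing values)
    lam : smoothing parameter lambda >= 0
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 3:
        return x.copy()  # no second differences exist for fewer than 3 points

    # K is the (n-2) x n second-difference matrix: (K T)_i = T_i - 2 T_{i+1} + T_{i+2}
    K = np.zeros((n - 2, n))
    rows = np.arange(n - 2)
    K[rows, rows] = 1.0
    K[rows, rows + 1] = -2.0
    K[rows, rows + 2] = 1.0

    # Zero gradient of the quadratic criterion: (I + lam * K^T K) T = x
    A = np.eye(n) + lam * (K.T @ K)
    return np.linalg.solve(A, x)
```

With `lam = 0` the function returns the data unchanged, and for very large `lam` the result approaches the least-squares regression line, in line with the limits described above.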

If one wants to keep things simple, the old-fashioned Moving Average (MA) over $M$ points is the first that comes to mind,

$$T_i = \frac{1}{M} \sum_{j=i-[M/2]}^{i+[M/2]} X_j \tag{6}$$

(here $[M/2]$ is the integer part of $M/2$, and $M$ is an odd number so that $T_i$ is an average centred at $t_i$). The obvious problem with this formula is that the computations run off the ends, also called the end-effect. One possibility to avoid this would be to reduce $M$ near the ends, but that would make the trend curve unstable in the end-zones compared to the mid-range. Another choice is to fit a linear regression line to $M$ points and use that for the $[M/2]$ points closest to the ends. The result for $M = 21$ (years) is shown in Fig. 4. The trend curve is a straight line near the ends, and the curve has more small wiggles than the Hodrick-Prescott curve. There is no universal best way to deal with the end-effect, and the use of a straight regression line at the ends will not be suitable in all cases (e.g. when dealing with physically positive data, where the line may give negative values).

Figure 3: Examples of trend curves for varying values of $\lambda$ in the Hodrick-Prescott filter (Blindern, Oslo; three panels with $\lambda = 100$, $\lambda = 1500$ and $\lambda = 100000$; temperature in °C against year, 1800 to 2000).

Figure 4: Data and moving average over 21 years for the data from Blindern, Oslo (temperature in °C against year, 1800 to 2000). See the text for how the moving average is extended all the way to the ends.
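The moving average with regression-line ends can be sketched as follows. This is an illustrative reading of the description above, not the code used for Fig. 4. The function name `moving_average` is hypothetical, the series is assumed to be complete, and the time step is taken as one index unit.

```python
def moving_average(x, m):
    """Centred m-point moving average with regression-line ends.

    x is a complete list of values (no gaps); m must be odd, 3 <= m <= len(x).
    """
    n = len(x)
    if m % 2 == 0 or m < 3 or m > n:
        raise ValueError("m must be odd, with 3 <= m <= len(x)")
    half = m // 2  # the integer part [M/2]

    def line_through(start):
        # Least-squares straight line through the m points x[start : start + m]
        idx = range(start, start + m)
        t_mean = sum(idx) / m
        x_mean = sum(x[j] for j in idx) / m
        sxx = sum((j - t_mean) ** 2 for j in idx)
        sxy = sum((j - t_mean) * (x[j] - x_mean) for j in idx)
        slope = sxy / sxx
        return lambda j: x_mean + slope * (j - t_mean)

    trend = [0.0] * n
    # Interior points: plain centred average over m points
    for i in range(half, n - half):
        trend[i] = sum(x[i - half:i + half + 1]) / m

    # The [M/2] points closest to each end: value of the regression line
    first, last = line_through(0), line_through(n - m)
    for i in range(half):
        trend[i] = first(i)
        trend[n - 1 - i] = last(n - 1 - i)
    return trend
```

A regression line through $M$ points passes through their mean at the central index, so the end segments join the interior moving average without a jump.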

The wiggles on the MA-curve are reduced by repeating the MA-operation on the current MA-curve several times. After $k$ repetitions (or iterations), where $k \ge 4$, we obtain a moving average with an approximately Gaussian weight function. Apart from giving a smoother curve, $k$ repetitions of an $M$-point moving average roughly amount to one moving average over

$$M_k = \sqrt{k} \times M \tag{7}$$

points. However, contrary to the single MA operation, repeated averaging puts more weight on neighbouring points compared to points further away. Fig. 5 illustrates the iterative MA for the temperature data from Paris.
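The effective weights of the iterated moving average can be examined with a short sketch. Away from the ends, $k$ passes of an $M$-point average equal a single weighted average whose weights are the $M$-point box convolved with itself $k - 1$ times. The helper names below are hypothetical, and end effects are ignored. The spread of the weights, measured by their standard deviation, is $\sqrt{k(M^2 - 1)/12}$, compared with $\sqrt{(M_k^2 - 1)/12}$ for a single box of $M_k$ points, which is one way to see why $M_k \approx \sqrt{k} \times M$.

```python
import math


def convolve(a, b):
    """Full discrete convolution of two weight lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def iterated_weights(m, k):
    """Weights of k successive m-point moving averages (interior points only)."""
    box = [1.0 / m] * m
    w = box
    for _ in range(k - 1):
        w = convolve(w, box)
    return w


def weight_std(w):
    """Standard deviation of the weight function around its centre."""
    centre = (len(w) - 1) / 2.0
    return math.sqrt(sum(wi * (i - centre) ** 2 for i, wi in enumerate(w)))


# Parameters taken from Fig. 5: 15 points iterated 4 times versus one 29-point average
print(weight_std(iterated_weights(15, 4)), weight_std(iterated_weights(29, 1)))
```

The iterated weights peak at the centre and tail off towards the edges, while the single moving average weights all its points equally.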

Even if iterative MA (IMA) and MA are similar in the interior, they differ slightly at the ends. A similar effect also appears to be present for the HP curve shown in Fig. 6.

When comparing Hodrick-Prescott and iterative MA, there should be a correspondence between the parameters $(M, k)$ for the IMA and the $\lambda$-parameter in Eqn. 5. A rough test, based on artificial data, gave the following relation between $M_k$ and $\lambda$:

($M_k = \sqrt{k} \times M$, and $k$ was equal to 4 in the study). For $\lambda = 1400$, used above, this formula suggests $M_k \approx 24$. With $k = 4$ this gives $M = M_k/\sqrt{k} = 24/2 = 12$.

As a subjective conclusion, it therefore seems that a 12-year MA (11 or 13 years are equally good) iterated 4 times is a reasonable choice for the temperature data considered here.

Figure 5: Data from Paris (temperature in °C against year, 1750 to 2000). One moving average over 29 points (blue), and a moving average over 15 points, repeated 4 times (red).

Figure 6: Similar to the previous figure. Iterative moving average (MA over 11 points and 4 iterations) (red), and the Hodrick-Prescott curve with $\lambda = 1400$ (blue).

2 Missing Data

When looking through temperature data at rimfrost.no, we often find that some data are missing. Restoring a data series by filling in missing data in a reasonable way is therefore a common problem. An occasional missing month is not serious: an average of the mean temperatures in the neighboring months would suffice in most cases. If yearly means are missing, it would still be possible to find a reasonable trend curve if we only miss a few years in a long data series. Missing data could then be restored by filling in the trend value. Obviously, one could also fill in a random residual, but this would only make the graph look nicer without really adding information to the data.

If we return to the Blindern temperatures in Fig. 2 and compute the so-called auto-correlation function of the residual, we obtain the result in Fig. 7.

Figure 7: The auto-correlation function for the residuals in the yearly means from Blindern, Oslo (auto-correlation against time difference in years, 0 to 20).
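For readers who want to reproduce this kind of plot, a sample auto-correlation function can be estimated as sketched below. The note does not say which estimator was used for Fig. 7; the sketch assumes the common version that subtracts the mean and normalizes by the lag-0 sum, so that the value at lag 0 is 1. The function name is hypothetical.

```python
def autocorrelation(r, max_lag):
    """Sample auto-correlation of the series r for lags 0, 1, ..., max_lag."""
    n = len(r)
    mean = sum(r) / n
    d = [v - mean for v in r]          # deviations from the mean
    c0 = sum(v * v for v in d)         # lag-0 sum, used for normalization
    return [
        sum(d[i] * d[i + lag] for i in range(n - lag)) / c0
        for lag in range(max_lag + 1)
    ]
```

A series whose values at lag 1 and beyond stay close to 0 carries almost no information about next year's residual, which is the situation described for the Blindern data.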

The correlation drops nearly to 0 after only one year. This does not mean that the residuals are completely independent, but it does imply that it is very hard to predict the residual for one year from the neighboring years (at least for this and other locations with a similar climate). The trend curves discussed above require complete data sets for their construction, and for a quick but primitive restoration, with only a few missing data, the simplest method is to fill in extra data by linear interpolation. However, in this case, it is important to flag restored data on a graph and visually inspect that the trend curve is reasonable. For larger gaps, like those in the Trondheim series shown in Fig. 1, this would be too primitive. An example of the simple approach for the Warszawa time series is shown in Fig. 8. Whereas simple interpolation in this case apparently works well for single missing points and occasional larger gaps, it introduces an artificial hump in the trend for the missing years 1940 to 1950. Nevertheless, as long as the trend curve is visually inspected and the restored data points are clearly flagged, this too may be acceptable. Instead of a linear interpolation between the two points next to the gap, it is more reasonable to interpolate between averaged data on both sides of the gap.
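The simple restoration by linear interpolation, including the flagging of restored points, might look like the sketch below. It assumes that missing years are marked with `None` and that the time step is constant; the function name is hypothetical. Gaps at the very start or end of the series are left untouched, since there is nothing to interpolate between.

```python
def fill_gaps_linear(values):
    """Fill interior gaps (None) by linear interpolation.

    Returns (filled, restored), where restored[i] is True for every
    value that was inserted rather than measured.
    """
    filled = list(values)
    restored = [False] * len(values)
    known = [i for i, v in enumerate(values) if v is not None]

    # Walk through consecutive pairs of measured points and fill what lies between
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
            restored[i] = True
    return filled, restored
```

The `restored` list is what allows the inserted points to be drawn in a different colour, as recommended above.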

Figure 8: Example showing available data (blue), data inserted using linear interpolation (red), and the resulting trend curves (yearly means, Warszawa, in °C; Hodrick-Prescott with $\lambda = 1400$ and IMA with 11 points and 4 iterations).

Instead of trying to restore data by looking only at the series itself, it is often better to include data from neighboring locations. In the following example we return to the data from Trondheim shown in Fig. 1, which, for reasons unknown to this author, have large gaps. Since Norway has a relatively high density of measurement sites, it is reasonable to use stations near Trondheim for filling in the missing data. Trondheim is located 30 km from Værnes and about 60 km from Selbu, and both locations have overlapping data sets with Trondheim.

The general methodology is as follows. First select a group of stations and data that cover the gaps with good margin. Then extract a subset of the data where all stations have data. This reduced set is called the calibration set and is used for establishing a relation between the neighboring stations and the target station. Denoting the temperature at the target $T_0$ and at the neighbours $T_1, T_2, \cdots, T_n$, the typical expression has the form

$$T_0 = a_0 + a_1 T_1 + \cdots + a_n T_n, \tag{9}$$

and is called a multivariate linear regression. The constants $a_0, a_1, \cdots, a_n$ are found by minimizing the expression

$$J(a_0, a_1, \cdots, a_n) = \frac{1}{C} \sum_{c=1}^{C} (T_{0c} - a_0 - a_1 T_{1c} - \cdots - a_n T_{nc})^2, \tag{10}$$

where $c$ runs over the calibration data set (the number of calibration data, $C$, should exceed $n$). This involves solving the so-called Normal Equations and will not be discussed here. When the $a$'s are determined, the size of $\sqrt{J}$ will be a measure of the precision by which $T_0$ may be obtained from $T_1, T_2, \cdots, T_n$.
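A minimal sketch of the calibration step, assuming Python with NumPy: the least-squares problem is handed to `numpy.linalg.lstsq`, which minimizes the same sum of squares but does not form the Normal Equations explicitly. The function name and the idea of passing the neighbours as columns of an array are illustrative assumptions.

```python
import numpy as np


def fit_station_regression(target, neighbours):
    """Least-squares fit of target = a0 + a1*T1 + ... + an*Tn.

    target     : array of length C (calibration years at the target station)
    neighbours : array of shape (C, n), one column per neighbouring station
    Returns (coefficients [a0, a1, ..., an], RMS prediction error).
    """
    t0 = np.asarray(target, dtype=float)
    T = np.asarray(neighbours, dtype=float)
    if T.ndim == 1:
        T = T[:, None]                 # a single neighbour given as a 1-D array

    # Design matrix with a leading column of ones for the constant a0
    A = np.column_stack([np.ones(len(t0)), T])
    coeffs, *_ = np.linalg.lstsq(A, t0, rcond=None)

    residuals = t0 - A @ coeffs
    rms = float(np.sqrt(np.mean(residuals ** 2)))   # square root of the mean squared residual
    return coeffs, rms
```

The returned RMS value is the square root of the minimized mean squared residual and has the same unit as the temperatures.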

When carrying this out in practice, one should be restrictive when selecting stations to be included. As a general rule, it is important to avoid nearly redundant stations and stations with little or no influence on the result.

Fig. 9 shows the Selbu/Trondheim and the Værnes/Trondheim calibration data sets, comprising 47 years of data.

Apart from the systematic difference between the sites (the bias), the variability between Selbu and Trondheim is slightly greater than between Værnes and Trondheim.

Figure 9: Simultaneous yearly means for Selbu and Trondheim, and Værnes and Trondheim (47 common years). Two panels of calibration data: Trondheim (°C) against Selbu (°C), and Trondheim (°C) against Værnes (°C).

The idea is now to try to predict the Trondheim temperature from the Selbu and Værnes temperatures, and the result came out as shown in Table 1. Order 0 simply means compensating for the bias between the stations, Order 1 is linear regression, and Order 2 is multilinear regression. The prediction error for the Værnes-Trondheim linear regression formula is slightly higher (0.173 °C) than the prediction error for the multilinear regression (0.168 °C). However, the improvement by going to Order 2 is negligible, and if one should select one of the relations in Table 1, the bias-compensating formula $T_T = -0.38\,^{\circ}\mathrm{C} + T_V$ is clearly the simplest.

Order   Formula
0       $T_T = 0.48\,^{\circ}\mathrm{C} + T_S$   (0.26 °C)
0       $T_T = -0.38\,^{\circ}\mathrm{C} + T_V$   (0.18 °C)
1       $T_T = 0.79\,^{\circ}\mathrm{C} + 0.93 \times T_S$   (0.25 °C)
1       $T_T = 0.00\,^{\circ}\mathrm{C} + 0.93 \times T_V$   (0.17 °C)
2       $T_T = 0.07\,^{\circ}\mathrm{C} + 0.17 \times T_S + 0.78 \times T_V$   (0.17 °C)

Table 1: Results of the regression analysis. $T_T$, $T_S$, and $T_V$ are the temperatures at Trondheim, Selbu and Værnes, respectively. The RMS prediction error, quantifying the reliability of the result, is given in the brackets behind each formula.

Figure 10: Actual vs. predicted temperature in Trondheim based on the calibration data set (predicted against actual temperature, both in °C).

Figure 10 shows the actual Trondheim temperature vs. the predicted temperature from the multivariate regression. It is good practice to split the calibration data set into two parts; the first one is used for deriving the calibration formula and the second one for checking the result.
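The split into a fitting part and a checking part can be sketched for the simplest, bias-compensating (Order 0) formula. Splitting the calibration years into a first and a second half is only one possible choice, and the function names are hypothetical. For a formula of the form target = bias + neighbour, the least-squares bias is simply the mean difference between the two stations.

```python
import math


def fit_bias(target, neighbour):
    """Least-squares constant a0 in target = a0 + neighbour (mean difference)."""
    return sum(t - v for t, v in zip(target, neighbour)) / len(target)


def rms_error(target, predicted):
    """Root-mean-square difference between actual and predicted values."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(target, predicted)) / len(target)
    )


def split_and_check(target, neighbour):
    """Fit the bias on the first half of the calibration years, check on the rest."""
    half = len(target) // 2
    bias = fit_bias(target[:half], neighbour[:half])
    predicted = [v + bias for v in neighbour[half:]]
    return bias, rms_error(target[half:], predicted)
```

An RMS error on the checking part that is much larger than on the fitting part would suggest that the calibration formula does not carry over well to years it has not seen.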

With 3 stations available, it is possible to predict one that is missing from the two others. This is shown in Figure 11. In the present case, the prediction error is smaller than 0.2 degrees, and therefore insignificant compared to the year-to-year variations at each of the locations.
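As an illustration of the restoration step for one of the three stations, the sketch below fills missing Trondheim years using the Order 2 formula from Table 1, $T_T = 0.07 + 0.17 \times T_S + 0.78 \times T_V$ (in °C). The data layout (one list per station, `None` for a missing year) and the function name are assumptions made for the example. A year is restored only when both other stations have data.

```python
def restore_trondheim(trondheim, selbu, vaernes):
    """Fill missing Trondheim yearly means from Selbu and Værnes.

    Uses the Order 2 relation of Table 1. Returns (series, restored),
    where restored[i] is True for every inserted value.
    """
    series, restored = [], []
    for tt, ts, tv in zip(trondheim, selbu, vaernes):
        if tt is None and ts is not None and tv is not None:
            series.append(0.07 + 0.17 * ts + 0.78 * tv)
            restored.append(True)
        else:
            series.append(tt)          # measured value, or still missing
            restored.append(False)
    return series, restored
```

The same pattern applies to the other two stations, each with its own calibration formula, which the note does not list.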

3 Acknowledgement

Thanks to Dr. Stephen F. Barstow, Fugro Oceanor (FOAS), for useful comments and corrections.

Figure 11: Restored yearly means in the time series from Trondheim, Værnes and Selbu (three panels, one per station; temperature in °C against year, 1920 to 2010). Only points where both the other stations were available have been restored.
