of Illinois, Urbana-Champaign
http://sal.agecon.uiuc.edu/
June 24, 2003

Introduction

This is a brief introduction to the exploration and modeling of variograms using Yvan Pannatier’s Variowin 2.21 software package. This package is freely available from http://www-sst.unil.ch/research/variowin/index.html. However, there is no manual available on the web. The “official” manual is the book by Pannatier (Springer Verlag, 1996), which contains a disk with an older version.

The data used in this tutorial are the Baltimore house prices (baltprice.dat) and the Los Angeles ozone data (laozone.dat), both obtainable from the SAL sample data repository http://sal.agecon.uiuc.edu/stuff/data.html. The data sets come in the format required by Variowin.

Program Basics

Variowin consists of a collection of four programs (as .exe files) that need to be run separately. You will be using three of the four in this tutorial: Prevar2D (a utility to construct a distance matrix for all point pairs in the data set), Vario2D with PCF (exploring variograms) and Model (fitting theoretical variogram models). You can start each of the programs in the usual way, by clicking on the matching shortcuts (see Figure 1), or by running the executables (prevar2d.exe, vario2dp.exe and model.exe) in the directory where Variowin was installed.
Figure 1. Variowin program shortcuts for Prevar2D, Vario2D and Model.

The data input files for Variowin need to be in a specific format (Geo-EAS), common to many geostatistical software packages. Each data file starts with a header line containing a descriptive title. The second line gives the number of variables. The following lines contain the variable names, one per line. After the names come the actual values, with a new line for each observation and the values separated by tabs or spaces, but not by commas. The last line in the file should be a blank line. For example, Figure 2 shows the first few lines of the file baltprice.dat. This data set contains six variables: an ID (STATION), X and Y coordinates, house sales price (PRICE), residuals from a first order trend surface regression (r_p_1) and residuals from a second order trend surface regression (r_p_2).
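The file layout described above can be checked with a few lines of Python before you load a file into Prevar2D. This is only a sketch, not part of Variowin: the function name parse_geoeas and the three-variable sample are made up for illustration, and the parser assumes every value is numeric.

```python
def parse_geoeas(text):
    """Parse Geo-EAS formatted text into (title, variable names, data rows)."""
    lines = text.splitlines()
    title = lines[0].strip()                 # line 1: descriptive title
    n_vars = int(lines[1].split()[0])        # line 2: number of variables
    names = [lines[2 + k].strip() for k in range(n_vars)]  # one name per line
    rows = []
    for line in lines[2 + n_vars:]:
        if not line.strip():                 # skip the trailing blank line
            continue
        values = [float(v) for v in line.split()]  # tabs or spaces, no commas
        if len(values) != n_vars:
            raise ValueError("expected %d values, got %d" % (n_vars, len(values)))
        rows.append(values)
    return title, names, rows


# Made-up sample (not the contents of baltprice.dat).
sample = "Hypothetical sample\n3\nX\nY\nZ\n1.0 2.0 10.5\n3.0\t4.0\t11.5\n\n"
title, names, rows = parse_geoeas(sample)
print(title, names, len(rows))
```

A line that uses commas as separators fails in the float conversion, which is a quick way to catch the most common formatting mistake.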
Figure 2. Input data file in Geo-EAS format.

Creating a Pair Comparison File (pcf)

The pair comparison file (with a pcf file extension) contains the distances between observations in a binary format. It is used instead of the ASCII input file (the .dat file) for all subsequent analyses in Variowin. Once you have created a baltprice.pcf file, there is no further need for the baltprice.dat file.

Start the Prevar2D program by double clicking its shortcut or by running the executable. This brings up the Prevar2D welcome window, as in Figure 3. Click OK to move on. Next, you see a File Open dialog to select the input data file (Figure 4). This is a little tricky since it is not in the usual Windows file explorer format. Click on the [..] item to move up in the directory tree and continue navigating until you are in the working directory, then select the baltpr~1.dat file, as in Figure 5. Note that Variowin still uses the DOS-style file name length limits, so that file names longer than 8 characters are truncated.

Next, the coordinates of the points need to be specified. The Settings > XY Coordinates menu item (Figure 6) brings up a dialog to select the X and Y coordinates (Figure 7).
Figure 3. Prevar2D opening screen.
Figure 4. File open dialog in Prevar2D.
Figure 5. Opening the baltprice.dat input file in Prevar2D.
Figure 6. Specifying X, Y coordinates in Prevar2D.

Once the X-Y coordinates are specified, the Run menu becomes enabled, as shown in Figure 8. Click on this to start the computation of the distance file. At the end of the process, the summary output appears, which simply lists the number of pairs for which the distances were computed, as shown in Figure 9.

Practice

Use the laozone.dat file to create a distance pcf file using X_Coord and Y_Coord as the coordinate variables (these are projected coordinates).
Figure 7. Selecting the variables for the X, Y coordinates in Prevar2D.
Figure 8. Run menu enabled in Prevar2D.
Figure 9. Summary output of Prevar2D.
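To get a feel for what Prevar2D computes, the following Python sketch builds the list of pairwise distances for a handful of points. It does not read or write the binary pcf format. The coordinates are made up, and the sketch assumes that each unordered pair is counted once, which gives n(n-1)/2 pairs for n points.

```python
import math
from itertools import combinations


def pair_distances(coords):
    """Return (i, j, distance) for every unordered pair of points."""
    return [
        (i, j, math.dist(coords[i], coords[j]))
        for i, j in combinations(range(len(coords)), 2)
    ]


# Made-up coordinates, for illustration only.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 8.0)]
pairs = pair_distances(coords)
n = len(coords)
max_dist = max(d for _i, _j, d in pairs)
print(len(pairs), n * (n - 1) // 2, max_dist)
```

The largest distance in this list is the quantity you will need later when choosing the maximum distance for the variogram cloud.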
Variogram Cloud Plot

A variogram cloud plot is created in the Vario2D program of the Variowin suite. This program requires a pcf file as input, not a data file. Start the program by double clicking on its shortcut or by running the executable in its directory. A welcome screen appears, as in Figure 10. Clicking OK removes the welcome screen and activates a File Open dialog. Navigate the directories using the [..] button until you reach the working directory with the pcf file, as in Figure 11. Opening the pcf file activates the menu items.
Figure 10. Vario2D welcome screen.
Figure 11. Vario2D open pcf file.

Simple mapping functionality is available in the Map! item on the Data menu, as shown in Figure 12. Selecting this function creates a point map in an x-y coordinate system, as illustrated in Figure 13. You can change the look of the points in the map with the Settings … menu (Figure 14, default symbols for scatter plot). The map will later be linked to the variogram cloud plot. Selecting (by mouse click) any of the points reveals the associated values in a popup, as in Figure 15. Click on OK to remove the popup.
Figure 12. Vario2D Data menu with Map function.
Figure 13. Map in Vario2D.
Figure 14. Settings for scatterplots and variography.
Figure 15. Identify values associated with a point.
A variogram cloud plot is constructed by selecting Calculate > Variogram Cloud in the menu (Figure 16). This brings up a dialog to specify the variable to be analyzed, as well as some parameters, such as the maximum distance and direction parameters, shown in Figure 17. If you did not change the settings (i.e., kept them as in Figure 14), the default is “direct” variography for a single variable. The Maximum distance should be set to the “distance of reliability,” roughly ½ of the maximum distance for all pairs. Variowin does not provide you with this maximum distance in an obvious way, but you can obtain it indirectly. Select Calculate > Variogram Cloud and enter a large value in the text box for Maximum distance, such as 200. Variowin will bring up a Warning that the maximum distance must be between 0 and 127.96, as in Figure 18. The distance of reliability would therefore be about 64 (half of 127.96). For now, you can leave the default of 70 as is. Also leave the “angular tolerance” at the default of 90. This option is useful for directional variogram cloud plots, but in this exercise you are only considering an isotropic variogram (no directional effects). Finally, you must select a variable from the list in the dialog. Choose PRICE and click on OK to generate the cloud plot, as in Figure 19.
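The two quantities discussed above, the cloud values and the distance of reliability, can be sketched in Python as follows. The data are made up. The sketch takes half the squared difference as the cloud value, which is the usual semivariogram convention and an assumption about what Variowin plots.

```python
import math
from itertools import combinations


def distance_of_reliability(coords):
    """Half of the largest distance between any two points."""
    longest = max(math.dist(a, b) for a, b in combinations(coords, 2))
    return longest / 2.0


def variogram_cloud(coords, values, max_dist):
    """Return (h, half squared difference, i, j) for pairs with h <= max_dist."""
    cloud = []
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])
        if h <= max_dist:
            cloud.append((h, 0.5 * (values[i] - values[j]) ** 2, i, j))
    return cloud


# Made-up coordinates and values, for illustration only.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 8.0)]
values = [10.0, 12.0, 20.0, 11.0]
print(distance_of_reliability(coords))
for record in variogram_cloud(coords, values, max_dist=5.5):
    print(record)
```

With the bound of 127.96 reported by Variowin for the Baltimore data, the same halving rule gives the value of about 64 quoted in the text.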
Figure 16. Variogram cloud plot command.
Figure 17. Direct variogram cloud parameter settings.
Figure 18. Maximum distance warning.
Figure 19. Variogram cloud plot for Baltimore house price data.

Practice

Using the laozone.pcf file you created, make a map of the points. Use the identify feature to check on the values for some of the points. Create a variogram cloud plot for the maxday variable. Experiment with changing the maximum distance.

Linking Map and Variogram Cloud Plot

Each point in the variogram cloud plot is linked to a pair of points in the location map. This provides you with a means to check outliers in the cloud plot. An outlier would be a point in the cloud plot that is much higher than the other points for that distance. This suggests that the two locations in question differ much more (larger squared difference) than other pairs that are a similar distance apart.

For example, consider the cloud plot in Figure 19 together with the map in Figure 13 (recreate these if they are not present on your desktop). In the cloud plot, click on the outlier for distance 16, as shown in Figure 20. A dialog will appear listing the pair that corresponds to this point, as well as the distance that separates them (h), the value of the variable (z) and the variogram value (squared difference, variogram). The two points are now also connected in the map by a red line, as shown in the bottom half of Figure 20. If you select the two records in the dialog and click on “Keep selected pairs on map and quit” the arrowed line will turn black and remain on the map.

Next, select the outlier in the cloud plot at distance 30 and note how the two line segments in the map connect to a common point, record 53 (Figure 21). Now check the values in the dialog again (or click on the points in the map) to see what may cause this. Station 53 has a house sales price of 8, whereas its “neighbors” (for the given distance band) have sales prices of 145 and 165, respectively. The squared differences between these prices and the value for Station 53 are much higher than for other pairs a similar distance apart, suggesting Station 53 might be an outlier (or a data recording error).
Figure 20. Linked variogram cloud plot and map.

You can experiment some more with the Baltimore data, constructing a variogram cloud plot for the variables r_p_1 (residuals from a linear trend surface) and r_p_2 (residuals from a quadratic trend surface), and assessing whether removing the trend has also removed the indication of an outlier. Also check other outlying points in the cloud plot and the locations to which they correspond.

Practice

Use the laozone.pcf file to assess the existence of outliers in the variogram cloud plot for the variables maxday (maximum July 96 daily ozone emission) and av8top (average of the 8 highest readings per day). Hint: focus on the pair #26 and #31.
Figure 21. Outlier in linked map and variogram cloud plot.
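The visual check in this section (several outlying pairs that share one location) can also be sketched in Python by counting how often each location appears among the largest cloud values. The prices 8, 145 and 165 for Station 53 and its two neighbors come from the example above. The neighbor IDs (101, 102, 103), the remaining values, and the assignment of each neighbor to a distance are made up for illustration.

```python
from collections import Counter


def frequent_points_in_top_pairs(cloud, top_k):
    """Count how often each location occurs among the top_k largest cloud values."""
    top = sorted(cloud, key=lambda rec: rec[1], reverse=True)[:top_k]
    counts = Counter()
    for _h, _value, a, b in top:
        counts[a] += 1
        counts[b] += 1
    return top, counts


# Records are (distance, squared difference, id_a, id_b).
cloud = [
    (16.0, (145 - 8) ** 2, 53, 101),   # Station 53 versus a neighbor priced 145
    (30.0, (165 - 8) ** 2, 53, 102),   # Station 53 versus a neighbor priced 165
    (15.5, 400, 101, 103),             # made-up ordinary pair
    (29.0, 900, 102, 103),             # made-up ordinary pair
]
top, counts = frequent_points_in_top_pairs(cloud, top_k=2)
print(counts.most_common(1))
```

A location that dominates this count is a candidate outlier in the sense used above, and worth checking against the original data.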
Variogram

A variogram is also calculated in the Vario2D program. If this program is still active, make sure you have the baltprice.pcf file selected (if not, restart the program and load the file). The variogram is part of the Calculate menu (Figure 16). Select Calculate > Directional Variogram to start the process. In the dialog, select r_p_1 as the variable, and leave all the other settings at their default values, as in Figure 22. Click OK to calculate the variogram. A graph will appear that shows the estimates for each distance bin (lag) as well as how many pairs were used in the computation, as in Figure 23. There are 12 circles on this graph, corresponding to h = 0 (zero distance) and 11 distance bands. Note how the largest distance is 81 (a little over the distance of reliability). Also note how the graph decreases at higher distances, which is not supposed to happen: as points are further apart, they are supposed to be less similar, hence the variogram should increase with distance up to a point and then become more or less flat.

Start another calculation (Calculate > Directional Variogram) and change the number of lags to 8. The new variogram is a little more acceptable and has a maximum distance of 60, as shown in Figure 24. However, the variogram still shows somewhat of an upward trend, suggesting that a spatial trend may still be present. Carry out a third calculation, now using the residuals from the second order trend surface. The new variogram (Figure 25) is almost flat beyond distance 15, suggesting that the range of spatial autocorrelation ends at that distance (points more than 15 distance units apart show no change in their variogram with increases in distance and thus are not spatially correlated).
Figure 22. Variogram dialog.
Figure 23. Variogram for first order trend surface residuals (lags = 11).
Figure 24. Variogram for first order trend surface residuals (lags = 8)
Figure 25. Variogram for second order trend surface residuals (lags = 8).

You can experiment by changing some of the settings, such as the number of lags, or by using a different estimator, such as a Madogram (in the Settings dialog). Note that the correlogram visualized in Variowin is not the usual one: it is expressed as a difference. As a result, it does not go down with increased distance, but goes up. Finally, compute the variogram for the PRICE variable itself and try to explain why the exercise started with the trend surface residuals instead.

Make sure you save one of the variograms as a “var” file. With a variogram window active, select File > Save as, and specify a file name (8 character limit). The new file will be saved in the working directory.
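The variogram estimates discussed in this section can be sketched in Python with the classical estimator: for each lag, half the average squared difference over all pairs that fall in that lag. The data are made up, and the binning rule (equal-width bins starting at zero) is an assumption that may differ from the way Variowin defines its lags.

```python
import math
from itertools import combinations


def empirical_variogram(coords, values, lag_width, n_lags):
    """Return (mean distance, variogram estimate, number of pairs) per non-empty lag."""
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i, j in combinations(range(len(coords)), 2):
        h = math.dist(coords[i], coords[j])
        k = int(h // lag_width)          # bin k covers [k*lag_width, (k+1)*lag_width)
        if k >= n_lags:
            continue                     # beyond the maximum distance
        sq_sums[k] += (values[i] - values[j]) ** 2
        dist_sums[k] += h
        counts[k] += 1
    return [
        (dist_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
        for k in range(n_lags)
        if counts[k] > 0
    ]


# Made-up points along a line, for illustration only.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
result = empirical_variogram(coords, values, lag_width=1.5, n_lags=2)
for row in result:
    print(row)
```

Changing lag_width and n_lags in this sketch mimics the experiment with 11 versus 8 lags: fewer, wider bins contain more pairs each and give smoother estimates.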
Practice

Construct variograms for the maxday and av8top variables in the LA ozone data set. Assess the sensitivity of the graph to the choice of settings. Try to formulate some tentative conclusions about the range of spatial correlation.

Fitting a Spherical Variogram

Theoretical variogram models are fit to an empirical variogram with the Model program in Variowin. Start this program by double clicking on its shortcut or by running the executable in the Variowin program directory. Make sure you saved a .var file at the end of the variogram computation. Otherwise, you first need to go back to Vario2D, compute the variogram and save the result.

Starting the Model program brings up the usual Welcome window, as in Figure 26. Click on OK to open the File Open dialog, as in Figure 27. The same dialog can also be obtained later from the menu as File > Open. Select the var file you saved in the previous Vario2D session and click OK.
Figure 26. Model welcome screen.
Figure 27. File open dialog for var file.
The next dialog is used to specify the variogram data (Experimental Variogram) to which the fit will be applied. This is useful when several var files have been loaded. In our case, there is only one listed in the dialog. Select r_p_2 omnidirectional, as in Figure 28. Click on OK to bring up the Model user interface with menu items and two windows, shown in Figure 29. This dialog can also be generated by selecting the Model item in the main menu. The interface contains two main windows: the one on the left is used to select the theoretical model and its parameters, and the one on the right shows the fit of the model to the experimental variogram. Variowin allows one to fit additive structures to the variogram, but that will not be pursued here (experiment with this later by filling in values for the parameters for the 2nd and 3rd structure). For now, only a single model will be fit.
Figure 28. Experimental variogram dialog.
Figure 29. Model user interface.
The first model will be a spherical variogram model. Select this specification in the Model drop down list of the model dialog, as shown in Figure 30. Next, set Dir to 90, which is required for an isotropic variogram (no directional effects). Either type in the value of 90 or use the slider bar to move the value. Also, specify the range and sill as 15 and 320, respectively, as illustrated in Figure 30. Once you specify the parameter values, the model fit is calculated and shown on the top two lines of the dialog (smaller number is better). It is compared to the best fit found so far, so when the fit on the second line is better than the current fit, you are moving in the wrong direction. At the same time, a curve is drawn on the variogram graph, as in Figure 31.
Figure 30. Variogram model parameters.
Figure 31. Estimated spherical variogram (sill = 320, range = 15).
Variowin does not use a statistical method to estimate the parameters of the theoretical variogram, but instead relies on an interactive procedure. By changing the parameters in the model dialog and monitoring the change in fit, the user is supposed to converge to a “best fit” model. Once this is obtained, the model parameters can be saved in a file for use with kriging software, such as GSLIB. For example, in Figure 32, a value for the nugget was specified as well, and after some experimentation, the parameters were selected as nugget = 13.09, range = 13.4 and sill = 303.8, with the graph shown in Figure 33. The overall fit improved from 2.488 x 10^-2 to 2.003 x 10^-2.
Figure 32. Parameters for improved spherical model.
Figure 33. Plot for improved spherical model.
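The interactive procedure described above (change a parameter, watch the fit) can be imitated in Python. The sketch below evaluates a spherical model and a simple pair-weighted mean squared deviation. Two things are assumptions: the sill is treated as the contribution of the structure on top of the nugget, and the fit measure is a generic stand-in, not the goodness-of-fit statistic that Variowin reports. The experimental points are made up.

```python
def spherical(h, nugget, sill, rng):
    """Spherical model; sill is the structure's contribution above the nugget."""
    if h <= 0.0:
        return 0.0                      # the variogram is zero at zero distance
    if h >= rng:
        return nugget + sill            # flat beyond the range
    ratio = h / rng
    return nugget + sill * (1.5 * ratio - 0.5 * ratio ** 3)


def weighted_fit(points, model):
    """Pair-weighted mean squared deviation of the model (smaller is better)."""
    total_pairs = sum(n for _h, _g, n in points)
    return sum(n * (g - model(h)) ** 2 for h, g, n in points) / total_pairs


# Made-up experimental points: (mean distance, estimate, number of pairs).
points = [(3.0, 110.0, 40), (7.5, 225.0, 90), (12.0, 300.0, 120), (20.0, 318.0, 150)]

first = weighted_fit(points, lambda h: spherical(h, 0.0, 320.0, 15.0))
second = weighted_fit(points, lambda h: spherical(h, 13.09, 303.8, 13.4))
print(first, second)
```

The two parameter sets are the ones used in this section. Because the points here are invented, the printed values say nothing about the Baltimore data; they only show how a fit measure responds to a change in parameters.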
Practice

Experiment with changing the parameters to improve the fit of the model. Also, use the laozone data set to find the “best” spherical variogram for that data set for one (or both) of the ozone variables. Interpret the result in terms of the range of the spatial correlation.

Fitting other Variogram Models

Variowin also fits other theoretical models to the experimental variogram. These can be found in the Model drop down list of the model dialog (Figure 34). The procedure is identical to that outlined for the spherical variogram: set the parameters interactively and move towards a better fit for the model. Experiment with an exponential variogram for the second order trend surface residuals. Compare the fit to that of the spherical variogram. Try other models as well if time permits.

Practice

Compare the results for the spherical variogram fit for the laozone variable(s) to that obtained with alternative models, such as an exponential model. Contrast the implications for the range of the spatial correlation.
Figure 34. Theoretical variogram models.
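To see why the choice of model matters for the interpretation of the range, the following Python sketch compares a spherical and an exponential model with the same parameters. The exponential model is written with the "practical range" convention, in which the curve reaches about 95% of the sill at the range. Whether Variowin uses this convention is an assumption you should check against the program's output.

```python
import math


def spherical(h, nugget, sill, rng):
    """Spherical model: reaches nugget + sill exactly at the range."""
    if h <= 0.0:
        return 0.0
    if h >= rng:
        return nugget + sill
    ratio = h / rng
    return nugget + sill * (1.5 * ratio - 0.5 * ratio ** 3)


def exponential(h, nugget, sill, rng):
    """Exponential model (practical range): approaches the sill asymptotically."""
    if h <= 0.0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-3.0 * h / rng))


# Same parameters as the first spherical fit in this tutorial (sill 320, range 15).
for h in (5.0, 15.0, 30.0):
    print(h, round(spherical(h, 0.0, 320.0, 15.0), 1),
          round(exponential(h, 0.0, 320.0, 15.0), 1))
```

The spherical curve is exactly flat beyond the range, while the exponential curve keeps rising slowly, so the same "range" value implies a less sharply bounded spatial correlation under the exponential model.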