© All Rights Reserved


Geographically Weighted Regression

By Richard Yang

Report

The tutorial I picked covered linear and geographically weighted regression. In this tutorial, I learned about ordinary least squares (OLS) regression and geographically weighted regression (GWR). The difference between the two is that OLS is a global regression method, while GWR is a local, spatial regression method that allows the relationship I am modeling to vary across the study area. The study area for this tutorial was the Portland Metropolitan Area, and the focus was on 911 calls. First, I ran an OLS regression on the volume of 911 calls to see which variables contribute to the high volume. Is it driven by population? Education? Income? I was going to find out. Using that information, I ran a GWR based on the variables that proved important, to analyze where future calls will come from.

The point of the tutorial was to see where 911 calls are coming from now and where they will come from in the future. Will the response stations, located where they are now, still be effective when the volume of calls grows? First I looked at the hot spot locations of all the current calls and at where the response stations are located. The tutorial included a file with the hot spots already mapped out. Based on the locations of 911 calls, the map shows call volume from cold spots (blue) to hot spots (red) (Figure 1). To continue with the regression analysis, I was asked the question: what factors contribute to the high volume of calls in the hot spot areas? To find out, I had to run an OLS regression. Instead of using the individual calls as points, I used the file in which the calls have been aggregated to census tracts. This shapefile is better to use because it carries more information (variables) that could help explain the high volume of calls in the hot spots. The first time, I used only population as the variable to try to explain the high volume of calls. The OLS tool produced a results table with many different figures. The most important figure to focus on, though, is the R-squared value, which was 0.393460 (Figure 2). To put it another way, population accounted for only about 39% of the story of why there was a high volume of calls in the hot spots. If the figure had been higher, say 90%, further analysis would not have been needed, because population alone would explain 90% of the variation in call volume. Since it was only 39%, other factors must also be contributing to the high volume of calls.
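To make the R-squared figure more concrete, here is a minimal Python sketch of how it can be computed for a one-variable least-squares fit. The function name `r_squared` and the numbers are hypothetical illustrations, not the tutorial's data, and the OLS tool reports much more than this single value.

```python
def r_squared(x, y):
    """R-squared of a one-variable least-squares fit y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left unexplained by the fitted line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of y around its mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical tract populations (thousands) and call counts, made up for illustration.
population = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
calls = [12.0, 9.0, 20.0, 14.0, 30.0, 19.0]
print(round(r_squared(population, calls), 3))
```

A value near 1 means the single variable explains most of the variation in call counts; a value like the 0.39 reported above leaves most of it unexplained.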

To find what other factors contributed to the high volume of calls, I needed to create a scatterplot matrix. Using the scatterplot matrix, I selected population, jobs, low education, and distance to urban centers, and ran the OLS tool again with these four variables. This time the R-squared value was 0.831080, or about 83% (Figure 3). These four variables now tell 83% of the story, but a spatial autocorrelation tool still needs to be run to see whether the model's over- and under-predictions show a random spatial pattern. This step is important because if there is a pattern, the model is biased, for example because a key variable is missing. If the pattern is random, little or no such bias is present.

Now I have to check whether I have a properly specified model. To do that, I go back to the OLS results table and work through several figures. First, I check whether each coefficient is positive or negative. This matters because a positive coefficient for population means that as population grows, 911 calls also grow, while a negative coefficient would mean that as population grows, 911 calls go down. Population has a positive coefficient, which is the relationship I would expect, so that is a good sign (Figure 5). Next, I look at the VIF (variance inflation factor) of each variable. The VIF shows whether variables are redundant, that is, telling the same story as one another. A high number (over 7.5; smaller is better) means the variables overlap and the model becomes unreliable. Since all the values are between about 1.1 and 1.7, I am good there (Figure 6). Next, I had to check whether all the explanatory variables have statistically significant coefficients. An asterisk next to a value shows that it is statistically significant (Figure 7). The one figure that should not be statistically significant is the Jarque-Bera statistic, because a significant result there means the residuals are not normally distributed; it must not have an asterisk (Figure 8). Now I check model performance by looking at the R-squared value (between 0 and 1; closer to 1 is better) and the AIC (Akaike's Information Criterion) value (lower is better). The R-squared value was 0.83, while the AIC value went down to 680 from 788 (Figure 9 and Figure 2). Finally, running the spatial autocorrelation tool is the last step, to see whether the model is free from spatial clustering of over- and under-predictions. With the model passing all of these checks, I know it has very little bias and that the variables I picked account for a large portion of the variation in the data. Now that I have figured out which variables are important, I can apply those same variables to see what will happen in the future.
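The VIF check described above can be sketched in plain Python. The usual definition is VIF = 1 / (1 - R²), where R² comes from regressing one explanatory variable on all the others. The helper names (`solve`, `r_squared`, `vif`) and the sample columns are hypothetical, and this is only an illustration of the idea, not the OLS tool's implementation.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """R-squared of an OLS fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns."""
    result = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        result.append(1.0 / (1.0 - r_squared(others, target)))
    return result


# Hypothetical explanatory variables for six tracts, made up for illustration.
population = [2.1, 3.4, 1.8, 4.0, 2.9, 3.6]
jobs = [1.0, 2.2, 0.7, 1.5, 2.8, 1.9]
low_education = [0.3, 0.2, 0.5, 0.4, 0.1, 0.6]
print([round(v, 2) for v in vif([population, jobs, low_education])])
```

Variables that are unrelated to one another give values near 1, while two variables that move almost in lockstep push the VIF far above the 7.5 threshold mentioned above.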

To see where the most calls will come from in the future, a geographically weighted regression (GWR) tool is used. The GWR tool aims to improve results by reducing bias and increasing model fit. Running this tool produced an output file that shows AIC and R-squared values. Comparing it with the OLS output, I noticed that the R-squared value had gone up by 3 percentage points (83% to 86%) and the AIC had gone down by 6 points (from 680 to 674) (Figure 10). Both of these are good signs. Finally, I fed the output of this tool into the GWR Prediction model to see the changes expected in the future. Because the GWR results were good, this model can now show me the predicted number of calls in the future (Figure 11).
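The core idea of GWR, fitting a separate regression around each location with nearby observations weighted more heavily, can be sketched as follows. This is a simplified illustration under stated assumptions: one explanatory variable, a fixed Gaussian kernel, and a hand-picked `bandwidth`. The function name `gwr_local_fits` and the data are hypothetical, and the GWR tool's actual kernel and bandwidth selection may differ.

```python
import math


def gwr_local_fits(coords, x, y, bandwidth):
    """Fit y ~ intercept + slope * x at every location using Gaussian distance weights.

    Returns one (intercept, slope) pair per location.
    """
    fits = []
    for (u0, v0) in coords:
        # Observations closer to (u0, v0) get weights nearer to 1.
        w = [math.exp(-0.5 * (math.hypot(u - u0, v - v0) / bandwidth) ** 2)
             for (u, v) in coords]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx
        fits.append((my - slope * mx, slope))
    return fits


# Hypothetical data: the x-to-y relationship is steeper in the east than in the west.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0, 5.0, 10.0, 15.0, 20.0]
for intercept, slope in gwr_local_fits(coords, x, y, bandwidth=1.0):
    print(round(intercept, 3), round(slope, 3))
```

A single global OLS fit would report one slope for the whole area, whereas the local fits recover a different slope on each side, which is the kind of spatial variation GWR is meant to capture.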

Figures

Figure 1. This is a map of the hot spot analysis of 911 calls in the Portland Metropolitan Area. The green plus signs represent response stations. The blue areas represent a low volume of calls, while the red spots represent a high volume of calls.

Figure 2. This is the output of the OLS tool using only population as a variable. As the circle points out, the R-squared value is only 39%. That means that population accounts for only 39% of the story of the data.

Figure 3. This is the output of the OLS tool using the variables population, jobs, low education, and distance to urban centers. As the circle shows, the R-squared value is now 83%. This is a much more acceptable figure because those four variables are now accounting for 83% of the story of the data.

Figure 4. This image maps the difference between using population alone and using population, jobs, low education, and distance to urban centers. The R-squared values are shown underneath to indicate how well the model fits with each set of variables. The colors represent over-prediction (red) and under-prediction (blue). These colors should form a random pattern, which indicates that the model is not biased.
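Whether over- and under-predictions form a random pattern can be checked numerically as well as visually. One common statistic for this is Global Moran's I; that the tutorial's spatial autocorrelation tool computes exactly this, and the binary distance-threshold weights used here, are assumptions for the sake of a minimal sketch. The function name `morans_i` and the residual values are hypothetical.

```python
import math


def morans_i(coords, values, threshold):
    """Global Moran's I with binary weights: 1 if two points lie within threshold, else 0."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * z_i * z_j over neighboring pairs
    s0 = 0.0      # sum of all weights
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if d <= threshold:
                cross += z[i] * z[j]
                s0 += 1.0
    return (n / s0) * cross / sum(zi * zi for zi in z)


# Hypothetical residuals at four locations along a line.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = [1.0, 1.0, -1.0, -1.0]     # over-predictions next to over-predictions
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbors always have opposite signs
print(morans_i(coords, clustered, threshold=1.0))
print(morans_i(coords, alternating, threshold=1.0))
```

Positive values indicate clustering of similar residuals, negative values indicate a checkerboard-like pattern, and values close to -1/(n-1) are what a random pattern is expected to produce.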

Figure 5. This is the output of the OLS tool using the variables population, jobs, low education, and distance to urban centers. The figures in the box show the coefficients of those four variables. The value for population is positive, meaning that as population goes up, 911 calls also go up, which is the expected relationship. That means the model passed the first check of its validity.

Figure 6. This is the output of the OLS tool using the variables population, jobs, low education, and distance to urban centers. The figures in the box show the VIF (variance inflation factor). A number below 7.5 is a good thing, and all four variables fall well below that mark. This model passes the second test.

Figure 7. This is the output of the OLS tool using the variables population, jobs, low education, and distance to urban centers. This image checks for statistically significant figures. An asterisk next to a figure is a good thing. This passes the third test.

Figure 8. One exception to the asterisk rule is shown in this figure. The Jarque-Bera statistic should not have an asterisk next to it. This passes the fourth test.
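For reference, the Jarque-Bera statistic measures how far the residuals' skewness and kurtosis are from those of a normal distribution. The sketch below uses the standard formula JB = n/6 · (S² + (K − 3)²/4); the function name `jarque_bera` and the sample residuals are hypothetical, and how the OLS tool decides when to print an asterisk is not shown here.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (skewness^2 + (kurtosis - 3)^2 / 4)."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of the residuals.
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


# Hypothetical residuals: one symmetric set, one with a single large outlier.
symmetric = [-1.0, 0.0, 1.0]
skewed = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 10.0]
print(jarque_bera(symmetric))
print(jarque_bera(skewed))
```

For large samples the statistic is compared against a chi-square distribution with 2 degrees of freedom (about 5.99 at the 5% level), so a small value is consistent with normally distributed residuals.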

Figure 9. This is the output of the OLS tool using the variables population, jobs, low education, and distance to urban centers. It shows the R-squared value and the AIC value. Compared to Figure 2, which had only population as a variable, the R-squared value has gone up from 39% to 83%, and the AIC value has gone down from 788 to 680. A higher R-squared value is good, and a lower AIC is good. This passes the fifth test.
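The reason a lower AIC is better can be seen from a common least-squares form of the criterion, AIC = n · ln(RSS/n) + 2k, which rewards a smaller residual sum of squares (RSS) and penalizes each extra parameter k. The sketch below assumes that form, up to an additive constant; the tool may report a corrected variant, so these hypothetical numbers are not expected to reproduce 788 or 680.

```python
import math


def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant.

    rss: residual sum of squares, n: number of observations,
    k: number of estimated coefficients (including the intercept).
    """
    return n * math.log(rss / n) + 2 * k


# Hypothetical residual sums of squares for 50 observations.
one_variable = aic_least_squares(rss=400.0, n=50, k=2)    # intercept + 1 variable
four_variables = aic_least_squares(rss=110.0, n=50, k=5)  # intercept + 4 variables
print(round(one_variable, 2), round(four_variables, 2))
```

Here the four-variable model has the lower AIC because its much smaller RSS outweighs the penalty for three additional coefficients.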

Figure 10. This is the output file from the GWR tool. It shows that the AIC has gone down from 680 in the OLS results to 674. It also shows that the R-squared value went from 83% in the OLS results to 89%. This indicates that geographically weighted regression gives better results and that the four variables I chose fit the model well.

Figure 11. This image shows the output from the GWR tool. The first image is the prediction from the model using current data. The second image shows where future 911 calls are predicted to come from, based on future census data.

APPLICATIONS

I cannot see right now how linear and geographically weighted regression could be applied to my project for next quarter. I can see applications for other studies, though. Using demographics as the underlying factor, many aspects of society can be analyzed. For instance, one study could look at the incarceration rates for a certain race and try to find the variables that explain those rates. They might be related to location, education, age, crime rates, or many other factors. I could do a study of the incarceration rates of African Americans in Louisiana. I could follow this same tutorial with different variables and use OLS to see how well the model works. Using a GWR analysis, I could show which census tracts are home to the most incarcerated African Americans, and which factors might be associated with their incarceration. I could also map which census tracts are predicted to see an increase or decrease in incarcerated people, based on future projections of income, education, crime rates, or other factors. Regression analysis could also be applied to homelessness, to see whether race, education, age, or another factor explains why and where people are homeless now and where they may be in the future. Using gender, age, education, and location, I could see how well the model works and tweak it to get a higher R-squared value and a lower AIC value. Then I could use a GWR analysis to show where homeless people can be found now and where they may be found in the future, based on future projections of median income, job growth, and population increase.
