Session Objectives
At the end of this session, students will be able to:
Describe the geocoding and data linkage using primary and
secondary data
Define cluster analysis
Differentiate Local and Global Spatial autocorrelation
Identify the types of interpolation
Define network analysis
What are spatial statistics?
They are similar to traditional statistics, but they integrate spatial
relationships into the calculations.
Spatial statistics will allow you to answer the following questions about your data:
• Global statistics are tools used to test for the existence of overall clustering
(either high or low)
• They are used to identify and measure the pattern of the entire study area
• Homogeneity
Testing of the existence of clusters cont’d…
Heterogeneity
A. Global Statistics
i) Getis-Ord General G (High/Low Clustering)
The Global G statistic computes a single statistic for the entire study area
It can indicate whether there is clustering of high values or of low values, but
not both
The drawback is that if both high and low clusters are present they will
counteract each other, so it is advisable to use Moran’s I first
Interpretation of Getis-Ord General G result
It is an inferential statistic, which means that the results of the analysis are interpreted
within the context of the null hypothesis
The null hypothesis states that there is no spatial clustering of feature values
When the p-value returned by this tool is small and statistically significant, the null
hypothesis can be rejected
If the null hypothesis is rejected, the sign of the z-score becomes important.
If the z-score value is positive, the observed General G index is larger than the expected
General G index, indicating that high values for the attribute are clustered in the study
area.
If the z-score value is negative, the observed General G index is smaller than the
expected index, indicating that low values are clustered in the study area.
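The interpretation above can be sketched in a few lines of Python. This is a minimal illustration, not the ArcGIS implementation: the one-dimensional layout, the binary neighbor weights, and the use of random permutations (instead of the analytical variance) to obtain the z-score are all assumptions made for the example.

```python
import random
import statistics

def general_g(values, weights):
    """General G: weighted cross-products divided by all cross-products (i != j)."""
    n = len(values)
    num = sum(weights[i][j] * values[i] * values[j]
              for i in range(n) for j in range(n) if i != j)
    den = sum(values[i] * values[j]
              for i in range(n) for j in range(n) if i != j)
    return num / den

def general_g_zscore(values, weights, n_perm=999, seed=0):
    """z-score of the observed G against G for randomly shuffled values."""
    rng = random.Random(seed)
    observed = general_g(values, weights)
    shuffled = list(values)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        sims.append(general_g(shuffled, weights))
    z = (observed - statistics.mean(sims)) / statistics.stdev(sims)
    return observed, z

# Hypothetical data: 10 locations on a line, neighbors are adjacent locations.
n = 10
weights = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
values = [9, 8, 9, 1, 1, 1, 1, 1, 1, 1]   # high values sit next to each other

observed_g, z_score = general_g_zscore(values, weights)
# Expected G under the null hypothesis of no spatial clustering
expected_g = sum(sum(row) for row in weights) / (n * (n - 1))
print(observed_g, expected_g, z_score)
```

Here the observed G exceeds the expected G and the z-score is positive, which corresponds to clustering of high values.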
Global statistics cont’d….
ii) Spatial Auto-Correlation (Global Moran’s I)
Global Statistic
Spatial Autocorrelation Calculation
Global statistics cont’d..
Interpretation Global Moran’s I
• Global Moran's I tool is an inferential statistic, which means that the
results of the analysis are always interpreted within the context of its null
hypothesis
• When the p-value returned by this tool is statistically significant, you can
reject the null hypothesis.
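As a rough illustration of what the statistic measures, the sketch below computes Global Moran's I for two made-up patterns. The line-shaped study area and binary adjacency weights are assumptions for the example; significance testing is left out.

```python
def line_weights(n):
    """Binary contiguity weights for n locations on a line (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

def morans_i(values, weights):
    """Global Moran's I = (n / S0) * sum_ij w_ij d_i d_j / sum_i d_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)          # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)

w = line_weights(10)
clustered = morans_i([1, 1, 1, 1, 1, 9, 9, 9, 9, 9], w)    # similar values together
alternating = morans_i([1, 9, 1, 9, 1, 9, 1, 9, 1, 9], w)  # dissimilar values together
expected = -1.0 / (10 - 1)   # expected I under the null hypothesis
print(clustered, expected, alternating)
```

The clustered pattern gives an index above the expected value (positive autocorrelation), while the alternating pattern gives an index well below it (dispersion).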
Interpretation Global Moran’s I cont’d….
Global Moran's I vs. Getis-Ord General G
Both techniques are used to assess global clustering (they simply tell you whether
clustering exists, not where the clustering actually occurs)
The assumptions behind both statistics are that your data is continuous and normally
distributed in the study area.
Moran's I measure only indicates that similar values occur together (It does not indicate
whether any cluster is composed of high or low values)
General G statistic can be used to indicate whether high or low values are concentrated
over the study area
Hence, when we wish to find out whether our data is clustered in general (auto correlated)
we can use Moran's I.
However, if we want to know more specifically whether there are clusters of high or low
values, we can use the General G statistic
Moran's I vs. Getis-Ord General G
B. Local statistics
i) Anselin Local Moran’s I (Cluster and outlier analysis )
The math of the local statistics is the same as for the global variants, but the
results are somewhat different
Anselin Local Moran’s I can identify HH, LL, HL and LH clusters, where H=High and
L=Low; HL is a high value surrounded by low values (an outlier)
Local statistics cont’d…
Statistically significant clusters can consist of high values (HH) or low values (LL)
Local statistics cont’d…
Feature is an outlier
In either instance, a p-value for the feature must be small enough for the
cluster or outlier to be considered statistically significant
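The HH, LL, HL and LH labels can be illustrated with a small sketch. It computes the Local Moran's I value and the quadrant for each feature only; the p-value step (usually obtained by permutation) is deliberately omitted, and the line layout and row-standardized weights are assumptions for the example.

```python
def line_weights(n):
    """Row-standardized weights for n locations on a line (hypothetical layout)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w

def local_moran(values, weights):
    """Return (local I, quadrant label) per feature; no significance test."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    z = [(v - mean) / sd for v in values]
    results = []
    for i in range(n):
        lag = sum(weights[i][j] * z[j] for j in range(n) if j != i)
        own = "H" if z[i] > 0 else "L"      # the feature itself
        nbr = "H" if lag > 0 else "L"       # average of its neighbors
        results.append((z[i] * lag, own + nbr))
    return results

values = [10, 9, 10, 1, 1, 2, 1, 10, 1, 1]
results = local_moran(values, line_weights(len(values)))
labels = [label for _, label in results]
print(labels)
```

Features 0 and 1 fall in the HH quadrant (positive local I), while feature 7, a high value between two low values, is labeled HL and has a negative local I.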
Local statistics cont’d…
ii) Hotspot Analysis (Getis-Ord Gi*)
• Local version of the G statistic that indicates hot spot (cluster of high
values) or cold spots (clusters of low value)
To be statistically significant, a hot or cold spot must have a high/low value
and be surrounded by other features with high/low values
Getis-Ord Gi* can identify Hot (High) or Cold (Low) clusters with
different confidence levels
The math of the local statistics is the same as for the global variants, but the
results are somewhat different
Anselin Local Moran's I can identify HH, LL, LH, HL clusters where
H=High, L=Low and HL is a high value surrounded by low values
Why is spatial autocorrelation important?
• One of the main reasons spatial autocorrelation is important is that many
statistical methods rely on observations being independent of one another
• If autocorrelation exists in a map, it violates the assumption that observations
are independent of one another
Best practice guidelines for using cluster and outlier analysis
(Anselin Local Moran’s I)
Results are only reliable if the input feature class contains at least 30
features;
This tool requires an input field such as count, rate, or other numeric
measurements
If you are analyzing point data, where each point represents a single event or
incident, you might not have a specific numeric attribute to evaluate (a
severity ranking, count or other measurement)
If you are interested in finding locations with many incidents (hot spots)
and/or locations with very few incidents (cold spots), you will need to
Best practice guidelines for using cluster cont’d……
Especially if the values for the input field are skewed, each feature should
have about eight neighbors
Best Practice guidelines for using Cluster cont’d…
Given a set of weighted features, the Getis-Ord Gi* (pronounced "Gee Eye Star")
statistic identifies statistically significant hot spots and cold spots
This tool works by looking at each feature within the context of neighboring features.
To be a statistically significant hot spot, a feature must have a high value and be
surrounded by other features with high values as well.
The local sum for a feature and its neighbors is compared proportionally to the sum of
all features;
When the local sum is very different from the expected local sum, and when that
difference is too large to be the result of random chance, a statistically significant
z-score results.
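The comparison of the local sum with its expected value can be sketched as follows. This is an illustrative calculation of the Gi* z-score, assuming a line-shaped study area and binary weights in which each feature counts itself and its adjacent features; the confidence labels use the usual two-sided z cut-offs (1.65, 1.96, 2.58).

```python
import math

def gi_star(values, weights):
    """Gi* z-score per feature; weights[i][i] must be non-zero (self included)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        sw = sum(weights[i])                      # sum of weights
        sw2 = sum(w * w for w in weights[i])      # sum of squared weights
        local_sum = sum(weights[i][j] * values[j] for j in range(n))
        denom = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        scores.append((local_sum - mean * sw) / denom)   # observed minus expected
    return scores

def label(z):
    """Translate a z-score into a hot/cold spot label with a confidence level."""
    for cut, conf in ((2.58, "99%"), (1.96, "95%"), (1.65, "90%")):
        if abs(z) >= cut:
            return ("hot" if z > 0 else "cold") + " spot, " + conf + " confidence"
    return "not significant"

# Hypothetical data: 12 locations on a line with high values in the middle.
n = 12
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]
values = [1, 2, 1, 2, 9, 10, 9, 2, 1, 2, 1, 2]
scores = gi_star(values, weights)
for i, z in enumerate(scores):
    print(i, round(z, 2), label(z))
```

The feature in the middle of the run of high values gets a large positive z-score and is flagged as a hot spot, while features at the edges are not significant.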
Clustering vs Clusters
The mapping clusters tools perform cluster analysis to identify the locations
of statistically significant hot spots, cold spots, spatial outliers and similar
features
Clustering can be detected at the global level, whereas clusters are identified at
the local level
Moran’s I is a global statistic, i.e. a single value for the whole spatial
pattern
Moran’s I does not provide the location of clusters
Interpolation
What is Interpolation?
It can be used to predict unknown values for any geographic point data, such
as home deliveries, high child mortality, low ANC visits and so on.
Interpolation Methods/Types
INVERSE DISTANCE WEIGHTED (IDW)
• The Inverse Distance Weighting interpolator assumes that each input point has
a local influence that diminishes with distance.
• It weights the points closer to the processing cell greater than those further
away.
• A specified number of points, or all points within a specified radius can be used
to determine the output value of each location.
• Use of this method assumes the variable being mapped decreases in influence
with distance from its sampled location.
Interpolation Methods cont’d…
• IDW interpolation explicitly implements the assumption that things that are
close to one another are more alike than those that are farther apart.
• To predict a value for any unmeasured location, IDW will use the measured
values surrounding the prediction location.
• Those measured values closest to the prediction location will have more
influence on the predicted value than those farther away.
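The weighting idea can be shown with a short sketch. The function name, the power of 2, and the optional `k` (number of nearest points to use) are choices made for this example, not settings taken from any particular GIS package.

```python
import math

def idw(samples, x, y, power=2.0, k=None):
    """Predict a value at (x, y) from (sx, sy, value) samples by inverse distance weighting."""
    dists = sorted((math.hypot(sx - x, sy - y), v) for sx, sy, v in samples)
    if k is not None:
        dists = dists[:k]              # keep only the k nearest measured points
    if dists[0][0] == 0.0:
        return dists[0][1]             # prediction location coincides with a sample
    weights = [1.0 / d ** power for d, _ in dists]
    return sum(w * v for w, (_, v) in zip(weights, dists)) / sum(weights)

# Hypothetical measured values at two locations
samples = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
midpoint = idw(samples, 5.0, 0.0)      # equally far from both samples
near_first = idw(samples, 2.0, 0.0)    # closer to the first sample
print(midpoint, near_first)
```

At the midpoint both samples contribute equally, while at (2, 0) the prediction is pulled toward the nearer sample's value of 10.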
Interpolation Methods cont’d…
Kriging
• Kriging is a geostatistical interpolation technique that considers both the distance and the
degree of variation between known data points when estimating values in unknown areas.
• A kriged estimate is a weighted linear combination of the known sample values around the
point to be estimated.
• Kriging is a procedure that generates an estimated surface from a scattered set of points
with z-values.
• Kriging assumes that the distance or direction between sample points reflects a spatial
correlation that can be used to explain variation in the surface.
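To make the "weighted linear combination" concrete, here is a minimal sketch of ordinary kriging at a single location. The exponential variogram and its sill and range are assumed values chosen for illustration; in practice the variogram is fitted to the sample data first.

```python
import math

def variogram(h, sill=1.0, rng=10.0):
    """Assumed exponential variogram with zero nugget (illustrative parameters)."""
    return sill * (1.0 - math.exp(-h / rng))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ordinary_kriging(samples, target):
    """Return (estimate, weights) at target from (x, y, z) samples."""
    n = len(samples)

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Kriging system: variogram between samples, plus the unbiasedness constraint
    a = [[variogram(dist(samples[i], samples[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(dist(s, target)) for s in samples] + [1.0]
    weights = solve(a, b)[:n]          # last unknown is the Lagrange multiplier
    estimate = sum(w * s[2] for w, s in zip(weights, samples))
    return estimate, weights

# Hypothetical sample points with z-values at the corners of a square
samples = [(0, 0, 10.0), (10, 0, 20.0), (0, 10, 15.0), (10, 10, 30.0)]
center_est, center_w = ordinary_kriging(samples, (5, 5))
corner_est, corner_w = ordinary_kriging(samples, (0, 0))
print(center_est, center_w)
```

The weights always sum to 1. At the center of the square all four samples get equal weight, and at a sample location the estimate reproduces the measured value.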
Interpolation Methods cont’d…
What is SaTScan?
• SaTScan is freely available software that uses the scan statistic to detect clusters
(www.satscan.org)
• It tests whether a disease is randomly distributed over space, over time or over
space and time
• The spatial scan statistic can be useful as an addition to disease maps, in order to
determine if the observed patterns are likely due to chance or not
• For each circle, a likelihood ratio statistic is computed based on the number of observed
and expected cases within and outside the circle and compared with the likelihood L0
under the null hypothesis.
• Create a regular or irregular grid of centroids covering the whole study region.
The Spatial Scan Statistic cont’d…
For each circle:
Obtain actual and expected number of cases inside and outside the circle.
Compare Circles:
– Pick circle with highest likelihood function as Most Likely Cluster.
Inference:
Generate random replicas of the data set under the null-hypothesis of no clusters
(Monte Carlo sampling).
Compare most likely clusters in real and random data sets (Likelihood ratio test).
The Spatial Scan Statistic cont’d…
The scan statistic is the maximum likelihood over all possible circles
Redistribute cases randomly and recalculate the scan statistic many times
The proportion of scan statistics from the Monte Carlo replicates that are greater
than or equal to the scan statistic for the true cluster is the p-value
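The procedure described on these slides can be sketched in Python. This is a simplified illustration and not SaTScan itself: it assumes the Poisson likelihood ratio, circles centered on area centroids, a 50% maximum population per circle, and a made-up 5 x 5 grid of areas. The p-value counts the observed data set as one of the replicates.

```python
import math
import random

def llr(c, e, total):
    """Poisson log likelihood ratio for a circle with c observed and e expected cases."""
    if c <= e:
        return 0.0                     # only look for excess cases inside the circle
    outside = (total - c) * math.log((total - c) / (total - e)) if c < total else 0.0
    return c * math.log(c / e) + outside

def scan(areas, cases, max_pop_frac=0.5):
    """Return (highest LLR, member indices) over circles centered on each area."""
    total_cases = sum(cases)
    total_pop = sum(a[2] for a in areas)
    best = (0.0, [])
    for cx, cy, _ in areas:
        order = sorted(range(len(areas)),
                       key=lambda i: math.hypot(areas[i][0] - cx, areas[i][1] - cy))
        pop = c = 0
        members = []
        for i in order:                # grow the circle one area at a time
            pop += areas[i][2]         # (equidistant areas are added one by one)
            if pop > max_pop_frac * total_pop:
                break
            c += cases[i]
            members.append(i)
            stat = llr(c, total_cases * pop / total_pop, total_cases)
            if stat > best[0]:
                best = (stat, list(members))
    return best

def monte_carlo_p(areas, cases, n_sims=199, seed=1):
    """Most likely cluster and its Monte Carlo p-value."""
    rng = random.Random(seed)
    observed, members = scan(areas, cases)
    pops = [a[2] for a in areas]
    exceed = 0
    for _ in range(n_sims):
        sim = [0] * len(areas)         # redistribute cases in proportion to population
        for i in rng.choices(range(len(areas)), weights=pops, k=sum(cases)):
            sim[i] += 1
        if scan(areas, sim)[0] >= observed:
            exceed += 1
    return observed, members, (exceed + 1) / (n_sims + 1)

# Hypothetical study region: 25 areas of 100 people, excess cases in one corner
areas = [(x, y, 100) for x in range(5) for y in range(5)]
cases = [2] * 25
for i in (0, 1, 5):
    cases[i] = 12
stat, members, p_value = monte_carlo_p(areas, cases)
print(sorted(members), round(stat, 2), p_value)
```

The most likely cluster is the group of three corner areas with excess cases, and its p-value is small because none of the random replicates produces a scan statistic as large.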
What SaTScan can/can’t do?
CAN
CANNOT
o Display maps of events and cluster locations
Spatial Scan Statistic: Properties
Adjusts for inhomogeneous population density.
Simultaneously tests for clusters of any size and any location, by using circular
windows with continuously variable radius.
Introduction of Statistical models in SaTScan
Bernoulli Model
• There are animals with or without a disease (represented by a 0/1 variable)
Poisson Model
• Under the null hypothesis, and when there are no covariates, the expected number
of cases in each area is proportional to its population size
• This model is a very good approximation to the Bernoulli model when there are few
cases relative to controls (less than 10%)
Introduction of Statistical models in SaTScan cont’d…
Space-Time Permutation Model
Requires only case data with information about the spatial location and time for each
case (no information is needed on the population at risk)
If the population increase (or decrease) is the same across the study region, that is
okay, and will not lead to biased results
The user is advised to be very careful when using this method for data spanning
several years
Reading assignment
• Geocoding and data linkage using primary and secondary data
• Network analysis
Thank You!!