Geospatial
Misc
- Also see Geospatial, Spatial Weights
- Packages
- {waywiser} - Measures the performance of models fit to 2D spatial data by implementing a number of well-established assessment methods in a consistent, ergonomic toolbox
- Features include new yardstick metrics for measuring agreement and spatial autocorrelation, functions to assess model predictions across multiple scales, and methods to calculate the area of applicability of a model.
- {geospt} - Estimation of the variogram through trimmed mean, radial basis functions (optimization, prediction and cross-validation), summary statistics from cross-validation, pocket plot, and design of optimal sampling networks through sequential and simultaneous points methods.
- {geosptdb} - Spatio-Temporal Radial Basis Functions with Distance-Based Methods (Optimization, Prediction and Cross Validation)
- {sperrorest} - Implements spatial error estimation and permutation-based spatial variable importance using different spatial cross-validation and spatial block bootstrap methods, used by {mlr3spatiotempcv}.
- Papers
Spatial Autocorrelation
Misc
- Local metrics can suffer from multiple testing issues when the number of group units is large
- Packages
- {spdep::EBImoran.mc} (Vignette) uses empirical Bayes to shrink location counts/rates that have high variance or small populations toward a global average rate, then tests (via permutation) for spatial autocorrelation.
- Useful for count data with outliers or overdispersion
- Residual spatial autocorrelation affects both standard errors and coefficient values.
- If a higher level geometry (e.g. aggregating from tracts to counties) can be justified, it could help remedy residual spatial autocorrelation
- Certain causal effects may be present only at particular scales and missing this scale can lead to misspecification
- See
- Regression, Spatial >> Econometrics >> Examples >> Example 1
- Geospatial, General >> Terms >> Copying Out
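The multiple testing issue noted above for local metrics can be illustrated with base R's `p.adjust`. This is a minimal sketch, assuming a hypothetical vector of 100 local-test p-values (`p_raw` is simulated here, not taken from any real analysis); it only shows how the count of "significant" units shrinks once the p-values are adjusted.

```r
# Hypothetical p-values from a local autocorrelation test, one per spatial unit:
# 95 units with no real signal plus 5 units with very small p-values
set.seed(42)
p_raw <- c(runif(95), 0.0001, 0.0004, 0.001, 0.002, 0.004)

# Adjust for multiple testing: false discovery rate (Benjamini-Hochberg)
# and the more conservative Bonferroni correction
p_fdr  <- p.adjust(p_raw, method = "fdr")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Number of units flagged at the 5% level before and after adjustment
c(
  raw        = sum(p_raw  < 0.05),
  fdr        = sum(p_fdr  < 0.05),
  bonferroni = sum(p_bonf < 0.05)
)
```

The unadjusted count includes units flagged purely by chance; the adjusted counts can only be equal or smaller.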
Visual Assessment
Example: (source)

library(ggplot2)

data("US_counties_centroids", package = "SpatialInference")

# Regress one spatially correlated noise variable on another
spuriouslm <- fixest::feols(
  noise1 ~ noise2,
  data = US_counties_centroids,
  vcov = "HC1" # robust SEs
)
spuriouslm
#> OLS estimation, Dep. Var.: noise1
#> Observations: 3,108
#> Standard-errors: Heteroskedasticity-robust
#>                 Estimate Std. Error      t value   Pr(>|t|)
#> (Intercept) 6.890000e-16   0.017872 3.860000e-14 1.0000e+00
#> noise2      8.738022e-02   0.015535 5.624836e+00 2.0216e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.996015 Adj. R2: 0.007316

# Map the residuals to look for spatial clustering
US_counties_centroids$resid <- spuriouslm$residuals

ggplot(US_counties_centroids) +
  geom_sf(aes(col = resid), size = .1) +
  theme_bw() +
  scale_color_viridis_c()

noise1 and noise2 are independent of each other but spatially correlated. This leads to an inflated t-value and a low p-value.
Clustering of high residuals in the Midwest
Moran’s I
\[ I = \frac{N}{W} \frac{\sum_i \sum_j w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2} \]
- A measure of global spatial autocorrelation or overall clustering of the data
- If there is no global autocorrelation or no clustering, there can still be clusters at a local level (See Local Moran’s I)
- Assumes homogeneity (i.e. only one statistic is needed to summarize the whole study area)
- \(N\) is the number of spatial units (e.g. counties)
- \(w_{ij}\) is an element of the spatial weights matrix
- \(W\) is the sum of all \(w_{ij}\)
- Values significantly below the expected value under no spatial autocorrelation, \(-1/(N-1)\), indicate negative spatial autocorrelation
- Values significantly above the expected value indicate positive spatial autocorrelation
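- Range: \(w_{\text{min}}\frac{N}{W} \lt I \lt w_{\text{max}}\frac{N}{W}\), where \(w_{\text{min}}\) and \(w_{\text{max}}\) are the smallest and largest eigenvalues of the spatial weights matrix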
- For a row normalized weight matrix, \(\frac{N}{W} = 1\) (Wiki)
- In {spdep}, this would be style = “W”
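- Under row normalization, each row of the weights matrix sums to 1, so \(W = \sum_i \sum_j w_{ij} = N\) (provided every unit has at least one neighbor), and therefore \(\frac{N}{W} = 1\).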
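The Moran's I formula can be checked by hand on a toy example. The following is a minimal base-R sketch, assuming five hypothetical units arranged on a line where each unit neighbors the units directly beside it; the names `B`, `Wmat`, and `x` are illustrative and not from any package. In practice a function such as `spdep::moran.test` would be used instead.

```r
# Toy data: 5 units on a line with a steadily increasing value (hypothetical)
x <- c(1, 2, 3, 4, 5)
N <- length(x)

# Binary contiguity matrix: unit i neighbors units i - 1 and i + 1
B <- matrix(0, N, N)
B[abs(row(B) - col(B)) == 1] <- 1

# Row-normalize so that each row sums to 1 (style = "W" in {spdep})
Wmat <- B / rowSums(B)

# W is the sum of all weights; with row normalization it equals N
W <- sum(Wmat)
N / W

# Moran's I: weighted cross-products of deviations over the sum of squares
z <- x - mean(x)
I <- (N / W) * sum(Wmat * outer(z, z)) / sum(z^2)
I
```

Here `I` works out to 0.6, well above the expected value of \(-1/(N-1) = -0.25\), which matches the smooth upward trend in `x`.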
Local Moran’s I
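\[ \begin{align} &I_i = \frac{x_i - \bar x}{m_2} \sum_{j=1}^N w_{ij} (x_j - \bar x) \\ &\text{where} \;\; m_2 = \frac{\sum_{i=1}^N (x_i - \bar x)^2}{N} \end{align} \]
- Moran’s I is proportional to the sum of the \(I_i\)s: \(I = \sum_{i=1}^N I_i / W\). For a row-normalized weights matrix (\(W = N\)), this is just the average of the \(I_i\)s, \(I = \sum_{i=1}^N I_i /N\)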
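The link between the local and global statistics can be verified numerically. This is a base-R sketch on the same kind of hypothetical five-unit line; `moran_global` and `moran_local` are illustrative helpers written for this note, not package functions. It shows that dividing the sum of the \(I_i\) by \(W\) recovers the global statistic for any weights, while the plain average only matches when the weights are row-normalized.

```r
# Illustrative helpers implementing the global and local formulas directly
moran_global <- function(x, Wmat) {
  z <- x - mean(x)
  (length(x) / sum(Wmat)) * sum(Wmat * outer(z, z)) / sum(z^2)
}
moran_local <- function(x, Wmat) {
  z <- x - mean(x)
  m2 <- sum(z^2) / length(x)
  (z / m2) * as.vector(Wmat %*% z)
}

# Toy data: 5 units on a line (hypothetical)
x <- c(1, 2, 3, 4, 5)
Wbin <- matrix(0, 5, 5)
Wbin[abs(row(Wbin) - col(Wbin)) == 1] <- 1  # binary contiguity weights
Wrow <- Wbin / rowSums(Wbin)                # row-normalized weights

# Row-normalized: W = N, so the mean of the local values is the global I
I_i_row <- moran_local(x, Wrow)
all.equal(mean(I_i_row), moran_global(x, Wrow))

# Binary weights: W != N, so divide the sum by W rather than N
I_i_bin <- moran_local(x, Wbin)
all.equal(sum(I_i_bin) / sum(Wbin), moran_global(x, Wbin))
```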
Geary’s C
\[ C = \frac{(N-1) \sum_i \sum_j w_{ij}(x_i-x_j)^2}{2W \sum_i (x_i - \bar x)^2} \]
- A measure of global spatial autocorrelation or overall clustering of the data
- More sensitive to local spatial autocorrelation than Moran’s I so it can pick-up on spatial autocorrelation that Moran’s I might have missed.
- \(N\) is the number of analysis units on the map
- \(w_{ij}\) is an element of the spatial weights matrix
- \(W\) is the sum of all \(w_{ij}\)
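Geary's C can be computed directly from its formula. This is a base-R sketch, assuming a hypothetical five-unit line with row-normalized contiguity weights; `geary_global` is an illustrative helper, not a package function (`spdep::geary.test` would be the usual tool). Because C is built from squared differences between neighbors, values below 1 indicate positive spatial autocorrelation and values above 1 indicate negative spatial autocorrelation.

```r
# Illustrative helper implementing the Geary's C formula directly
geary_global <- function(x, Wmat) {
  N  <- length(x)
  z  <- x - mean(x)
  d2 <- outer(x, x, "-")^2  # squared differences (x_i - x_j)^2
  (N - 1) * sum(Wmat * d2) / (2 * sum(Wmat) * sum(z^2))
}

# Row-normalized contiguity weights for 5 units on a line (hypothetical)
B <- matrix(0, 5, 5)
B[abs(row(B) - col(B)) == 1] <- 1
Wmat <- B / rowSums(B)

# Smooth trend: neighbors are similar, so C falls below 1
C_trend <- geary_global(c(1, 2, 3, 4, 5), Wmat)

# Alternating pattern: neighbors are dissimilar, so C rises above 1
C_alt <- geary_global(c(1, 5, 1, 5, 1), Wmat)

c(trend = C_trend, alternating = C_alt)
```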
Local Geary’s C
\[ \begin{align} &C_i = \frac{1}{m_2} \sum_j w_{ij}(x_i - x_j)^2\\ &\text{where} \;\; m_2 = \frac{\sum_i (x_i - \bar x)^2}{N-1} \end{align} \]
- Geary’s C is \(C = \frac{1}{2W}\sum_i C_i\)
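The decomposition of Geary's C into local components can be checked numerically. This base-R sketch assumes the same kind of hypothetical five-unit line with row-normalized weights; all variable names are illustrative. It computes each \(C_i\) from the formula and confirms that their sum divided by \(2W\) reproduces the global statistic.

```r
# Toy data and row-normalized contiguity weights (hypothetical)
x <- c(1, 2, 3, 4, 5)
N <- length(x)
B <- matrix(0, N, N)
B[abs(row(B) - col(B)) == 1] <- 1
Wmat <- B / rowSums(B)
W <- sum(Wmat)

z  <- x - mean(x)
d2 <- outer(x, x, "-")^2      # squared differences (x_i - x_j)^2
m2 <- sum(z^2) / (N - 1)      # note the N - 1 denominator, unlike local Moran's I

# Local Geary's C: weighted squared differences to neighbors, scaled by m2
C_i <- rowSums(Wmat * d2) / m2

# Global Geary's C from its own formula
C_global <- (N - 1) * sum(Wmat * d2) / (2 * W * sum(z^2))

# The local values sum to 2W times the global statistic
all.equal(sum(C_i) / (2 * W), C_global)
```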