Spatial
Misc
- Also see
- Packages
- {CARBayesST} (Vignette) - Spatio-Temporal Generalized Linear Mixed Models for Areal Unit Data
- Binomial, Gaussian, or Poisson available
- {spatialreg} - OG; Various methods of spatial regression, Bivand’s package
- Spatial Autoregressive Combined (SAC) models combine both a Spatial Autoregression (SAR) model and a Spatial Error (SEM) model
- {spatemR} - Generalized Spatial Autoregressive Models for Mean and Variance
- Extends classical methods like logistic Spatial Autoregresive Models (SAR), probit Spatial Autoregresive Models (SAR), and Poisson Spatial Autoregresive Models (SAR),
- Built on top of {gamlss} so splines can be included.
- {sphet} - Estimation of Spatial Autoregressive Models with and without Heteroskedastic Innovations
- {spldv} - Estimates spatial autoregressive models for binary dependent variables using GMM estimators
- Computes average marginal effects for probit and logit models
- {CARBayesST} (Vignette) - Spatio-Temporal Generalized Linear Mixed Models for Areal Unit Data
- Papers
- Notes from
- Bivand’s slides and source code, Categorical independent variables and spatial regression: interpretation and reporting
- Spatial Data Science, Ch.16, Ch.17
- Geodata & Spatial Regression
Econometric
Misc
- Workflow (LeSage (2014), What Regional Scientists Need to Know about Spacial Econometrics)
- Does your DGP indicate a local or global model (i.e. SDM or SDEM)?
- See Models >> Nested Structure. SDM (middle column, global) and SDEM (right column, local)
- SAC is also a global model but LeSage doesn’t like the SAC model (left column).
- See Models >> SAC for details.
- I’d need to check the paper, but I think his concerns might be assuaged with the second spatial weights argument (listw2) in the model function
- See Impacts >> Types for definitions of Local and Global
- Use Likelihood Ratio Tests to select between models nested under whichever one you selected
- Does your DGP indicate a local or global model (i.e. SDM or SDEM)?
- The spatial lag is usually taken as the mean of the values of the variable at neighboring observations for row-standardised spatial weights, or the sum for binary spatial weights.
- Relatively dense spatial weights matrices may down-weight model estimates, suggesting that sparser weights are preferable
- The presence of residual spatial autocorrelation need not bias the estimates of variance of regression coefficients, provided that the covariates themselves do not exhibit spatial autocorrelation
- if covariates and the residual are autocorrelated, it is likely that the estimates of variance of regression coefficients will be biased downwards if attempts are not made to model the spatial processes.
- Common use of pre-test strategies for model selection probably ought to be replaced by the estimation of the most general model appropriate for the relationships being modelled.
- All covariates should NOT be added in spatially lagged form
- If the processes in the covariates and the response match, we should find little difference between the coefficients of a least squares and a SEM, but very often they diverge, suggesting that a Hausman test for this condition should be employed.
Models
-
- SDEM (right) nests: OLS, SLX and SEM, but SEM only nests OLS, not SLX.
{spatialreg} Model Functions
model model name maximum likelihood estimation function SEM spatial error errorsarlm(..., Durbin=FALSE)
SEM spatial error spautolm(..., family="SAR")
SDEM spatial Durbin error errorsarlm(..., Durbin=TRUE)
SLM spatial lag lagsarlm(..., Durbin=FALSE)
SDM spatial Durbin lagsarlm(..., Durbin=TRUE)
SAC spatial autoregressive combined sacsarlm(..., Durbin=FALSE)
GNM general nested sacsarlm(..., Durbin=TRUE)
errorsarlm
has a weights argument may be used to provide weights indicating the known degree of per-observation variability in the variance term (not spatial weights)- method gives the method for calculating the log determinant term in the log likelihood function, and defaults to
"eigen"
, suitable for moderately sized datasets summary
results for models with:- \(\rho_{\text{lag}} Wy\) (lagged y) will have a rho estimate
- \(\theta WX\) (lagged covariates) will have lag.<covariate> estimates
- \(\rho_{\text{err}}Wu\) (spatially modelled errors) will have a lambda estimate
- Says when location i’s neighbors’ residual increases by 1 unit on average, then location i’s residual changes by the lambda estimate on average.
- The significance of \(\rho_{\text{err}}\) indicates whether spatial correlation remains in the residuals
- The combined absolute values of \(\rho_{\text{lag}}\) and \(\rho_{\text{err}}\) and their statistical significance give you an idea of how much spatial autocorrelation is being modelled.
Spatial Lag Model (SLM) or Spatial Autoregressive (SAR)
\[ y = \rho_{\text{lag}}Wy + X\beta + \epsilon \]- Includes a spatial process for the response
- Lag of Y (endogenous interaction effects):
- Possible space-time dependence (e.g. drug use spreads through space over time)
- Values of Y are jointly determined with that of neighboring agents, or there are “spillover effects”
- Terms
- \(y\) : A \(n\times 1\) vector of observations on a response variable taken at each of \(n\) locations
- \(\rho_{\text{lag}}\) : Scalar spatial parameter
- \(W\) : A \(n\times n\) spatial weights matrix
- \(X\) : A \(n\times k\) matrix of covariates usually including the intercept
- \(\beta\) : A \(k\times 1\) vector of parameters
- \(\epsilon\) : A \(n\times 1\) vector of iid disturbances
Spatial Error Model (SEM)
\[ \begin{align} &y = X\beta + u \\ &\text{where} \;\; u = \rho_{\text{err}} Wu + \epsilon \end{align} \]- Includes a spatial process for the residuals
- \(u\) : A \(n\;\times\;1\) spatially autocorrelated disturbance vector
- \(\rho_{\text{err}}\) : A scalar spatial parameter.
- Can be motivated by spatial heterogeneity (e.g., a spatially correlated fixed effect as in panel data models), an omitted variable with spatially distinct effects, others.
Spatial Durbin Model (SDM)
\[ y = \rho_{\text{lag}}Wy + X\beta + WX\gamma + \epsilon \]- Includes spatial processes for the response and covariates
- \(WX\) : Is the spatially weighted, lagged covariates term
- \(\gamma\) : A \(k\;' \times 1\) vector of parameters where often \(k' = k - 1\) when using row-standardized spatial weights and ommitting the spatially lagged intercept.
- Can be motivated by assuming that there is a spatially correlated omitted variable (e.g. environmental amenities, highway accessibility, community social capital)
- The indirect spillover effect produced by the SDM model is both local and global with no prior restrictions on the magnitudes of these effects
- The exogenous regressor of one location casts an effect on the dependent variable of the neighboring locations through feedback loops.
- The feedback loops are a result of the Durbin term due to which a location, say \(i\) affects aneighboring location, say \(j\), which eventually has a consequence on \(i, j = 1,2, \ldots , n\)
- An advantage of the spatial Durbin model is that it produces correct standard errors or t−values of the coefficient estimates also if the true data generating process is a spatial error model
- The SDM model reduces to SEM under a non-linear common factor restriction. Thus, theoretically speaking, the choice between SDM and SEM can be settled by performing astatistical test for the common factor restriction.
- Rao’s score test principle which is asymptotically equivalent to the Likelihood Ratio Test (LRT) but computationally much simpler
- Use of the likelihood ratio test (LRT) is not attractive for testing a non-linear hypothesis, since LRT requires maximum likelihood estimation both under the null and the alternative hypotheses.
General Nested Model (GNM) or General Nested Spatial Model (GNS)
\[ \begin{align} &y = \rho_{\text{lag}}Wy + X\beta + WX\gamma + u \\ &\text{where} \;\; u = \rho_{\text{err}} Wu + \epsilon \end{align} \]
- Includes a spatial process to the response, covariates, and residuals (kitchen sink model)
- AKA Manski Model
- Issue: The GNS is only weakly (or not?) identifiable
- Not recommended
- Analogous to Manski’s reflection problem on neighborhood effects: If people in the same group behave similar, this can be because
- Imitating behavior of the group
- Exogenous characteristics of the group influence the behaviour
- Members of the same group are exposed to the same external circumstances.
- We just cannot separate those in observational data.
Combined Spatial Autocorrelation (SAC)
\[ \begin{align} &y = \rho_{\text{lag}}Wy + X\beta + u \\ &\text{where} \;\; u = \rho_{\text{err}} Wu + \epsilon \end{align} \]
- No spatial process for covariates
- AKA Spatial AutoRegressive-AutoRegressive (SARAR), Kelejian-Prucha, Cliff-Ord
- Issue: Using the same W matrix for lag and error terms can lead to identification problems.
sacsarlm
has a listw2 argument to pass a second spatial weights matrix- Bivand doesn’t use it in his example though
Spatially Lagged X Model (SLX)
\[ y = X\beta + WX\gamma + \epsilon \]
- Only has a spatial process for the covariates
- If explanatory variables of area i might affect its neighbor j
Error Durbin model (SDEM)
\[ \begin{align} &y = X\beta + WX\gamma + u \\ &\text{where} \;\; u = \rho_{\text{err}} Wu + \epsilon \end{align} \]
- No spatial process for the response
Impacts
- Misc
- Notes from
- AKA Marginal Effects
- A model will have either local direct, indirect, and total impacts or global direct, indirect, and total impacts.
- Standard error estimates from SEM and SDEM models (fit by maximum likelihood) and have been corrected by multiplying them by \(N∕(N - k)\) can be compared to OLS and SLX standard error estimates. (section 3.3)
- Code
Just the impact point estimates:
impacts(mod, listw = listw_obj)
W/p-values:
summary(impacts( mod,listw = listw_obj, R = 500, zstats = TRUE ))
- Types
- Direct impacts measure the effect of changing an explanatory variable in a specific location on the dependent variable in that same location. This is analogous to traditional regression coefficients but accounts for spatial feedback effects when lagged endogenous variables are present (i.e. lagged response).
- Note that all variation in the impacts matrix comes from the variation in the spatial weights matrix which we’ve specified a priori.
- i.e. it’s sort of subjective since we choose the neighboring scheme, weighting method, and normalization method. \(W\) is not directly estimated from the data.
- I guess we could treat those choices as hyperparameters though.
- For a GNM,
\[ S_r(W) = ((I - \rho_{\text{lag}}W)^{-1}(I\beta_r + W\gamma_r) \]- \(S_r(W)\) (the partial derivative) expresses the direct impacts (effects) on its principal diagonal, and indirect impacts in off-diagonal elements.
- For SAR,
\[ S_r(W) = (I - \rho_{\text{lag}}W)^{-1} \] - For SLX, SDEM,
- \(\beta_k\) is the average (local) direct impact (covariate coefficient) and \(\gamma_k\) is the average (local) indirect impact (lagged covariate parameter)
- Note that all variation in the impacts matrix comes from the variation in the spatial weights matrix which we’ve specified a priori.
- Indirect impacts (aka spillover effects) capture how changing an explanatory variable in one location affects the dependent variable in neighboring locations through spatial interdependencies. These effects propagate through the spatial weights matrix.
- Total impacts are simply the sum of direct and indirect impacts, representing the overall effect of changing a variable across the entire spatial system.
- Global impacts provide system-wide average measures of direct, indirect, and total effects. Effects spread to all neighbors (i.e. includes higher order neighbors) but decrease with distance. Global spillover effects can be seen as a kind of diffusion process.
- Seen as crucial for reporting results from fitting models including the spatially lagged response (SLM, SDM, SAC, GNM) which assume global dependence, and has been extended to models with spatially lagged covariates (SLX, SDEM).
- Issue: These processes happen over time. In a cross-sectional framework, the global spillover effects are hard to interpret. Anselin (2003) proposes an interpretation as an equilibrium outcome, where the partial impact represents an estimate of how this long-run equilibrium would change due to a change in \(x_k\).
- Example: A public railway is built (exogenous effect) and connects the business district to other key areas in the city
- The new transit station drastically increases the desirability of District A. This surge in desirability leads to a sharp increase in demand for housing in District A.
- Because housing supply is fixed in the short term (you can’t instantly build new houses), this increased demand results in a significant increase in house prices in District A.
- Potential homebuyers who were priced out of District A will now look for the next best alternative. District B, which is adjacent to District A and might have similar housing stock but without the direct train access, suddenly becomes a more attractive option by comparison. These buyers bring their increased willingness-to-pay with them.
- This effect continues to ALL higher order neighbors, but weakens the further the district is from a railway station.
- The new transit station drastically increases the desirability of District A. This surge in desirability leads to a sharp increase in demand for housing in District A.
- Local impacts allow these effects to vary across different locations. This is particularly relevant in weighted spatial regression and other spatially varying coefficient models, where the relationship between variables can differ across space due to local conditions, policies, or characteristics.
- In Spatial Lag of X (SLX) model or the Spatial Durbin Error Model (SDEM), the influence of a change in an explanatory variable is restricted to a defined neighborhood of the initial location.
- These models assume local dependence. The impact does not propagate endlessly throughout the system. This more contained spillover is referred to as a local impact.
- Only direct neighbors – as defined in \(W\) – contribute to local spillover effects. The coefficients only estimate how my direct neighbour’s \(X\) values influence my own outcome \(Y\).
- There are no higher order neighbors involved (as long as we do not model them), nor are there any feedback loops due to interdependence.
- Local impacts may be reported for SDEM and SLX models, using a linear combination to calculate standard errors for the total impacts of each covariate (sums of coefficients on the covariates and their spatial lags)
- Example: The effect of environmental quality on house prices
- The environmental quality in the focal unit itself but also in neighbouring units could influence the attractiveness of a district and its house prices.
- Therefore, it seems reasonable to assume that we have local spillover effects: only the environmental quality in directly contiguous units (e.g. in walking distance) is relevant for estimating the house prices.
- Example: DUIs (source)
- Data on DUI counts (response) and covariates are at the county level
- Covariates are % population are baptists (dry), liquor sold per capita, % population is college students (drunk kids), avg distance between liquor stores and homes, avg distance between restaurants w/liquor and homes, % recreation/entertainment workers (tourism)
- Does a change in any of these covariates in one county affect the count of DUIs in ALL other counties in the state (global) or just the counties neighboring a particular county (local)?
- Likely just neighboring counties, so we should use a local model.
- In Spatial Lag of X (SLX) model or the Spatial Durbin Error Model (SDEM), the influence of a change in an explanatory variable is restricted to a defined neighborhood of the initial location.
- Feedback Effects are where the initial change in a location influences neighboring locations, which in turn influence the original location back.
- My \(X\) influences my \(Y\) directly, but my \(Y\) then influences my neigbour’s \(Y\), which then influences my \(Y\) again (also also other neighbour’s \(Y\)s). Thus the influence of my \(X\) on my \(Y\) includes a spatial multiplier
- Direct impacts measure the effect of changing an explanatory variable in a specific location on the dependent variable in that same location. This is analogous to traditional regression coefficients but accounts for spatial feedback effects when lagged endogenous variables are present (i.e. lagged response).
- Interpretation
- Direct Impacts represent the average change in the dependent variable at location \(i\) from a one-unit change in the explanatory variable at location \(i\).
- An average effect of a unit change in \(x_i\) on \(y_i\)
- Example: If the direct impact of police spending (in $1000s) is 0.15 for crime reduction, then a $1,000 increase in police spending in a city reduces crime in that same city by 0.15 units on average.
- Indirect Impacts represent the average change in the dependent variable at location \(i\) from a one-unit change in the explanatory variable at all other locations \(j \ne i\).
- How a change in \(x_i\), on average, influences all neighboring units \(y_j\) .
- For global models: A change of \(x_i\) in the focal units flows through the entire system of neighbors (direct neightbors, neighbors of neighbours, …) influencing ‘their’ \(y_j\). One can think of this as diffusion or a change in a long-term equilibrium.
- Example: If logNO2 increases by one unit, this increases the house price in the focal unit by 0.448 units. Overall, a one unit change in logNO2 increases the house prices in the entire neighborhood system (direct and higher order neighbours) by 0.754.
- Example: If the indirect impact is 0.08, then a \(\$1,000\) increase in police spending in city \(i\) reduces crime in all other cities by 0.08 units on average through spillover effects.
- Local
- Example: % of county population employed by the entertainment industry (i.e. tourism proxy) is shown to have a significant indirect impact on the count of DUIs in a county. Although, the total impact for this variable is not significant. (source)
- Indirect: Says counties with any neighbors building a casino/hotel/etc which would increase their population that’s employed by the entertainment/hotel/etc. industry should be prepared for an increase in DUIs
- Total: Says that the state shouldn’t expect DUIs to significantly increase statewide and there’s no evidence to indicate changes in policy are necessary.
- I don’t agree with this interpretation. Sounds like the interpretation of a global model’s total impact.
- I think maybe this should be interpreted for group of counties (i.e. a county and it’s neighbors) and not statewide. So the total DUIs in a neighborhood which includes a county with a substatial proportion of entertainment workers should expect that total to be significantly affected.
- Example: % of county population employed by the entertainment industry (i.e. tourism proxy) is shown to have a significant indirect impact on the count of DUIs in a county. Although, the total impact for this variable is not significant. (source)
- Remember it’s the average. The effect on location \(i\) from each other location will vary according to spatial relationships encoded in the spatial weights matrix.
- A change in city \(j\) can affect city \(i\) both directly (if they’re neighbors) and indirectly through intermediate cities in the spatial network (i.e. the effect generated in other cities spilling over into a city).
- For SLX models, nothing is gained from computing the “impacts,” as they equal the coefficients. Again, it’s the effects of direct neighbours only (i.e. a local indirect impact).
- Total Impacts represent the overall effect of a change in an explanatory variable across all locations in the study area. In essence, it captures the entire spatial multiplier effect
- Example: Human Capital’s impact on regional labor productivity
- “A ceteris paribus increase in the level of human capital is found to have a significant and positive direct impact. But this positive direct impact is offset by a significant and negative indirect (spillover) impact leading to a total impact that is not significantly different from zero.”
- Example: Human Capital’s impact on regional labor productivity
- Direct Impacts represent the average change in the dependent variable at location \(i\) from a one-unit change in the explanatory variable at location \(i\).
Diagnostics
- Misc
- Also see Diagnostics, Geospatial >> Spatial Autocorrelation (Moran’s I)
- AIC and the LR test are about model selection while the rest (Hausman, B-P, Rao) seem to more about determining whether a model fit is bad due to misspecification (residual autocorrelation, residual heteroskedacity).
- In Bivand’s Boston house sales/NOX pollution paper (see Example 1), he only reported Breusch-Pagan, Rao’s Score, and Moran’s I for models that don’t model spatial errors (i.e. OLS, SLX)
- I don’t think there’s was a spatial Durbin Rao’s score at the time though.
- AIC can be used to compare models using data with the same level of geometries
- i.e. Models using data at the census tract level can’t be compared with models using data at the county level
- See Diagnostics, Regression >> GOF >> Information Criteria
- Spatial Hausman Test
- Notes from Video: Spatial Regression in R 1: The four simplest models
Hausman.test(sem_or_sdem_mod)
- “For a given set of variables, a divergence between the coefficient estimates from SEM and OLS suggests that neither is yielding regression parameter estimates matching the underlying parameters of the DGP. This calls into question use of either OLS or SEM for that set of variables” Orignial Paper
- Under the SEM model assumptions, OLS and SEM regression parameter estimates should be unbiased. This suggests that significant differences in regression parameter estimates will arise only from misspecification
- H0: OLS and SEM parameters are not significantly different
- Not rejecting the null says that you could fit the OLS and get the same coefficient estimates as the SEM, but the standard errors would be off since the spatial process isn’t being captured. So fitting SEM is preferrable over OLS, and there is no misspecification in fitting the SEM.
- Rejecting the null (p-value < 0.05) indicates neither OLS nor SEM is appropriate and there is a spatial process not sufficiently captured by the SEM model which only spatially models the residuals. In other words, try some other spatial regression models.
- I’ve only seen this test mention SEM models. But Bivand used this test with a SDEM model, so, I assume it’s okay.
- Likelihood Ratio (LR) Test
- Test Model Restrictions:
LR.sarlm(mod1, mod2)
- Technically “restricting” is talking about restricting one of the multipliers (rhos, theta, or lambda) to 0 which would give you a simpler model.
- dof is how many multipliers are you restricting to 0.
- e.g. If you’re using lagged covariates, then there’s a dof for each one.
- The order of the models in function arguments doesn’t matter.
- Tests to see, within a nested family of models, if a more complex model should be a simpler model
- e.g. SDEM should be “restricted” to the SEM, i.e. the SEM is a better fit than the SDEM.
- Can only test restriction to a simpler, nested model.
- e.g. SDM and SDEM are not nested, as one cannot be simplified into the other.
- If two models are nested according to type (e.g. SDEM and SEM), but each have different spatial weights matrices, then they are NOT nested.
- Null Hypothesis (H0): Restrictions are true. Use a simpler nested model.
- See Models >> Nested Structure
- And vice versa, failing to reject the Null means you should keep the more complex model.
- If the test fails to restrict to lower level model, then you don’t need to keep testing against lower and lower level models,
- e.g. If the test fails to restrict a SDEM to a SER, then you don’t need to test the SDEM model against a SLX model or OLS.
- Test Model Restrictions:
- Spatial Breusch-Pagan Test
- Tests residuals for heteroskedasticity.
bptest.Sarlm(mod, studentize = TRUE)
- H0: Homoskedastic residuals
- Coefficients won’t be biased but your standard errors (and therefore p-values) won’t be right.
- Solutions:
- In Example 1, the hope was that applying case weights to the models would remedy heteroskedacity but judging by the p-values for this test, I think it exacerbated it in this instance.
- Same trend in the Moran’s I results
- Aggregating to a higher (more appropriate) level geometry helped the most according the p-values. Certain causal effects may be present only at particular scales and missing this scale can lead to misspecification.
- See {sphet}
- I wonder if HC3 SEs or Conley SEs would work here
- Also supposedly the effect on SEs isn’t overly strong. So, if you’re p-value is really small, then you’re probably okay.
- In Example 1, the hope was that applying case weights to the models would remedy heteroskedacity but judging by the p-values for this test, I think it exacerbated it in this instance.
- Rao’s Score (AKA Robust Lagrange Multiplier Test)
- The Rao’s Score for linear regression offers a good performance in distinguishing between SAR, SEM, and non-spatial OLS (
lm.RStests
)- There’s also one for Durbin models (
SD.RStests
)
- There’s also one for Durbin models (
- Potential Causes for Spatial Autocorrelation
- Interdependence, \(\rho\) or \(\rho_{\text{lag}}\) (aka lagged dependent variable)
- Clustering on unobservables, \(\lambda\) or \(\rho_{\text{err}}\) (aka error dependence)
- Spillovers in covariates, \(\theta\) (aka lagged covariates)
- Null Hypotheses
- H0: \(\rho_{\text{lag}} = 0\)
or
H0: \(\rho_{\text{err}} = 0\) - If p-value < 0.5, then lags of the dependent variable should be included or errors should be spatially modelled (depending on the test’s H0).
- Should be similar for the Durbin tests — p-value < 0.05 means include the component that it’s testing for, e.g. if SDM_adjRSWX has a p-value < 0.05, then include lagged covariates.
- H0: \(\rho_{\text{lag}} = 0\)
- For OLS test formulas see GSR 12.1.1
- Available Tests for Linear Regression Rao Score:
lm.RStests
- “all” - Compute all tests
- “RSerr” - Error dependence
- “adjRSerr” - Tests for error dependence in the possible presence of a missing lagged dependent variable
- “RSlag” - Missing spatially lagged dependent variable
- “adjRSlag” - Tests for missing spatially lagged dependent variable in the possible presence of error dependence
- “SARMA” - Combination test, “RSerr” + “adjRSlag”
- Available Tests for Spatial Durbin Rao Score:
SD.RStests
- “all” - Compute all tests
- “SDM” - Computes all SDM tests which include:
- “SDM_RSlag” - Same as “RSlag”
- “SDM_adjRSlag” - Same as “adjRSlag”
- “SDM_RSWX” - Missing spatially lagged independent variable(s)
- “SDM_adjRSWX” - Tests for missing spatially lagged dependent variable when there’s possibly a missing lagged dependent variable
- “SDM_Joint” - Combination test (SDM_RSlag + SDM_adjRSWX or SDM_adjRSlag + SDM_RSWX)
- “SDEM” - Computes all SDEM tests which include:
- “SDEM_RSerr” - Same as “RSerr”
- “SDEM_RSWX” - Same as “SDM_RSlag”
- “SDEM_Joint” - Same as “SDM_Joint”
- No adjusted tests required because of orthogonality
- Functions
A
lm
model and a spatial weights object are the inputsUtilizing different options can have substantial effects on the test, such as:
- Using a different weighting scheme, e.g. switching for row-standardized (“W”) to global standardized (“C”)
- Only lagging a subset of the covariates, e.g. Durbin = ~ INC
- Variable transformations, e.g. scaling
There will be warnings about using categorical variables in the Durbin tests, because the effects are not well-understood.
Example:
library(spatialreg); library(spdep) # data, nb list, spatial weights <- columbus ::st_read(system.file("shapes/columbus.gpkg", sfpackage = "spData")[1]) <- col.gal.nb ::read.gal(system.file("weights/columbus.gal", spdeppackage = "spData")[1]) <- nb2listw(col.gal.nb, style = "W") col.listw # cleaning $fEW <- factor(columbus$EW) columbus$fDISCBD <- ordered(cut(columbus$DISCBD, columbusc(0, 1.5, 3, 4.5, 6))) <- formula(log(CRIME) ~ INC + HOVAL + fDISCBD + fEW) f <- lm(f, data=columbus) lm_obj summary(lm.RStests(lm_obj, col.listw,test = "all")) #> statistic parameter p.value #> RSerr 1.915043 1 0.1664 #> RSlag 1.827463 1 0.1764 #> adjRSerr 0.143411 1 0.7049 #> adjRSlag 0.055831 1 0.8132 #> SARMA 1.970874 2 0.3733 summary(SD.RStests(lm_obj, col.listw, test = "SDM")) #> statistic parameter p.value #> SDM_RSlag 1.8275 1 0.1764 #> SDM_adjRSlag 1.9150 1 0.1664 #> SDM_RSWX 3.6798 6 0.7199 #> SDM_adjRSWX 3.7673 6 0.7081 #> SDM_Joint 5.5948 7 0.5878 #> Warning: use of spatially lagged factors (categorical variables) fDISCBD, fEW is not well-understood <- lm_obj1 lm(I(scale(CRIME)) ~ 0 + I(scale(INC)) + I(scale(HOVAL)), data = columbus) summary(SD.RStests(lm_obj1, col.listw, test = "all")) #> statistic parameter p.value #> SDM_RSlag 7.8250 1 0.005153 ** #> SDM_adjRSlag 4.6111 1 0.031765 * #> SDM_RSWX 6.0609 2 0.048295 * #> SDM_adjRSWX 2.8470 2 0.240873 #> SDM_Joint 10.6720 3 0.013638 * #> SDEM_RSerr 4.6111 1 0.031765 * #> SDEM_RSWX 6.0609 2 0.048295 * #> SDEM_Joint 10.6720 3 0.013638 *
- No rejections when using sort of basic settings, but dropping the categoricals, scaling the numerics in the regression model does result in significant p-values.
- Although the intercept is dropped here, leaving the intercept produces nearly the same results.
- The Rao’s Score for linear regression offers a good performance in distinguishing between SAR, SEM, and non-spatial OLS (
Examples
Example 1: Boston Housing Values (SDS Ch.17.2, Paper)
Census Tract
library(spatialreg) <- form formula(log(median) ~ CRIM + ZN + INDUS + CHAS + I((NOX*10)^2) + I(RM^2) + AGE + log(DIS) + log(RAD) + TAX + PTRATIO + I(BB/100) + log(I(LSTAT/100))) <- eigenw(lw_q_489) eigs_489 <- sdem_489 errorsarlm(form, data = boston_489, listw = lw_q_489, Durbin = TRUE, zero.policy = TRUE, control = list(pre_eig = eigs_489)) <- sem_489 errorsarlm(form, data = boston_489, listw = lw_q_489, zero.policy = TRUE, control = list(pre_eig = eigs_489)) cbind(data.frame(model=c("SEM", "SDEM")), rbind(broom::tidy(Hausman.test(sem_489)), ::tidy(Hausman.test(sdem_489))))[,1:4] broom#> model statistic p.value parameter #> 1 SEM 52.0 2.83e-06 14 #> 2 SDEM 48.7 6.48e-03 27 LR.Sarlm(sdem_489, sem_489) #> Likelihood ratio for spatial linear models #> #> data: #> Likelihood ratio = 74.408, df = 13, p-value = 1.227e-10 #> sample estimates: #> Log likelihood of sdem_489 Log likelihood of sem_489 #> 310.6741 273.4702
- Uses the boston_489 dataset which has no aggregation or unioned geometries — only has the censored values (coded as NA) removed from median (response) (and CHAS coerced to a factor)
- Therefore, this model is almost completely at the census tract level (except for the NOX variable) and not the higher (NOX_ID) level.
- The spatial weights list, lw_q_489, is calculated directly from boston_489
- The weights were row-standardized
- The neighbor list is contiguity-based and includes 3 subgraphs and an observation with no neighbors.
- The NOX variable was multiplied by 10 just for units purposes to get parts per 100 million
- Both Hausman test results are significant which suggests neither is appropriate, and that the spatial process not significantly being captured by only spatially modeling the residuals.
- If we had to choose one of these models (certain it’s a local dgp, don’t buy Hausman, etc.), then the SDEM has a significantly larger log likelihood and would be the choice.
Aggregated
<- eigenw(lw_q_94) eigs_94 <- sdem_94 errorsarlm(form, data = boston_94, listw = lw_q_94, Durbin = TRUE, control = list(pre_eig=eigs_94)) <- sem_94 errorsarlm(form, data = boston_94, listw = lw_q_94, control = list(pre_eig = eigs_94)) cbind(data.frame(model=c("SEM", "SDEM")), rbind(broom::tidy(Hausman.test(sem_94)), ::tidy(Hausman.test(sdem_94))))[, 1:4] broom# model statistic p.value parameter # 1 SEM 15.66 0.335 14 # 2 SDEM 9.21 0.999 27 LR.Sarlm(sdem_94, sem_94) #> Likelihood ratio for spatial linear models #> #> data: #> Likelihood ratio = 43.17, df = 13, p-value = 4.209e-05 #> sample estimates: #> Log likelihood of SDEM_94 Log likelihood of SEM_94 #> 81.33336 59.74851
- The boston_94 dataset has all its explanatory variables and its response variable aggregated to the air pollution model output zone level (NOX_ID)
- The spatial weights, lw_q_94, are calculated from boston_94 with the same row-standardized, contiguity-based scheme.
- The null is no longer rejected for both models which says the SEM and SDEM models aren’t misspecified at the air pollution model output zone level (NOX_ID)
- We could fit an OLS model at this geometry level and get unbiased coefficients, but the model wouldn’t capture the spatial process which would bias the standard errors/p-values.
- The SDEM has the significantly larger log likelihood than the SEM model and is the choice.
Census Tract
<- slx_489 lmSLX(form, data = boston_489, listw = lw_q_489, zero.policy = TRUE) LR.Sarlm(sdem_489, slx_489) #> Likelihood ratio for spatial linear models #> #> data: #> Likelihood ratio = -159.38, df = 1, p-value < 2.2e-16 #> sample estimates: #> Log likelihood of slx_489 Log likelihood of sdem_489 #> 230.9842 310.6741
- The SDEM has the significantly larger log likelihood than the SLX model and is the choice.
- “Translating zonal data into census tracts tends to overstate the correlation because relatively more census tracts are located in center city zones in which PART and NOX levels tend to be most highly correlated.”
- Hence, why the SDEM model probably performs better on this geometry level for these data.
Aggregated
<- slx_94 lmSLX(form, data = boston_94, listw = lw_q_94, zero.policy = TRUE) LR.Sarlm(sdem_94, slx_94) #> Likelihood ratio for spatial linear models #> #> data: #> Likelihood ratio = 0.21585, df = 1, p-value = 0.6422 #> sample estimates: #> Log likelihood of sdem_94 Log likelihood of slx_94 #> 81.33336 81.22543
- There is no difference between the SDEM and SLX log likelihoods, so choosing the less complex model, SLX, model is the choice.
Census Tract
<- slx_489w lmSLX(form, data = boston_489, listw = lw_q_489, weights = units, zero.policy = TRUE) <- sdem_489w errorsarlm(form, data = boston_489, listw = lw_q_489, Durbin = TRUE, weights = units, zero.policy = TRUE, control = list(pre_eig = eigs_489)) LR.Sarlm(sdem_489w, slx_489w) #> Likelihood ratio for spatial linear models #> #> data: #> Likelihood ratio = 136.31, df = 1, p-value < 2.2e-16 #> sample estimates: #> Log likelihood of sdem_489w Log likelihood of slx_489w #> 379.0057 310.8491
- units is the counts of house units which is a variable in the dataset
- The use of weights is justified, because tract-level counts of reported housing units underlying the weighted median values vary from 5 to 3,031, and the air pollution model output zone (aggregated geometries using NOX_ID) counts vary from 25 to 12,411. (See histograms)
- These counts are part of the dataset and most have name pattern of C*_* where the asterisk values depend on the particular home value interval
- SDEM still the model choice
Aggregated
<- slx_94w lmSLX(form, data = boston_94, listw = lw_q_94, weights = units) <- sdem_94w errorsarlm(form, data = boston_94, listw = lw_q_94, Durbin = TRUE, weights = units, control = list(pre_eig = eigs_94)) LR.Sarlm(sdem_94, slx_94) #> Likelihood ratio for spatial linear models #> #> data: #> Likelihood ratio = 0.21585, df = 1, p-value = 0.6422 #> sample estimates: #> Log likelihood of sdem_94 Log likelihood of slx_94 #> 81.33336 81.22543
- Again, no significant difference between model likelihoods, so the less complex model, SLX, would be the choice.
- Uses the boston_489 dataset which has no aggregation or unioned geometries — only has the censored values (coded as NA) removed from median (response) (and CHAS coerced to a factor)
Mixed Effects
- Packages
- {hglm} - Procedures for fitting hierarchical generalized linear models (HGLM).
- Linear mixed models and generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects.
- {R2BayesX} - Estimate structured additive regression (STAR) models with ‘BayesX’.
- {INLA} - Integrated nested Laplace approximation (INLA) is a method for approximate Bayesian inference.
- Alternative to other methods such as Markov chain Monte Carlo because of its speed and ease of use via the R-INLA package.
- Although the INLA methodology focuses on models that can be expressed as latent Gaussian Markov random fields (GMRF), this encompasses a large family of models that are used in practice.
- glmmTMB uses functions of the distance between observations, and fitted variograms to model the spatial autocorrelation present (haven’t found an example using this package)
- {hglm} - Procedures for fitting hierarchical generalized linear models (HGLM).
Varying Intercepts
- Uses a discrete location/id variable as the group variable (i.e. varying intercepts by location).
- Examples
Example: Boston Housing Values (SDS Ch. 16.2)
- The lower level (census tract) dataset (no aggregation of variables) is used in these models
- Since the NOX (air pollution) variable is measured at the higher geometry level (NOX_ID) and not the census tract level like the other covariates, the coefficient estimates for that variable are not useful, but looking at the random effects does provide some insight (See Visualization tab).
This example shows how to fit the same model with four different packages
See Geospatial, Preprocessing >> Aggregations >> Examples >> Example 3 for notes on the processing of this dataset.
NOX_ID is the location variable (and grouping variable) for the measured NOX values.
library(Matrix) library(lme4) <- form formula(log(median) ~ CRIM + ZN + INDUS + CHAS + I((NOX*10)^2) + I(RM^2) + AGE + log(DIS) + log(RAD) + TAX + PTRATIO + I(BB/100) + log(I(LSTAT/100))) <- MLM lmer(update(form, . ~ . + (1 | NOX_ID)), data = boston_487, REML = FALSE) # random effects $MLM_re <- ranef(MLM)[[1]][,1] boston_93
- Not sure why {Matrix} needs to be loaded
library(hglm) |> suppressPackageStartupMessages() suppressWarnings(HGLM_iid <- hglm(fixed = form, random = ~1 | NOX_ID, data = boston_487, family = gaussian())) # random effects $HGLM_re <- unname(HGLM_iid$ranef) boston_93
library(R2BayesX) |> suppressPackageStartupMessages() <- BX_iid bayesx(update(form, . ~ . + sx(NOX_ID, bs = "re")), family = "gaussian", data = boston_487, method = "MCMC", iterations = 12000, burnin = 2000, step = 2, seed = 123)
library(INLA) |> suppressPackageStartupMessages() <- INLA_iid inla(update(form, . ~ . + f(NOX_ID, model = "iid")), family = "gaussian", data = boston_487) # random effects $INLA_re <- INLA_iid$summary.random$NOX_ID$mean boston_93
library(tmap) tm_shape(boston_93) + tm_polygons(fill = c("MLM_re", "HGLM_re", "INLA_re", "BX_re"), fill.legend = tm_legend("IID", frame=FALSE, item.r = 0), fill.free = FALSE, lwd = 0.01, fill.scale = tm_scale(midpoint = 0, values = "brewer.rd_yl_gn")) + tm_facets_wrap(columns = 2, rows = 2) + tm_layout(panel.labels = c("lmer", "hglm", "inla", "bayesx"))
- The air pollution model output zone level IID random effects are very similar across the four model fitting functions reported.
- The central downtown zones have stronger negative random effect values, but strong positive values are also found in close proximity; suburban areas take values closer to zero.
Spatially Structured Random Effects
- Simultaneous Autoregressive (SAR), Conditional Autoregressive (CAR), and Intrinsic CAR (ICAR) models use spatially structured random effects
- SAR, CAR, ICAR, and Leroux models have the same overall model formula but different random effects formulas
- Model Formula
\[ \begin{align} &Y = X \beta + Zu \\ &\text{where} \;\; u \sim\mathcal{N}(0, \sigma^2_u \; \Sigma) \end{align} \]- \(\sigma^2_u \; \Sigma\) is the variance/covariance matrix of the random effects
- Random Effects (\(u\)) Formulas
- SAR: \(\Sigma^{-1} = (I-\rho W)'\;(I-\rho W)\)
- \(W\) is a non-singular spatial weights matrix
- \(\rho\) is a spatial correlation parameter
- CAR: \(\Sigma^{-1} = (I-\rho W)\)
- \(W\) is a symmetric and strictly positive definite spatial weights matrix
- ICAR: \(\Sigma^-1 = M = \text{diag}(n_i) - W\)
- \(W\) is a symmetric and strictly positive definite spatial weights matrix
- Leroux: \(\Sigma^{-1} = (I-\rho)I_n + \rho M)\)
- SAR: \(\Sigma^{-1} = (I-\rho W)'\;(I-\rho W)\)
- Examples
Example: Boston Housing Values (SDS Ch. 16.2)
- The lower level (census tract) dataset (no aggregation of variables) is used in these models while the neighbor’s list/spatial weights object used is created from a higher level geometry (NOX_ID) (i.e. the spatial relationship is modeled at the higher level)
- Since the NOX (air pollution) variable is measured at the higher geometry level (NOX_ID) and not the census tract level like the other covariates, the coefficient estimates for that variable are not useful, but looking at the random effects does provide some insight (See Visualization tab).
- Care is needed to match the indexing variable with the spatial weights. The neighbor’s list object (nb_q_93) needs to get manipulated in some way in all these models in order for the spatial weights object to be in a suitable format for the particular model to accept.
See Geospatial, Preprocessing >> Aggregations >> Examples >> Example 3 for notes on the processing of this dataset.
NOX_ID is the location variable (and grouping variable) for the measured NOX values.
library(spatialreg) <- as(spdep::nb2listw(nb_q_93, style = "B"), "CsparseMatrix") W <- form formula(log(median) ~ CRIM + ZN + INDUS + CHAS + I((NOX*10)^2) + I(RM^2) + AGE + log(DIS) + log(RAD) + TAX + PTRATIO + I(BB/100) + log(I(LSTAT/100))) suppressWarnings( <- HGLM_car hglm(fixed = form, random = ~ 1 | NOX_ID, data = boston_487, family = gaussian(), rand.family = CAR(D = W)) ) # random effects $HGLM_ss <- HGLM_car$ranef[,1] boston_93
library(R2BayesX) <- R2BayesX::nb2gra(nb_q_93) RBX_gra all.equal(row.names(RBX_gra), attr(nb_q_93, "region.id")) # [1] TRUE all.equal(unname(diag(RBX_gra)), spdep::card(nb_q_93)) # [1] TRUE <- BX_mrf bayesx(update(form, . ~ . + sx(NOX_ID, bs = "mrf", map = RBX_gra)), family = "gaussian", data = boston_487, method = "MCMC", iterations = 12000, burnin = 2000, step = 2, seed = 123) # random effects $BX_ss <- BX_mrf$effects["sx(NOX_ID):mrf"][[1]]$Mean boston_93
library(INLA) |> suppressPackageStartupMessages() <- as.integer(as.factor(boston_487$NOX_ID)) ID2 <- INLA_ss inla(update(form, . ~ . + f(ID2, model = "besag", graph = W)), family = "gaussian", data = boston_487) # random effects $INLA_ss <- INLA_ss$summary.random$ID2$mean boston_93
<- Diagonal(nrow(W), rowSums(W)) - W M <- Diagonal(nrow(M), 1) - M Cmatrix <- INLA_lr inla(update(form, . ~ . + f(ID2, model = "generic1", Cmatrix = Cmatrix)), family = "gaussian", data = boston_487) # random effects $INLA_lr <- INLA_lr$summary.random$ID2$mean boston_93
library(mgcv) names(nb_q_93) <- attr(nb_q_93, "region.id") $NOX_ID <- as.factor(boston_487$NOX_ID) boston_487 <- GAM_MRF gam(update(form, . ~ . + s(NOX_ID, bs = "mrf", xt = list(nb = nb_q_93))), data = boston_487, method = "REML") # random effects <- ssre predict(GAM_MRF, type = "terms", se = FALSE)[, "s(NOX_ID)"] all(sapply(tapply(ssre, list(boston_487$NOX_ID), c), function(x) length(unique(round(x, 8))) == 1)) # [1] TRUE # return the first value for each upper-level unit $GAM_ss <- aggregate(ssre, list(boston_487$NOX_ID), boston_93n=1)$x head,
- The upper-level random effects may be extracted by predicting terms
- The values in all lower-level tracts belonging to the same upper-level air pollution model output zones are identical
library(tmap) tm_shape(boston_93) + tm_polygons(fill = c("HGLM_ss", "INLA_lr", "INLA_ss", "BX_ss", "GAM_ss"), fill.legend = tm_legend("SSRE", frame=FALSE, item.r = 0), fill.free = FALSE, lwd = 0.1, fill.scale = tm_scale(midpoint = 0, values = "brewer.rd_yl_gn")) + tm_facets_wrap(columns = 3, rows = 2) + tm_layout(panel.labels = c("hglm CAR", "inla Leroux", "inla ICAR", "bayesx ICAR", "gam ICAR"))
- The spatially structured random effects are also very similar to each other
- The SAR spatial smooth is perhaps a little smoother than the CAR smooths when considering the range of values taken by the random effect term.
- The central downtown zones have stronger negative random effect values, but strong positive values are also found in close proximity; suburban areas take values closer to zero.