Spatial

Misc

  • Packages
    • {CARBayesST} (Vignette) - Spatio-Temporal Generalized Linear Mixed Models for Areal Unit Data
    • {hglm} - Procedures for fitting hierarchical generalized linear models (HGLM).
      • Linear mixed models and generalized linear mixed models with random effects for a variety of links and a variety of distributions for both the outcomes and the random effects.
    • {INLA} - Integrated nested Laplace approximation (INLA) is a method for approximate Bayesian inference.
      • Alternative to other methods such as Markov chain Monte Carlo because of its speed and ease of use via the R-INLA package.
      • Although the INLA methodology focuses on models that can be expressed as latent Gaussian Markov random fields (GMRF), this encompasses a large family of models that are used in practice.
    • glmmTMB uses functions of the distance between observations, and fitted variograms to model the spatial autocorrelation present (haven’t found an example using this package)
    • {R2BayesX} - Estimate structured additive regression (STAR) models with ‘BayesX’.

Varying Intercepts

  • Uses a discrete location/id variable as the group variable (i.e. varying intercepts by location).
  • Examples
    • Example: Boston Housing Values (SDS Ch. 16.2)

      • The lower level (census tract) dataset (no aggregation of variables) is used in these models
      • Since the NOX (air pollution) variable is measured at the higher geometry level (NOX_ID) and not the census tract level like the other covariates, the coefficient estimates for that variable are not useful, but looking at the random effects does provide some insight (See Visualization tab).

      This example shows how to fit the same model with four different packages

      See Geospatial, Preprocessing >> Aggregations >> Examples >> Example 3 for notes on the processing of this dataset.

      NOX_ID is the location variable (and grouping variable) for the measured NOX values.

      library(Matrix)
      library(lme4)
      
      form <- 
        formula(log(median) ~ CRIM + ZN + INDUS + CHAS + 
                I((NOX*10)^2) + I(RM^2) + AGE + log(DIS) +
                log(RAD) + TAX + PTRATIO + I(BB/100) + 
                log(I(LSTAT/100)))
      
      MLM <- 
        lmer(update(form, . ~ . + (1 | NOX_ID)),
             data = boston_487, 
             REML = FALSE)
      
      # random effects
      boston_93$MLM_re <- ranef(MLM)[[1]][,1]
      • Not sure why {Matrix} needs to be loaded
      library(hglm) |> suppressPackageStartupMessages()
      suppressWarnings(HGLM_iid <- hglm(fixed = form,
                                        random = ~1 | NOX_ID,
                                        data = boston_487,
                                        family = gaussian()))
      
      # random effects
      boston_93$HGLM_re <- unname(HGLM_iid$ranef)
      library(R2BayesX) |> suppressPackageStartupMessages()
      
      BX_iid <- 
        bayesx(update(form, . ~ . + sx(NOX_ID, bs = "re")),
               family = "gaussian", 
               data = boston_487,
               method = "MCMC", 
               iterations = 12000,
               burnin = 2000, 
               step = 2, 
               seed = 123)
      library(INLA) |> suppressPackageStartupMessages()
      
      INLA_iid <- 
        inla(update(form, . ~ . + f(NOX_ID, model = "iid")),
             family = "gaussian", 
             data = boston_487)
      
      # random effects
      boston_93$INLA_re <- INLA_iid$summary.random$NOX_ID$mean

      library(tmap)
      
      tm_shape(boston_93) +
        tm_polygons(fill = c("MLM_re", "HGLM_re", "INLA_re", "BX_re"),
            fill.legend = tm_legend("IID", frame=FALSE, item.r = 0),
            fill.free = FALSE, lwd = 0.01,
            fill.scale = tm_scale(midpoint = 0, values = "brewer.rd_yl_gn")) +
        tm_facets_wrap(columns = 2, rows = 2) + 
        tm_layout(panel.labels = c("lmer", "hglm", "inla", "bayesx"))                
      • The air pollution model output zone level IID random effects are very similar across the four model fitting functions reported.
      • The central downtown zones have stronger negative random effect values, but strong positive values are also found in close proximity; suburban areas take values closer to zero.

Spatially Structured Random Effects

  • Simultaneous Autoregressive (SAR), Conditional Autoregressive (CAR), and Intrinsic CAR (ICAR) models use spatially structured random effects

    • SAR, CAR, ICAR, and Leroux models have the same overall model formula but different random effects formulas
  • Model Formula

    \[\begin{align} &Y = X \beta + Zu \\ &\text{where} \;\; u \sim\mathcal{N}(0, \sigma^2_u \; \Sigma) \end{align}\]

    • \(\sigma^2_u \; \Sigma\) is the variance/covariance matrix of the random effects
  • Random Effects (\(u\)) Formulas

    • SAR: \(\Sigma^{-1} = (I-\rho W)'\;(I-\rho W)\)
      • \(W\) is a non-singular spatial weights matrix
      • \(\rho\) is a spatial correlation parameter
    • CAR: \(\Sigma^{-1} = (I-\rho W)\)
      • \(W\) is a symmetric and strictly positive definite spatial weights matrix
    • ICAR: \(\Sigma^-1 = M = \text{diag}(n_i) - W\)
      • \(W\) is a symmetric and strictly positive definite spatial weights matrix
    • Leroux: \(\Sigma^{-1} = (I-\rho)I_n + \rho M)\)
  • Examples

    • Example: Boston Housing Values (SDS Ch. 16.2)

      • The lower level (census tract) dataset (no aggregation of variables) is used in these models while the neighbor’s list/spatial weights object used is created from a higher level geometry (NOX_ID) (i.e. the spatial relationship is modeled at the higher level)
      • Since the NOX (air pollution) variable is measured at the higher geometry level (NOX_ID) and not the census tract level like the other covariates, the coefficient estimates for that variable are not useful, but looking at the random effects does provide some insight (See Visualization tab).
      • Care is needed to match the indexing variable with the spatial weights. The neighbor’s list object (nb_q_93) needs to get manipulated in some way in all these models in order for the spatial weights object to be in a suitable format for the particular model to accept.

      See Geospatial, Preprocessing >> Aggregations >> Examples >> Example 3 for notes on the processing of this dataset.

      NOX_ID is the location variable (and grouping variable) for the measured NOX values.

      library(spatialreg)
      W <- as(spdep::nb2listw(nb_q_93, style = "B"), "CsparseMatrix")
      
      form <- 
        formula(log(median) ~ CRIM + ZN + INDUS + CHAS + 
                I((NOX*10)^2) + I(RM^2) + AGE + log(DIS) +
                log(RAD) + TAX + PTRATIO + I(BB/100) + 
                log(I(LSTAT/100)))
      
      suppressWarnings(
        HGLM_car <- 
          hglm(fixed = form,
               random = ~ 1 | NOX_ID,
               data = boston_487,
               family = gaussian(),
               rand.family = CAR(D = W))
      )
      
      # random effects
      boston_93$HGLM_ss <- HGLM_car$ranef[,1]
      library(R2BayesX)
      
      RBX_gra <- R2BayesX::nb2gra(nb_q_93)
      all.equal(row.names(RBX_gra), attr(nb_q_93, "region.id"))
      # [1] TRUE
      
      all.equal(unname(diag(RBX_gra)), spdep::card(nb_q_93))
      # [1] TRUE
      
      BX_mrf <- 
        bayesx(update(form, . ~ . + sx(NOX_ID, 
                                       bs = "mrf",
                                       map = RBX_gra)), 
               family = "gaussian", 
               data = boston_487,
               method = "MCMC", 
               iterations = 12000,
               burnin = 2000, 
               step = 2, 
               seed = 123)
      
      # random effects
      boston_93$BX_ss <- BX_mrf$effects["sx(NOX_ID):mrf"][[1]]$Mean
      library(INLA) |> suppressPackageStartupMessages()
      
      ID2 <- as.integer(as.factor(boston_487$NOX_ID))
      
      INLA_ss <- 
        inla(update(form, . ~ . + f(ID2, 
                                    model = "besag",
                                    graph = W)),
             family = "gaussian", 
             data = boston_487)
      
      # random effects
      boston_93$INLA_ss <- INLA_ss$summary.random$ID2$mean
      M <- Diagonal(nrow(W), rowSums(W)) - W
      Cmatrix <- Diagonal(nrow(M), 1) -  M
      
      INLA_lr <- 
        inla(update(form, . ~ . + f(ID2, 
                                    model = "generic1",
                                    Cmatrix = Cmatrix)),
             family = "gaussian", 
             data = boston_487)
      
      # random effects
      boston_93$INLA_lr <- INLA_lr$summary.random$ID2$mean
      library(mgcv)
      
      names(nb_q_93) <- attr(nb_q_93, "region.id")
      boston_487$NOX_ID <- as.factor(boston_487$NOX_ID)
      
      GAM_MRF <- 
        gam(update(form, . ~ . + s(NOX_ID, 
                                   bs = "mrf",
                                   xt = list(nb = nb_q_93))),
            data = boston_487, 
            method = "REML")
      
      # random effects
      ssre <- 
        predict(GAM_MRF, 
                type = "terms",
                se = FALSE)[, "s(NOX_ID)"]
      
      all(sapply(tapply(ssre, list(boston_487$NOX_ID), c),
                 function(x) length(unique(round(x, 8))) == 1))
      # [1] TRUE
      
      # return the first value for each upper-level unit
      boston_93$GAM_ss <- aggregate(ssre, list(boston_487$NOX_ID), 
                                    head, n=1)$x
      • The upper-level random effects may be extracted by predicting terms
      • The values in all lower-level tracts belonging to the same upper-level air pollution model output zones are identical

      library(tmap)
      
      tm_shape(boston_93) +
        tm_polygons(fill = c("HGLM_ss", "INLA_lr", "INLA_ss", "BX_ss", 
                "GAM_ss"),
            fill.legend = tm_legend("SSRE", frame=FALSE, item.r = 0),
            fill.free = FALSE, lwd = 0.1,
            fill.scale = tm_scale(midpoint = 0, values = "brewer.rd_yl_gn")) +
        tm_facets_wrap(columns = 3, rows = 2) + 
        tm_layout(panel.labels = c("hglm CAR", "inla Leroux",
                 "inla ICAR", "bayesx ICAR", "gam ICAR"))
      • The spatially structured random effects are also very similar to each other
        • The SAR spatial smooth is perhaps a little smoother than the CAR smooths when considering the range of values taken by the random effect term.
      • The central downtown zones have stronger negative random effect values, but strong positive values are also found in close proximity; suburban areas take values closer to zero.