Generalized Additive Models

Misc

  • Also see Feature Engineering, Splines
  • Packages
    • {mgcv} - Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
      • See R >> Documents >> Regression >> GAMs >> Generalized Additive Models: An Introduction with R, Second Edition
      • bam: Uses numerical methods are designed for datasets containing upwards of several tens of thousands of data
        • Has a much lower memory footprint than gam
        • Can compute on a cluster set up by {parallel}
        • If discrete=TRUE
          • Uses a method based on discretization of covariate values and C code level parallelization (controlled by the nthreads argument instead of the cluster argument) is used
          • Number of response data can not exceed .Machine$integer.max
    • {gamlss} - GAM modeling where all the parameters of the assumed distribution for the response can be modelled as functions of the explanatory variables
    • {gamboostLSS} - Boosting models for fitting generalized additive models for location, shape and scale (‘GAMLSS’) to potentially high dimensional data.
    • {bamlss} - Bayesian Additive Models for Location, Scale, and Shape (and Beyond)
  • Resources
  • Papers
  • Large gaps in the values of the predictor variable can be a problem if you are trying to interpolate between those gaps. (See bkmks, method = "reml" + s(x, m = 1))
  • Thread discussing an example using basis type, bs = “sz”, which is meant for separating a baseline from other effects
  • Partial Effect Plots - Show the component contributions, on the link scale, of each model term to the linear predictor.
    • {gratia::draw}
    • Sound similar to Partial Dependence Plots/Profiles except instead of the average response value on the Y-axis, it’s the effect size.
    • The Y-axis on these plots is typically centred around 0 due to most smooths having a sum-to-zero identifiability constraint applied to them
    • Show link-scale predictions of the response for each smooth, conditional upon all other terms in the model, including any parametric effects (i.e. fixed effects) and the intercept, having zero contribution.
    • These plots show adjusted predictions, just where the adjustment includes setting the contribution of all other model terms to the predicted value to zero

Diagnostics

  • {gratia::appraise}

    • QQ plot of deviance residuals,
    • Scatterplot of deviance residuals against the linear predictor,
    • Histogram of deviance residuals, and
    • Scatterplot of observed vs fitted values.
  • {gratia::draw(mod, residuals = TRUE)} - Adds partial residuals to partial effects plots

    • Can help diagnose overfitting in your spline terms
  • “Deviance explained” is the R2 value for GAMs

  • mgcv::gam.check(gam_fit)

    ## Method: GCV  Optimizer: magic
    ## Smoothing parameter selection converged after 19 iterations.
    ## The RMS GCV score gradient at convergence was 5.938335e-08 .
    ## The Hessian was positive definite.
    ## Model rank =  21 / 22 
    ## Basis dimension (k) checking results. Low p-value (k-index<1) may
    ## indicate that k is too low, especially if edf is close to k'.
    ##                                           k'  edf  k-index p-value   
    ## s(id)                                   1.00  0.35    0.82  <2e-16 ***
    ## s(log_profit_rug_business_b)            9.00  8.52    1.01    0.69   
    ## s(log_profit_rug_business_b):treatment 10.00  1.50    1.01    0.62   
    ## ---
    ## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    • Check if the size of the basis expansion (k) for each smooth is sufficiently large
      • k.check can also do this
      • If all your smoothing predictors are not sufficiently large, then this indicates that using a GAM is a bad fit for your data.
      • See SO post from Simpson
  • Formal test for the necessity of a smooth

    m <- 
      gam(y ~ x + s(x, m = c(2, 0), bs = "tp"),
          data = foo,
          method = "REML",
          family = binomial())
    • See EDA, General >> Continuous Predictor vs Outcome >> Continuous and Categorical for examples
    • bs = "tp" is just the default thin plate basis function
    • Fit the predictor of interest as a linear term (x) plus a smooth function of x
    • Modify the basis for the smooth so that it no longer includes linear functions in the span of the basis with m = c(2, 0)
      • m: Controls the penalty on the wiggliness of spline
      • 2: An order-2 penalty (the default and most common) penalizes the second derivative of the function, which relates to its curvature.
        • Higher values would penalize higher-order derivatives, resulting in even smoother functions.
      • 0: Specifies that no null space basis is required.
        • The null space is the span of functions that aren’t affected by the (main) penalty, because they have 0 second derivative. (i.e. terms that doen’t have curvature)
        • For an order-2 penalty, the null space typically includes constant and linear terms.
    • summary will give a test for the necessity of the wiggliness provided by the smooth over the linear effect estimated by the linear term. Check the p-value of the smooth term. If it’s significant, then a spline should be used. In your model, you wouldn’t use the zeroed out null space specification though.
    • From Simpson SO post
    • Also see Wood’s “Generalized Additive Models: An Introduction with R”, 2nd Ed, section 6.12.3, “Testing a parametric term against a smooth alternative” p 312-313 (R >> Documents >> Regression >> gam)