Generalized Additive Models
Misc
- Also see Feature Engineering, Splines
- Packages
- {mgcv} - Mixed GAM Computation Vehicle with Automatic Smoothness Estimation
- See R >> Documents >> Regression >> GAMs >> Generalized Additive Models: An Introduction with R, Second Edition
- `bam`: Uses numerical methods designed for datasets containing upwards of several tens of thousands of observations; has a much lower memory footprint than `gam`. (A minimal sketch follows this list.)
  - Can compute on a cluster set up by {parallel}
  - If `discrete = TRUE`, a method based on discretization of covariate values is used, with C-code-level parallelization controlled by the `nthreads` argument instead of the `cluster` argument
  - The number of response data cannot exceed `.Machine$integer.max`
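  - A minimal sketch of both parallel options, assuming a hypothetical large data frame `dat` with response `y` and covariates `x1`, `x2`:

    ```r
    library(mgcv)
    library(parallel)

    # Option 1: supply a {parallel} cluster to parallelize the rate-limiting
    # model matrix decomposition
    cl <- makeCluster(2)
    m1 <- bam(y ~ s(x1) + s(x2), data = dat, cluster = cl)
    stopCluster(cl)

    # Option 2: discretize covariate values; parallelization then happens at
    # the C-code level via nthreads (cluster is not used)
    m2 <- bam(y ~ s(x1) + s(x2), data = dat, discrete = TRUE, nthreads = 2)
    ```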
- {gamlss} - GAM modeling where all the parameters of the assumed distribution for the response can be modelled as functions of the explanatory variables
- {gamboostLSS} - Boosting models for fitting generalized additive models for location, shape and scale (‘GAMLSS’) to potentially high dimensional data.
- {bamlss} - Bayesian Additive Models for Location, Scale, and Shape (and Beyond)
- Resources
- Video: “Introduction to Generalized Additive Models with R and mgcv” by Gavin Simpson
- Video: “The Wonderful World of mgcv” by Noam Ross
- Hierarchical generalized additive models in ecology: an introduction with mgcv (2019)
- Papers
- Bayesian views of generalized additive modelling
- Bayesian GAMs explainer with coded examples.
- Large gaps in the values of the predictor variable can be a problem if you are trying to interpolate between those gaps (see bkmks; `method = "reml"` + `s(x, m = 1)`; a sketch follows below)
  - Thread discussing an example using basis type `bs = "sz"`, which is meant for separating a baseline from other effects
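- A hedged sketch of the `s(x, m = 1)` approach, assuming a hypothetical data frame `dat` whose predictor `x` has a large gap in its values:

  ```r
  library(mgcv)

  # A first-derivative penalty (m = 1) shrinks the smooth toward a constant
  # (rather than a line) where data are sparse, taming the interpolation
  # across the gap in x
  m_gap <- gam(y ~ s(x, m = 1), data = dat, method = "REML")
  ```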
- Partial Effect Plots - Show the component contributions, on the link scale, of each model term to the linear predictor.
- {gratia::draw}
- Sounds similar to Partial Dependence Plots/Profiles, except instead of the average response value on the Y-axis, it’s the effect size.
- The Y-axis on these plots is typically centred around 0 due to most smooths having a sum-to-zero identifiability constraint applied to them
- Show link-scale predictions of the response for each smooth, conditional upon all other terms in the model, including any parametric effects (i.e. fixed effects) and the intercept, having zero contribution.
- These plots show adjusted predictions, where the adjustment sets the contribution of all other model terms to the predicted value to zero (illustrated below)
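  - A minimal sketch, assuming a hypothetical data frame `dat`:

    ```r
    library(mgcv)
    library(gratia)

    # draw() produces one panel per smooth showing its partial effect on the
    # link scale, centred around zero by the sum-to-zero constraint
    m <- gam(y ~ s(x1) + s(x2), data = dat, method = "REML")
    draw(m)
    ```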
Diagnostics
- Residual diagnostic plots (e.g., via `gratia::appraise`):
- QQ plot of deviance residuals,
- Scatterplot of deviance residuals against the linear predictor,
- Histogram of deviance residuals, and
- Scatterplot of observed vs fitted values.
- `gratia::draw(mod, residuals = TRUE)` - Adds partial residuals to partial effect plots
- Can help diagnose overfitting in your spline terms
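  - For example, with a hypothetical fitted model `m`:

    ```r
    library(gratia)

    # Four-panel residual check: QQ plot, residuals vs linear predictor,
    # histogram of deviance residuals, and observed vs fitted values
    appraise(m)

    # Partial effects with partial residuals overlaid; smooths that chase
    # individual residuals suggest overfitting
    draw(m, residuals = TRUE)
    ```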
- “Deviance explained” is the R² analogue for GAMs
- Example `gam.check` output (diagnostic plots omitted):

  ```r
  ## Method: GCV   Optimizer: magic
  ## Smoothing parameter selection converged after 19 iterations.
  ## The RMS GCV score gradient at convergence was 5.938335e-08 .
  ## The Hessian was positive definite.
  ## Model rank =  21 / 22
  ##
  ## Basis dimension (k) checking results. Low p-value (k-index<1) may
  ## indicate that k is too low, especially if edf is close to k'.
  ##
  ##                                          k'  edf k-index p-value
  ## s(id)                                  1.00 0.35    0.82  <2e-16 ***
  ## s(log_profit_rug_business_b)           9.00 8.52    1.01    0.69
  ## s(log_profit_rug_business_b):treatment 10.00 1.50    1.01    0.62
  ## ---
  ## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
  ```
- Checks whether the size of the basis expansion (k) for each smooth is sufficiently large
  - `k.check` can also do this (see the sketch below)
  - If the basis dimension is insufficient for all of your smooth predictors, that indicates that a GAM is a poor fit for your data
  - See SO post from Simpson
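  - A quick sketch of the basis-size check on a hypothetical model `m`:

    ```r
    library(mgcv)

    # Prints the same k' / edf / k-index / p-value table as gam.check,
    # without the diagnostic plots
    k.check(m)

    # If edf is close to k' and the k-index/p-value are low, refit that
    # smooth with a larger basis, e.g. s(x, k = 20), and check again
    ```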
- Formal test for the necessity of a smooth

  ```r
  m <- gam(y ~ x + s(x, m = c(2, 0), bs = "tp"),
           data = foo, method = "REML", family = binomial())
  ```
- See EDA, General >> Continuous Predictor vs Outcome >> Continuous and Categorical for examples
bs = "tp"
is just the default thin plate basis function- Fit the predictor of interest as a linear term (x) plus a smooth function of x
- Modify the basis for the smooth so that it no longer includes linear functions in the span of the basis with `m = c(2, 0)`
  - `m`: Controls the penalty on the wiggliness of the spline
- 2: An order-2 penalty (the default and most common) penalizes the second derivative of the function, which relates to its curvature.
- Higher values would penalize higher-order derivatives, resulting in even smoother functions.
- 0: Specifies that no null space basis is required.
- The null space is the span of functions that aren’t affected by the (main) penalty because they have a zero second derivative (i.e., terms that don’t have curvature)
- For an order-2 penalty, the null space typically includes constant and linear terms.
- `summary` will give a test for the necessity of the wiggliness provided by the smooth over and above the linear effect estimated by the linear term. Check the p-value of the smooth term; if it’s significant, then a spline should be used. In your final model, though, you wouldn’t use the zeroed-out null space specification.
  - From Simpson’s SO post
- Also see Wood’s “Generalized Additive Models: An Introduction with R”, 2nd Ed, section 6.12.3, “Testing a parametric term against a smooth alternative” p 312-313 (R >> Documents >> Regression >> gam)
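- A self-contained sketch of this test on simulated data (the data and variable names are illustrative):

  ```r
  library(mgcv)

  # Simulated binary outcome with a genuinely nonlinear effect of x
  set.seed(1)
  foo <- data.frame(x = runif(500))
  foo$y <- rbinom(500, 1, plogis(sin(2 * pi * foo$x)))

  # Linear term plus a smooth whose basis excludes linear functions
  m <- gam(y ~ x + s(x, m = c(2, 0), bs = "tp"),
           data = foo, method = "REML", family = binomial())

  # A significant p-value on s(x) indicates wiggliness is needed over and
  # above the linear term
  summary(m)
  ```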