CV
Misc
- A better validation set score than the training set score (Notes from link):
    - You don't have that much data and it's luck.
    - Gap between them shrinks over time
        - May be due to regularization (if it's being used): during validation and testing, your loss function only comprises prediction error, while the training loss also includes the penalty term (see the sketch at the end of this section).
    - Gap between them stays the same and training loss has fluctuations
        - DL: dropout is only applied during training, so it only affects the training loss.
    - Validation loss is lower than training loss at first but has similar or higher values later on
        - DL: training loss is calculated during each epoch, but validation loss is calculated at the end of each epoch.
- Compare Training vs Test
Example: {gt} table, {yardstick} forecast metrics
```r
bind_rows(
  yardstick::mape(rf_preds_train, Sale_Price, .pred),
  yardstick::mape(rf_preds_test, Sale_Price, .pred)
) %>%
  mutate(dataset = c("training", "holdout")) %>%
  gt::gt() %>%
  gt::fmt_number(".estimate", decimals = 1)
```
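To see the regularization point concretely: the training objective includes the penalty term, while the validation loss is prediction error only, so the validation number can come out lower for the same model. A minimal sketch using {glmnet} ridge regression; the simulated data, `lambda`, and the simplified penalty scaling are illustrative assumptions, not glmnet's exact objective.

```r
# Illustrative only: simulated data; penalty scaling simplified vs. glmnet's exact objective
library(glmnet)

set.seed(1)
beta    <- rnorm(10)
x_train <- matrix(rnorm(100 * 10), ncol = 10)
y_train <- drop(x_train %*% beta + rnorm(100))
x_val   <- matrix(rnorm(50 * 10),  ncol = 10)
y_val   <- drop(x_val %*% beta + rnorm(50))

lambda <- 0.5
fit    <- glmnet(x_train, y_train, alpha = 0, lambda = lambda)   # ridge fit

mse     <- function(y, y_hat) mean((y - y_hat)^2)
penalty <- lambda * sum(as.numeric(coef(fit))[-1]^2)             # L2 penalty, intercept excluded

train_objective <- mse(y_train, predict(fit, x_train)) + penalty # minimized during training
val_loss        <- mse(y_val,   predict(fit, x_val))             # prediction error only

c(train_objective = train_objective, val_loss = val_loss)
```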
Regression
For prediction, if coefficients vary significantly across the test folds, their robustness is not guaranteed (see the coefficient boxplot below), and they should probably be interpreted with caution.
- Boxplots show the variance of the coefficients across the folds of a repeated 5-fold CV.
- The “Coefficient importance” in the example is just the coefficient value of the standardized variable in a ridge regression
- Note outliers beyond the whiskers for Age and Experience
- In this case, the variance is caused by the fact that experience and age are strongly collinear.
- Variability in coefficients can also be explained by collinearity between predictors
- Perform sensitivity analysis by removing one of the collinear predictors and re-running the CV. Check if the variance of the variable that was kept has stabilized (e.g. fewer outliers past the whiskers of a boxplot); see the sketch below.
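Below is a sketch of the fold-wise coefficient check described above: refit a ridge model on the analysis set of each split from a repeated 5-fold CV, then boxplot the standardized coefficients. The data frame `dat`, the outcome `wage`, and the assumption that all predictors are numeric are hypothetical placeholders; packages used are {rsample}, {glmnet}, and the tidyverse.

```r
# Sketch only: `dat` and `wage` are hypothetical; predictors assumed numeric
library(rsample)
library(glmnet)
library(dplyr)
library(purrr)
library(tibble)
library(ggplot2)

set.seed(123)
folds <- vfold_cv(dat, v = 5, repeats = 5)

coefs <- map_dfr(folds$splits, function(split) {
  train_dat <- analysis(split)
  x <- scale(as.matrix(select(train_dat, -wage)))   # standardize predictors
  y <- train_dat$wage
  fit <- cv.glmnet(x, y, alpha = 0)                 # ridge regression
  enframe(coef(fit, s = "lambda.min")[-1, 1],       # drop the intercept
          name = "term", value = "estimate")
})

# One box per coefficient across the 25 refits; wide boxes or many points
# beyond the whiskers flag unstable (often collinear) predictors
ggplot(coefs, aes(x = estimate, y = term)) +
  geom_boxplot() +
  labs(x = "Coefficient value (standardized predictors)", y = NULL)
```

For the sensitivity analysis, drop one of the collinear columns from `dat` and rerun the same loop to see whether the spread of the remaining coefficient tightens.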