Probabilistic

Misc

  • AIC vs BIC (paper)
    • Lower is better: the preferred model is the one that minimizes information loss
    • AIC
      • Penalizes parameters by 2 points per parameter
      • Ideal AIC scenario
        • Numerous hypotheses are considered
        • You believe all of them are wrong to differing degrees
    • BIC
      • Penalizes parameters by ln(sample size) points per parameter, e.g. ln(20) ≈ 2.996
      • Almost always a stronger penalty than AIC in practice: ln(n) > 2 whenever n ≥ 8
      • Ideal BIC scenario
        • Only a few potential hypotheses are considered
        • One of the hypotheses is (essentially) correct
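
The two penalties above can be checked by hand. This is a minimal base R sketch, assuming two nested `lm()` fits on the built-in `mtcars` data (chosen only for illustration, not from the paper). The helper `info_crit()` is a hypothetical name. The results should agree with `stats::AIC()` and `stats::BIC()`.

```r
# Two nested linear models on a built-in dataset (illustrative assumption)
fit_small <- lm(mpg ~ wt, data = mtcars)
fit_large <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Hypothetical helper: -2 * logLik + penalty * (number of estimated parameters)
info_crit <- function(fit, penalty) {
  ll <- logLik(fit)
  k  <- attr(ll, "df")  # parameter count, including the error variance
  -2 * as.numeric(ll) + penalty * k
}

n    <- nobs(fit_small)
fits <- list(small = fit_small, large = fit_large)

aic_manual <- sapply(fits, info_crit, penalty = 2)       # AIC: 2 points per parameter
bic_manual <- sapply(fits, info_crit, penalty = log(n))  # BIC: ln(n) points per parameter

# Per-parameter penalties side by side
c(aic_penalty = 2, bic_penalty = log(n))
```

With n = 32 here, the BIC penalty per parameter is ln(32), which is larger than 2, so BIC punishes the extra predictors in `fit_large` harder than AIC does.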

Scores

  • Continuous Ranked Probability Score (CRPS)
    • fabletools::accuracy
    • {loo} - crps(), scrps(), loo_crps(), and loo_scrps() for computing the (Scaled) Continuously Ranked Probability Score
    • Manual calculation (article)
    • Measures forecast distribution accuracy
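    • Combines an MAE-style term (mean absolute error of the simulated forecasts against the observation) with the spread of the simulated forecasts; for a single point forecast it reduces to the absolute error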
    • See notebook (pg 172)
  • Winkler Score
    • fabletools::accuracy
    • Scores a prediction interval (PI) by its width plus a penalty when the observation falls outside it; lower is better
    • See notebook (pg 172)
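
Both scores can be computed by hand from simulated forecasts and interval bounds. This is a rough base R sketch, not the {fabletools} or {loo} implementation. The function names `crps_sample()` and `winkler()` are hypothetical. The CRPS version is the usual sample-based estimator (mean absolute error minus half the mean pairwise spread of the draws), and the Winkler version assumes a central interval with nominal coverage `level`.

```r
# Hypothetical helper: sample-based CRPS for one observation y
# draws = simulated forecasts from the predictive distribution
crps_sample <- function(draws, y) {
  mae_term    <- mean(abs(draws - y))                   # accuracy term
  spread_term <- mean(abs(outer(draws, draws, "-")))    # mean pairwise distance
  mae_term - 0.5 * spread_term
}

# Hypothetical helper: Winkler score for a central prediction interval
# Interval width, plus 2 / alpha times the miss distance when y is outside
winkler <- function(lower, upper, y, level = 0.95) {
  alpha <- 1 - level
  (upper - lower) +
    (2 / alpha) * pmax(lower - y, 0) +
    (2 / alpha) * pmax(y - upper, 0)
}

set.seed(42)
draws <- rnorm(2000)             # simulated forecasts (illustrative only)
crps_sample(draws, y = 0)        # observation in the centre of the forecast
winkler(0, 10, y = 5,  level = 0.9)   # inside the interval: width only
winkler(0, 10, y = 12, level = 0.9)   # outside: width plus penalty
```

Lower is better for both scores. A forecast that collapses to a single value gets a CRPS equal to its absolute error, which is why CRPS is often described as a generalization of MAE.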

Visual Inspection

  • Check how well the predicted distribution matches the observed distribution

  • {topmodels} currently supported models:

  • autoplot produces a ggplot object that can be used for further customization

  • (Randomized) quantile-quantile residuals plot

    qqrplot(distr_forest_fit)
    • Quantiles of the standard normal distribution vs quantile residuals (regular ole q-q plot)

    • Interpretation

      • Pretty good fit, as the points stick close to the line (the red dot is the speaker’s laser pointer)
      • Left and right tails show deviation.
      • The left tail also shows increased uncertainty due to the censored distribution that was used to fit the model
    • Compare with a bad model

      c(qqrplot(distr_forest_fit, plot = FALSE), qqrplot(lm_fit, plot = FALSE)) |> autoplot(legend = TRUE, single_graph = TRUE, col = 1:2)
  • Probability integral transform (PIT) histogram

    pithist(distr_forest_fit)
    • Compares the value that the predictive CDF attains at the observation with the uniform distribution

    • The flatter the histogram, the better the model.

    • Interpretation: As with the q-q, this model shows some deviations at the tails but is more or less pretty flat

    • Compare with a bad model

      c(pithist(distr_forest_fit, plot = FALSE), pithist(lm_fit, plot = FALSE)) |> autoplot(legend = TRUE, style = "lines", single_graph = TRUE, col = 1:2)
  • (Hanging) Rootogram

    rootogram(distr_forest_fit)
    • Compares whether the observed frequencies match the expected frequencies

    • Observed frequencies (bars) are hanging off the expected frequencies (model predictions, red line)

    • robs is the outcome values

    • Interpretation: Near perfect prediction for 0 precipitation (outcome variable), underfitting values of “1” precipitation

    • Compare with a bad model

      c(rootogram(distr_forest_fit, breaks = -9:14), rootogram(lm_fit,
      breaks = -9:14)) |> autoplot(legend = TRUE)
      • lm model shows overfitting of outcome variable values 1-5 and underfitting the zeros.
      • The lm model doesn’t use a censored distribution so there’s an expectation of negative values
  • Reliability Diagram

    reliagram(fit)
    • Forecasted probabilities of an event vs observed frequencies
      • Basically a fitted vs observed plot
      • Forecast probabilities are binned (points on the line), 10 bins in this example, and averaged
    • Close to the dotted line indicates a good model
  • Worm plot

    wormplot(fit)
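    • Not described in the talk. A worm plot is a detrended q-q plot: the deviation of the quantile residuals from the theoretical normal quantiles, plotted against the theoretical quantiles
    • Dots on the zero line indicate a perfect fit, and dots inside the dashed lines indicate a good fit
      • The speaker said this model fit was reasonable, but it doesn’t look that great to me.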
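
The quantities behind the PIT histogram, the q-q residuals plot, and the worm plot can be reproduced in base R without {topmodels}. This is a sketch on simulated data, assuming a normal predictive distribution with known parameters. All object names are hypothetical, and the "bad" model is simply a predictive distribution that is too narrow.

```r
set.seed(1)
y <- rnorm(500, mean = 2, sd = 3)        # simulated observations (illustrative only)

# PIT: value the predictive CDF attains at each observation
pit_good <- pnorm(y, mean = 2, sd = 3)   # predictive distribution matches the data
pit_bad  <- pnorm(y, mean = 2, sd = 1)   # predictive distribution is far too narrow

# PIT histogram counts: roughly flat if calibrated, U-shaped if too narrow
bins        <- seq(0, 1, by = 0.1)
counts_good <- table(cut(pit_good, breaks = bins, include.lowest = TRUE))
counts_bad  <- table(cut(pit_bad,  breaks = bins, include.lowest = TRUE))

# Quantile residuals: PIT values mapped to the standard normal scale
qres_good <- qnorm(pit_good)

# Worm plot coordinates: sorted residuals minus theoretical normal quantiles
theo <- qnorm(ppoints(length(qres_good)))
worm <- sort(qres_good) - theo

counts_good
counts_bad
summary(worm)
```

Plotting `sort(qres_good)` against `theo` gives the q-q residuals plot, and plotting `worm` against `theo` gives the worm plot. For a continuous distribution like this one no randomization is needed. The randomized variant matters only for discrete or censored outcomes, where the PIT is drawn uniformly between the CDF values on either side of the observation.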