Probabilistic
Misc
- AIC vs BIC (paper)
- AIC
- Penalty of 2 points per parameter
- Ideal AIC scenario
- Numerous hypotheses are considered
- You believe all of them are wrong to differing degrees
- BIC
- Penalty of ln(sample size) points per parameter; e.g. ln(20) = 2.996, so the penalty already exceeds AIC's 2 at n = 20 (and for any n ≥ 8, since e² ≈ 7.4); see the by-hand check after this list
- Almost always a stronger penalty in practice
- Ideal BIC scenario
- Only a few potential hypotheses are considered
- One of the hypotheses is (essentially) correct
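A quick by-hand check of both penalties (a minimal sketch; the lm on mtcars is just a placeholder model):
# verify the penalty terms against R's AIC()/BIC()
fit <- lm(mpg ~ wt + hp, data = mtcars)
ll <- as.numeric(logLik(fit))
k <- attr(logLik(fit), "df")  # parameters counted by logLik (incl. sigma)
n <- nobs(fit)
all.equal(-2 * ll + 2 * k, AIC(fit))       # AIC: 2 points per parameter
all.equal(-2 * ll + log(n) * k, BIC(fit))  # BIC: ln(n) points per parameter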
Scores
- Continuous Ranked Probability Score (CRPS)
- Winkler Score
- Computed via fabletools::accuracy
- Measures how well the observations are covered by the prediction intervals (PI), penalizing wide intervals and observations that fall outside them; see the sketch after this list
- See notebook (pg 172)
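A minimal {fable}/{fabletools} sketch of both scores (the ETS model and the aus_production beer data are assumed examples, not from the notebook):
library(fable)
library(tsibble)
library(dplyr)
train <- tsibbledata::aus_production |> filter(Quarter <= yearquarter("2007 Q4"))
fc <- train |> model(ETS(Beer)) |> forecast(h = "2 years")
# CRPS rewards predictive distributions that concentrate near the observation
fc |> accuracy(tsibbledata::aus_production, measures = list(crps = CRPS))
# Winkler score penalizes wide PIs and observations falling outside them
fc |> accuracy(tsibbledata::aus_production, measures = list(winkler = winkler_score), level = 95)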
Visual Inspection
Check how well the predicted distribution matches the observed distribution
{topmodels} currently supported models: lm, glm, glm.nb, hurdle, zeroinfl, zerotrunc, crch, disttree (i.e., including models from {crch} and {disttree})
- Also see video: Probability Distribution Forecasts: Learning from Random Forests and Graphical Assessment
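A minimal sketch running the whole graphical suite (assumptions: {topmodels} installed from the Zeileis R-universe; toy_fit, a plain Gaussian lm on the built-in cars data, is a hypothetical stand-in for the talk's censored distributional forest):
# install.packages("topmodels", repos = "https://zeileis.r-universe.dev")
library(topmodels)
toy_fit <- lm(dist ~ speed, data = cars)  # toy stand-in; lm is on the supported list
qqrplot(toy_fit)    # (randomized) quantile-quantile residuals
pithist(toy_fit)    # PIT histogram
rootogram(toy_fit)  # observed vs expected frequencies
reliagram(toy_fit)  # binned forecast probabilities vs observed frequencies
wormplot(toy_fit)   # detrended Q-Q plot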
- autoplot produces a ggplot object that can be used for further customization
(Randomized) quantile-quantile residuals plot
qqrplot(distr_forest_fit)
Quantiles of the standard normal distribution vs quantile residuals (regular ole Q-Q plot); a by-hand sketch follows this block
Interpretation
- Pretty good fit as the points stick pretty close to the line (red dot is the laser pointer from the dude giving the talk)
- Left and right tails show deviation.
- The left tail also shows increased uncertainty due to the censored distribution that was used to fit the model
- c(qqrplot(distr_forest_fit, plot = FALSE), qqrplot(lm_fit, plot = FALSE)) |> autoplot(legend = TRUE, single_graph = TRUE, col = 1:2)
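For a Gaussian model, the quantile residuals can be computed by hand (a sketch reusing the hypothetical toy_fit from above, not the talk's distr_forest_fit):
# quantile residuals = PIT values mapped through qnorm;
# for a well-specified model these are ~ standard normal
pit <- pnorm(cars$dist, mean = fitted(toy_fit), sd = sigma(toy_fit))
qres <- qnorm(pit)
qqnorm(qres); abline(0, 1)  # regular ole Q-Q plot of the quantile residuals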
Probability integral transform (PIT) histogram
pithist(distr_forest_fit)
Compares the value that the predictive CDF attains at each observation (the probability integral transform, PIT) with the uniform distribution; a by-hand sketch follows this block
The flatter the histogram, the better the model.
Interpretation: As with the Q-Q plot, this model shows some deviation at the tails but is otherwise pretty flat
- c(pithist(distr_forest_fit, plot = FALSE), pithist(lm_fit, plot = FALSE)) |> autoplot(legend = TRUE, style = "lines", single_graph = TRUE, col = 1:2)
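Same idea by hand: the PIT is just the predictive CDF evaluated at each observation, and a calibrated model gives roughly uniform PIT values (a sketch, again with the hypothetical toy_fit):
pit <- pnorm(cars$dist, mean = fitted(toy_fit), sd = sigma(toy_fit))
hist(pit, breaks = 10, freq = FALSE)  # flat histogram = well calibrated
abline(h = 1, lty = 2)                # uniform density reference line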
Rootogram
rootogram(distr_forest_fit)
Compares whether the observed frequencies match the expected frequencies
Observed frequencies (bars) hang from the expected frequencies (model predictions, the red line); a by-hand sketch follows this block
robs denotes the observed outcome values
Interpretation: Near-perfect prediction for 0 precipitation (the outcome variable); underfits precipitation values of 1
- c(rootogram(distr_forest_fit, breaks = -9:14), rootogram(lm_fit, breaks = -9:14)) |> autoplot(legend = TRUE)
- The lm model overfits outcome values 1-5 and underfits the zeros.
- The lm model doesn't use a censored distribution, so it expects negative values.
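The hanging construction by hand for a count model (a sketch; pois_fit, a Poisson glm on the built-in InsectSprays data, is an assumed example, since glm is on the supported list):
pois_fit <- glm(count ~ spray, family = poisson, data = InsectSprays)
ks <- 0:max(InsectSprays$count)
expected <- sapply(ks, function(k) sum(dpois(k, fitted(pois_fit))))  # model-implied frequency of each count
observed <- tabulate(InsectSprays$count + 1, nbins = length(ks))     # observed frequency of each count
plot(ks, sqrt(expected), type = "l", col = "red", xlab = "count", ylab = "sqrt(frequency)")
segments(ks, sqrt(expected) - sqrt(observed), ks, sqrt(expected))  # bars hang from the red line
abline(h = 0, lty = 2)  # bar tips below 0 = observed > expected for that count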
Reliability diagram
reliagram(fit)
- Forecasted probabilities of an event vs observed frequencies
- Basically a fitted vs observed plot
- Forecast probabilities are binned (the points on the line), 10 bins in this example, and averaged; a by-hand sketch follows this list
- Close to the dotted line indicates a good model
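The construction by hand (a sketch with simulated forecast probabilities, since a reliability diagram needs an event probability; p_hat and y are made-up data, with 10 bins as in the talk's example):
set.seed(1)
p_hat <- runif(1000)             # forecast probabilities for some event
y <- rbinom(1000, 1, p_hat)      # outcomes drawn consistent with the forecasts
bins <- cut(p_hat, breaks = seq(0, 1, by = 0.1))  # 10 probability bins
plot(tapply(p_hat, bins, mean), tapply(y, bins, mean), xlab = "mean forecast probability", ylab = "observed frequency")
abline(0, 1, lty = 2)  # points on the dotted line = well calibrated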
Worm plot
wormplot(fit)
- He didn't describe this chart, but a worm plot is a detrended Q-Q residuals plot (by-hand sketch below)
- So dots on the zero line indicate a perfect model, and dots inside the dashed lines indicate a good fit
- He said this model fit was reasonable, but it doesn't look that great to me.
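By hand, the detrending makes the interpretation above concrete (a sketch, reusing the hypothetical toy_fit):
# worm plot = Q-Q plot with the theoretical quantiles subtracted,
# so a perfect fit wiggles around the zero line
qres <- qnorm(pnorm(cars$dist, mean = fitted(toy_fit), sd = sigma(toy_fit)))
qq <- qqnorm(qres, plot.it = FALSE)
plot(qq$x, qq$y - qq$x, xlab = "theoretical quantiles", ylab = "deviation")
abline(h = 0, lty = 2)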