A suitably flexible Bayesian regression adjustment model,
Chosen by cross-validation/LOO,
Including Gaussian processes for the unit-level effects over time (and space/network if relevant),
Imputation of missing data, and
Informative priors for biases in the data collection process.
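As a concrete (and entirely hypothetical) illustration of how these ingredients could fit together, the sketch below uses {brms}, which is an assumption here since the points above do not name a package: `gp()` supplies a per-unit Gaussian process over time, `mi()` handles imputation of a partially observed covariate, an informative prior is placed on the treatment coefficient to encode a suspected bias, and `loo()` supports model comparison. All variable names, data, and prior values are invented.

```r
# Hypothetical sketch (not the author's model): regression adjustment with a
# per-unit GP over time, covariate imputation, and an informative bias prior.
library(brms)

set.seed(1)
dat <- data.frame(
  unit  = factor(rep(1:5, each = 10)),
  time  = rep(1:10, times = 5),
  treat = rbinom(50, 1, 0.5)
)
dat$x <- rnorm(50, 0.1 * dat$time, 1)
dat$y <- 0.3 * dat$treat + 0.5 * dat$x + rnorm(50)
dat$x[sample(50, 8)] <- NA                      # missing covariate values to impute

bform <- bf(y ~ treat + mi(x) + gp(time, by = unit)) +  # GP over time within each unit
  bf(x | mi() ~ time) +                                 # imputation model for x
  set_rescor(FALSE)

bprior <- c(
  prior(normal(0, 1), class = "b", resp = "y"),
  # informative prior reflecting a suspected bias in how treatment was recorded
  prior(normal(0.2, 0.1), class = "b", coef = "treat", resp = "y")
)

fit <- brm(bform, data = dat, prior = bprior, chains = 4, cores = 4)
loo(fit)   # approximate leave-one-out CV for comparing candidate specifications
```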
Discrete Parameters
Models with discrete parameters arise in a range of statistical motifs, including hidden Markov models, finite mixture models, and, more generally, any setting with unobserved categorical data.
HMC cannot operate on models containing discrete parameters: it relies on gradient information to guide its exploration of the parameter space, and discrete parameters do not have well-defined gradients.
Using HMC therefore requires marginalizing the likelihood to remove these discrete dimensions from the sampling problem, i.e. integrating out the discrete variables (a minimal sketch follows the list below). Marginalization has several drawbacks:
It can result in a loss of information.
Depending on the number of discrete parameters, it can be computationally intensive or even intractable.
The direct relationship between the discrete parameters and the data is obscured.
It may require more samples to achieve the same level of accuracy, due to slower mixing.
It can be complex and error-prone for intricate models.
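To make the idea concrete, the sketch below integrates the component indicator out of a two-component Gaussian mixture, leaving a likelihood that depends only on continuous parameters. It is plain R and not tied to any particular package; the function and variable names are illustrative.

```r
# Minimal sketch of marginalizing a discrete parameter: the mixture indicator z
# is summed out, so only continuous parameters (weight w, means mu, sd sigma)
# remain for a gradient-based sampler such as HMC.
marginal_loglik <- function(y, w, mu, sigma) {
  # p(y) = w * N(y | mu[1], sigma) + (1 - w) * N(y | mu[2], sigma),
  # i.e. sum over z in {1, 2} of p(z) p(y | z)
  lp1 <- log(w)     + dnorm(y, mu[1], sigma, log = TRUE)
  lp2 <- log(1 - w) + dnorm(y, mu[2], sigma, log = TRUE)
  m <- pmax(lp1, lp2)                    # log-sum-exp for numerical stability
  sum(m + log(exp(lp1 - m) + exp(lp2 - m)))
}

y <- c(rnorm(100, -2), rnorm(100, 2))
marginal_loglik(y, w = 0.5, mu = c(-2, 2), sigma = 1)
```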
{nimbleHMC} (JOSS) can perform HMC sampling of hierarchical models that also contain discrete parameters. It allows HMC sampling to operate alongside samplers for the discrete dimensions.
A workflow for a problem with discrete parameters should include testing combinations of samplers in order to optimize MCMC efficiency.
{nimbleHMC} allows you to mix-and-match samplers from a large pool of candidates.
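A hedged sketch of what that mixing might look like: NUTS on the continuous parameters of an occupancy-style model, with nimble's default discrete samplers left on the latent binary states. The model, data, and settings are invented for illustration; `addHMC()` is used as documented in {nimbleHMC}, but check the current API of your installed version.

```r
# Hypothetical sketch: HMC on continuous parameters, discrete samplers on z[i].
library(nimble)
library(nimbleHMC)

code <- nimbleCode({
  psi ~ dbeta(1, 1)                       # occupancy probability
  p   ~ dbeta(1, 1)                       # detection probability
  for (i in 1:nsite) {
    z[i] ~ dbern(psi)                     # discrete latent state
    y[i] ~ dbin(prob = z[i] * p, size = nvisit)
  }
})

set.seed(1)
z_true <- rbinom(50, 1, 0.6)
y      <- rbinom(50, 5, z_true * 0.4)

model <- nimbleModel(code,
                     constants = list(nsite = 50, nvisit = 5),
                     data = list(y = y),
                     inits = list(psi = 0.5, p = 0.5, z = rep(1, 50)),
                     buildDerivs = TRUE)          # derivatives needed for HMC

conf <- configureMCMC(model, monitors = c("psi", "p"))
conf$removeSamplers(c("psi", "p"))                # drop the default samplers on psi and p
addHMC(conf, target = c("psi", "p"))              # NUTS on the continuous block;
                                                  # the z[i] keep their discrete samplers
mcmc    <- buildMCMC(conf)
cmodel  <- compileNimble(model)
cmcmc   <- compileNimble(mcmc, project = model)
samples <- runMCMC(cmcmc, niter = 4000, nburnin = 2000)
```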
{compareMCMCs} (JOSS) - Compares MCMC Efficiency from ‘nimble’ and/or Other MCMC Engines
Built-in metrics include two methods of estimating effective sample size (ESS), posterior summaries such as mean and common quantiles, efficiency defined as ESS per computation time, rate defined as computation time per ESS, and minimum efficiency per MCMC.
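A sketch of how such a comparison might be run, assuming {compareMCMCs} plus the 'nimble' and 'jags' engines are installed; the toy model and control settings are invented, and defaults may differ across package versions.

```r
# Hypothetical sketch of benchmarking MCMC engines with {compareMCMCs}.
library(nimble)
library(compareMCMCs)

code <- nimbleCode({
  mu ~ dnorm(0, sd = 10)
  sigma ~ dunif(0, 10)
  for (i in 1:N) y[i] ~ dnorm(mu, sd = sigma)
})

res <- compareMCMCs(
  modelInfo = list(code = code,
                   constants = list(N = 50),
                   data = list(y = rnorm(50, 1, 2)),
                   inits = list(mu = 0, sigma = 1)),
  MCMCs = c("nimble", "jags"),                    # "jags" assumes rjags/JAGS installed
  MCMCcontrol = list(niter = 10000, burnin = 1000)
)

combineMetrics(res)                               # ESS, efficiency (ESS/time), etc., per MCMC
make_MCMC_comparison_pages(res, modelName = "normal_example")   # html comparison report
```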
Amortized Bayesian Inference uses deep neural networks to learn a direct mapping from observables, \(y\), to the corresponding posterior, \(p(\theta | y)\).
i.e. it approximates posterior distributions for faster parameter estimation.
It is popular for simulation-based inference (SBI) but is expanding beyond SBI.
Training Stage: neural networks learn to distill information from the probabilistic model based on simulated examples of parameters and observations, \((\theta, y) \sim p(\theta)\, p(y | \theta)\)
Inference Stage: the trained networks approximate the posterior distribution for an unseen data set, \(y_\text{obs}\), in near-instant time, without repeating the training stage
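To make the two stages concrete, here is a deliberately stylized sketch in plain R that replaces the deep neural networks with simple regressions on simulated \((\theta, y)\) pairs, for a conjugate normal model where the exact posterior is known. It illustrates the amortize-then-reuse idea only, not any particular SBI library; all names are illustrative.

```r
# Stylized amortized inference for theta ~ N(0, 1), y_i | theta ~ N(theta, 1).
set.seed(1)
n_sim <- 5000; n_obs <- 20

## Training stage: simulate (theta, y) ~ p(theta) p(y | theta), reduce y to a summary
theta <- rnorm(n_sim)
ybar  <- sapply(theta, function(t) mean(rnorm(n_obs, t, 1)))

## Learn q(theta | y) = N(m(ybar), s(ybar)^2) from the simulated pairs
fit_mean <- lm(theta ~ ybar)                            # m(): approximates E[theta | y]
fit_var  <- lm(I((theta - fitted(fit_mean))^2) ~ ybar)  # s()^2: approximates Var[theta | y]

## Inference stage: near-instant posterior approximation for new data y_obs
y_obs <- rnorm(n_obs, 0.7, 1)
newd  <- data.frame(ybar = mean(y_obs))
m_hat <- as.numeric(predict(fit_mean, newd))
s_hat <- sqrt(max(as.numeric(predict(fit_var, newd)), 1e-8))

## Compare with the exact posterior N(n * ybar / (n + 1), 1 / (n + 1))
rbind(amortized = c(mean = m_hat, sd = s_hat),
      exact     = c(mean = n_obs * mean(y_obs) / (n_obs + 1),
                    sd   = sqrt(1 / (n_obs + 1))))
```

In real amortized workflows the regressions are replaced by neural density estimators trained once on many simulations, so new data sets can be analyzed without re-running MCMC or re-training.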