R4EPIs - A project to develop standardized data cleaning, analysis and reporting tools to cover common types of outbreaks and population-based surveys that would be conducted in an MSF emergency response setting (automated outbreak templates)
RECON Learn - Free and open training resources to respond to outbreaks, health emergencies and humanitarian crises
Case Line Lists - Data tables in which every line is a different case/patient, and columns record different variables of potential epidemiological interest such as date of events (e.g. onset of symptom, case notification), disease outcome, or patient data (e.g. age, sex, occupation).
Disease Mapping
Goals
Provide accurate estimates of mortality/incidence risks or rates in space and time
Unveil underlying spatial and spatio-temporal patterns
Detect high-risk areas or hotspots
Risk estimation using metrics such as Standardized Mortality Ratio (SMR) when analyzing rare diseases or low-populated areas are highly variable over time, so it’s diffficult to spot patterns and form hypotheses
SMR = Observed number of cases / Expected number of cases
SMR > 1: risk is greater than the whole region under study
Guessing “Expected number of cases” is the average number of cases for the whole study region
Statistical models smooth risks by borrowing information from spatio-temporal neighbors
The smoothed gradient over the entire study region makes it easier to detect patterns and form hypotheses than highly variable, local area metric estimates (e.g. SMR in a low populated county)
Traditional Models
Types
Mixed Poisson with conditional autoregressive (CAR) priors for “space” and random walk priors for “time” that include space ⨯ time interactions (Knorr-Held, 2000, Bayesian modeling of inseperable space-time variation in disease risk)
Reduced rank multidimensional P-splines (Ugarte et al, 2017, One-dimensional, two-dimensional, and three dimensional B-splines to specify space-time interactions in Bayesian disease mapping)
Issues
Estimating the cov-var matrix becomes intractable with big data and many areas since the covariance must be estimated between each pair of areas
CAR models assume the same level of spatial dependence between all areas which isn’t likely.
Scalable non-stationary Bayesian models for high-dim, count data
Dependencies
Uses {future} for distributed computing
Integrated, nested laplace approximation (INLA) method through {R-INLA}
K-order neighborhood model
Breaks up local spatial or spatio-temporal domains so that estimations can distributed and local area dependencies (neighborhoods) can be accounted for.
“Areas” are usually districts, counties, provinces, etc.
Package does provide a method to create a “random” area grid
Might be useful to compare a random grid model with the e.g. county model to see how much county boundaries influence the estimates
Each local area model includes k adjacent areas which creates a partition
The local area estimate is smoothed by taking information from the adjacent areas
Adjacent areas also have estimate posteriors computed
Each area will have multiple posterior estimates from local area models where the area is the local area or where it is the adjacent area
Merge or don’t merge estimate posteriors for each area
Merge: use weights proportional to the conditional predictive ordinates (CPO) ???
Don’t Merge: Use the posterior marginal risk estimates of an area corresponding to the original submodel.
i.e. use the posterior where the area is the “local area” in that local area model and not an adjacent area.
Primary functions
CAR.INLA() fits several spatial CAR models for high dim count data
STCAR.INLA() fits several spatio-temporal CAR models for high dim count data