Statistical

Misc

  • Packages
  • For intermittent data, see Logistics >> Demand Planning >> Intermittent Demand
  • Let the context of the decision making process determine the units of the forecast
    • i.e. Don’t forecast on an hourly scale just because you can.
  • What can be forecast depends on the predictability of the event:
    1. How well we understand the factors that contribute to it;
      • We have a good idea of the contributing factors: electricity demand is driven largely by temperatures, with smaller effects for calendar variation such as holidays, and economic conditions.
    2. How much data is available;
      • There are usually several years of data on electricity demand available, and many decades of data on weather conditions.
    3. How similar the future is to the past;
      • For short-term forecasting (up to a few weeks), it is safe to assume that demand behaviour will be similar to what has been seen in the past.
    4. Whether the forecasts can affect the thing we are trying to forecast.
      • For most residential users, the price of electricity is not dependent on demand, and so the demand forecasts have little or no effect on consumer behaviour.
  • Starting a project
    • Understand the DGP (data-generating process) through EDA and talking to domain experts
      • How are sales generated? (e.g. online, brick and mortar,…)
    • What is the client currently using to forecast?
      • Model that you need to beat
      • Where does it fail?
        • Biased? underfitting or overfitting somewhere
        • Missing seasonality?
    • What is the loss function?
      • Carrying this many items in inventory results in this cost
      • If we’re out of stock and lose this many sales, how much does this cost
    • What does the client really want?
      • How is success measured
  • {fable} models produce different results when there are NAs in the time series
    • In the rolling CFR project, Steinmetz’s manually calculated rolling 7-day means and his lagged variables had NAs; models fit on data with and without the NAs produced different scores
  • It is helpful to keep track of and understand what our forecast bias has historically been, even when we are fortunate enough to have a history of bias in both directions.
  • Forecasting shocks is difficult for an algorithm
    • Also see
    • It can be better to smooth out expected shocks (e.g. Christmas) in the training data and then add an adjustment to the predictions for the dates of the shocks.
    • The smoothed out data will help the algorithm produce more accurate predictions for days when there isn’t an expected shock.
    • Examples of shocks that should be manually adjusted in the training data rather than smoothed out by an algorithm
      • One-Time spikes due to abnormal weather conditions
      • One-Off promotions
      • A sustained marketing campaign that is indistinguishable from organic growth.
  • Intermittent (or sporadic) time series (lots of zeros).
    • {thief} has the latest methods while {tsintermittent} has older methods
  • Benchmark models
    • Naive
    • 28-day moving average (i.e. 4 week MA)
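    • A minimal sketch of these two benchmarks (assuming y is a daily numeric vector and h is the horizon; forecast::naive is used for the naive benchmark):

      library(forecast)
      h <- 14
      # Naive: carry the last observed value forward
      naive_fc <- naive(y, h = h)$mean
      # 4-week (28-day) moving average: mean of the last 28 observations
      ma28_fc <- rep(mean(tail(y, 28)), h)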

Terms

  • Weak stationarity (commonly referred to as just stationarity; aka covariance stationarity) - Implies that the mean and the variance of the time series are finite and do not change with time, and that the autocovariance depends only on the lag.
  • Cointegration - \(x_t\) and \(y_t\) are cointegrated if \(x_t\) and \(y_t\) are \(I(1)\) series and there exists a \(\beta\) such that \(z_t = x_t - \beta y_t\) is an \(I(0)\) series (i.e. stationary).
    • Important for understanding stochastic or deterministic trends.
    • The differences in the means of the set of cointegrated series remain constant over time, without offering an indication of directionality
    • Might have low correlation, and highly correlated series might not be cointegrated at all.
    • Can use an Error Correction Model (ECM) with differenced data plus an error correction term (the residuals from an OLS regression of one series on the other)
  • Stochastic - No value of a variable is known with certainty. Some values may be more likely than others (probabilistic). Variable gets mapped onto a distribution.

Preprocessing

  • Filling in gaps
    • Bi-directional forecast method from AutoML for time series: advanced approaches with FEDOT framework
      • Steps
        1. Smooth series prior to the gap
          • They used a “Gaussian filter w/sigma = 2” (i.e. smoothing with a Gaussian-weighted moving average whose kernel has standard deviation 2)
        2. Create lagged features of the smoothed series
        3. Forecast using ridge regression where h = length of gap
        4. Repeat in the opposite direction using the series after the gap
        5. Use the average of the forecasts to fill the gap in the series.
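    • A rough sketch of the bi-directional idea (assumptions: y is a numeric vector with NAs marking a single interior gap; auto.arima forecasts stand in for the paper’s smoothing + ridge-on-lags model):

      library(forecast)
      gap <- which(is.na(y))
      h <- length(gap)
      before <- y[seq_len(min(gap) - 1)]
      after <- y[seq(max(gap) + 1, length(y))]
      # Forecast forward into the gap from the pre-gap series
      fwd <- forecast(auto.arima(before), h = h)$mean
      # Forecast "backwards" into the gap by reversing the post-gap series
      bwd <- rev(forecast(auto.arima(rev(after)), h = h)$mean)
      # Average the two forecasts to fill the gap
      y[gap] <- (as.numeric(fwd) + as.numeric(bwd)) / 2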
  • Log before differencing (SO post)
  • Detrend or Difference
    • (The goal is to get a stationary series, so if one doesn’t work try the other.)
    • Differencing (for unit root processes)(stochastic trend)
      • if the process requires differencing to be made stationary, then it is called difference stationary and possesses one or more unit roots.
        • Sometimes see charts of roots and a unit circle. I read this in an article about VAR models “process is stationary if all the roots \(z_1, \ldots , z_n\) of the determinant \(\det(\psi(z))\), or \(\det(I − Bz) = 0\), lie outside of the unit circle.”
      • One advantage of differencing over detrending to remove trend is that no parameters are estimated in the differencing operation.
      • One disadvantage, however, is that differencing does not yield an estimate of the stationary process
      • If the goal is to coerce the data to stationarity, then differencing may be more appropriate.
      • Differencing is also a viable tool if the trend is fixed
        • Random-walk-looking series should be differenced and not detrended.
      • Backshift operator notation:
        • In general: \(\nabla^d = (1 − B)^d\)
          • Where \(d\) is the order of differencing
          • Fractional differencing is when \(0 \lt d \lt 1\)
            • When \(0 \lt d \lt 0.5\), the series is classified as a long term memory series (often used for environmental time series arising in hydrology)
          • If \(d\) is negative, then it’s called forward-shift differencing
        • Examples:
          • Identities:

            \[ B y_t = y_{t-1} \\ B^2 y_t = y_{t-2} \]

          • Seasonal Difference:

            \[ (1 - B)(1 - B^m) y_t = (1 - B - B^m + B^{m + 1})y_t = y_t - y_{t-1} - y_{t-m} + y_{t-m-1} \]

          • ARIMA(p, d, q), i.e. AR(p) + I(d) + MA(q):

            \[ (1-\phi_1 B - \cdots - \phi_p B^p)(1-B)^d y_t = c+(1+\theta_1 B + \cdots + \theta_q B^q)\epsilon_t \]

          • ARIMA(1,1,1)(1,1,1)\(_4\) for quarterly data (\(m = 4\))

            \[ (1-\phi_1 B)(1-\Phi B^4)(1-B)(1-B^4)y_t = (1+\theta_1 B)(1+\Theta B^4)\epsilon_t \]

    • Detrending (for trend-stationary processes)(deterministic trend)
      • It is possible for a time series to be non-stationary, yet have no unit root and be trend-stationary
        • a trend-stationary process is a stochastic process from which an underlying trend (function solely of time) can be removed (detrended), leaving a stationary process.
      • If an estimate of the stationary process is essential, then detrending may be more appropriate.
      • How is this back-transformed after forecasting?
        • maybe look at “forecasting with STL” section in fpp2
    • In both unit root and trend-stationary processes, the mean can be growing or decreasing over time; however, in the presence of a shock, trend-stationary processes are mean-reverting (i.e. transitory, the time series will converge again towards the growing mean, which was not affected by the shock) while unit-root processes have a permanent impact on the mean (i.e. no convergence over time).
    • Testing
      • KPSS test: H0 = Trend-Stationary, Ha = Unit Root.
        • urca::ur.kpss, where the H0 is stationarity
        • tseries::kpss.test(res, null = "Trend") where H0 is “trend-stationarity”
      • Dickey-Fuller tests: H0 = Unit Root, Ha = Stationary or Trend-Stationary depending on version
      • KPSS-type tests are intended to complement unit root tests, such as the Dickey–Fuller tests. By testing both the unit root hypothesis and the stationarity hypothesis, one can distinguish series that appear to be stationary, series that appear to have a unit root, and series for which the data (or the tests) are not sufficiently informative to be sure whether they are stationary.
    • Steps:
      1. ADF:
        1. If H0 rejected. The trend (if any) can be represented by a deterministic linear trend.
        2. If H0 is not rejected then we apply the KPSS test.
      2. KPSS :
        1. If H0 rejected then we conclude that there is a unit root and work with the first differences of the data.
          • Upon the first differences of the series we can test the significance of other regressors or choose an ARMA model.
        2. If H0 is not rejected, then the data doesn’t contain enough information. In this case it may be safer to work with the first differences of the series.
    • Steps when using an ARIMA:
      1. Suppose the series is not trending
        1. If the ADF test (without trend) rejects, then apply model directly
        2. If the ADF test (without trend) does not reject, then model after taking difference (maybe several times)
      2. Suppose the series is trending
        1. If the ADF test (with trend) rejects, then apply model after detrending the series
        2. If the ADF test (with trend) does not reject, then apply model after taking difference (maybe several times)
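    • A minimal sketch of the ADF/KPSS sequence above (assuming y is a numeric vector or ts; {tseries} provides both tests):

      library(tseries)
      adf.test(y)                  # H0 = unit root
      kpss.test(y, null = "Trend") # H0 = trend-stationarity
      # If both point towards a unit root (or are inconclusive), difference and re-test
      adf.test(diff(y))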

Diagnostics

  • Testing for significant difference between model forecasts
    • Nemenyi test
      • tsutils::nemenyi
    • MCB
      • greybox::rmcb

Algorithms

Misc

  • Auto Arima, ETS and Theta are general-purpose methods particularly well-suited for monthly, quarterly and annual data
  • TBATS and STL will also handle multiple seasonalities such as arise in daily and weekly data.
  • When to try nonlinear models (see Forecasting, Nonlinear)
    • Linear prediction methods (e.g. ARIMA) don’t produce adequate predictions
    • The chaotic nature of the time series is obvious (e.g. frequent shocks that can’t be explained by noise)

Regression (including ARIMA)

  • Misc

    • Double-check auto_arima; for the parameters (p, d, q), one should pick q to be at least p (link)
    • Sometimes the error terms are called random shocks.
    • If using lm and there are NAs, make sure to use na.action = NULL else they get removed and therefore dates between variables won’t match-up. See lm doc for further details on best practices.
    • ARIMA models make h-step-ahead predictions by iterating 1-step-ahead predictions and feeding the intermediate predictions back in as if they were actual observations (zeros are used for the errors)
    • Polynomial Autoregression (AR) models exponentiate the lags. So the design matrix includes the lags and the exponentiated series.
      • If the polynomial is order 3, then order 2 is also included. So, now, the design matrix would be the lags, the square of each lag, and the cube of each lag

      • Example:

        library(dplyr); library(timetk)
        # tbl w/polynomial design matrix of order 3
        # value is the ts values
        poly_tbl <- group_tbl %>%
          tk_augment_lags(.value = value, .lags = 1:4) %>%
          mutate(across(contains("lag"), 
                 .fns = list(~.x^2, ~.x^3), 
                 .names = "{.col}_{ifelse(.fn==1, 'quad','cube')}"))
        • .fn is the item number in the .fns list.
        • Squared lag 2 will have the name “value_lag2_quad”
  • Types

    • AR: single variable with autoregressive dependent variable terms
    • ARMA: same as AR but with the errors modeled as a moving average
    • ARIMA: same as ARMA but with differencing of the time series
    • SARIMA: same as ARIMA but also with seasonal P, D, Q terms
    • ARMAX: same as ARMA but with additional exogenous predictors
    • Dynamic Regression: OLS regression with modeled (usually arima) errors
  • OLS vs ARIMA

    • John Mount
      • The fear in using standard regression for time series problems is that the error terms are likely correlated.
        • So one can no longer appeal to the Gauss Markov Theorem (i.e. OLS is BLUE) to be assured of good out of sample performance (link)
    • Cryer and Chan regarding Dynamic Regression vs OLS
      • “Regression (with arima errors) coefficient estimate on Price is similar to that from the OLS regression fit earlier, but the standard error of the estimate is about 10% lower than that from the simple OLS regression. This illustrates the general result that the simple OLS estimator is consistent but the associated standard error is generally not trustworthy”
    • Hyndman
      • “The forecasts from a regression model with autocorrelated errors are still unbiased, and so are not “wrong,” but they will usually have larger prediction intervals than they need to. Therefore we should always look at an ACF plot of the residuals.”
      • The estimated coefficients are no longer the best estimates, as some information has been ignored in the calculation;
        • Meaning modeling the errors to take into account the autocorrelation
      • Any statistical tests associated with the model (e.g., t-tests on the coefficients) will be incorrect.
        • Affected by the bloated std errors
      • The AICc values of the fitted models are no longer a good guide as to which is the best model for forecasting.
      • In most cases, the p-values associated with the coefficients will be too small, and so some predictor variables will appear to be important when they are not. This is known as “spurious regression.”
    • Removing measurements to mitigate autocorrelation
      • From a paper where precipitation amounts are modeled, “we only keep one [of] every three hourly records of precipitation so as to ensure independency between samples.”
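    • A minimal dynamic-regression sketch (assuming y is the response and X a numeric matrix of regressors; forecast::auto.arima models the regression errors as ARIMA):

      library(forecast)
      fit_ols <- lm(y ~ X)                # plain OLS for comparison
      fit_dyn <- auto.arima(y, xreg = X)  # regression with ARIMA errors
      checkresiduals(fit_dyn)             # residual ACF plot + autocorrelation test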
  • Trend (\(\beta\) \(t\)) is modeled by setting the variable \(t\) to just an index variable (i.e. \(t = 1, \ldots, T\)). Modeling quadratic trend would be adding in \(t^2\) to the model formula.

    • Hyndman suggests that using splines is a better approach than using \(t^2\)
    • From Steinmetz’s CFR article
      • Instead of trend (like in {forecast}), he’s using poly(date, 2) to include a quadratic trend
  • R2 and Adjusted-R2

    • Appropriate for time series (i.e. estimate of the population R2), as long as the data are stationary and weakly dependent
      • i.e. The variances of both the errors and the dependent variable do not change over time.
      • Not appropriate if \(y_t\) has a unit root (integrated of order 1, I(1)); the series needs to be differenced first
  • Interpretation of coefficients

    \[ y_t = \alpha + \beta_0 x_t + \beta_1 x_{t-1} + \cdots + \beta_s x_{t-s} + \cdots + \beta_q x_{t-q} \]

    • If \(x\) increases by one unit today, the change in \(y\) will be \(\beta_0+\beta_1+...+\beta_s\) after \(s\) periods; This quantity is called the \(s\)-period interim multiplier. The total multiplier is equal to the sum of all \(\beta\) s in the model.
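    • For example (illustrative numbers), with \(\beta_0 = 2\) and \(\beta_1 = 1\), a one-unit increase in \(x\) today raises \(y\) by 2 this period and by \(2 + 1 = 3\) cumulatively after one period (the 1-period interim multiplier); summing every \(\beta\) gives the total multiplier.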
  • Residuals

    • Types
      • “Regression” is for the main model
        • Original data minus the effect of the regression variables
      • “Innovation” is for the error model
        • Default arg
        • Hyndman uses these for dynamic regression residual tests
    • Autocorrelation tests
      • Failing the test does not necessarily mean that (a) the model produces poor forecasts; or (b) that the prediction intervals are inaccurate. It suggests that there is a little more information in the data than is captured in the model. But it might not matter much.
      • Breusch-Godfrey test designed for pure regression or straight AR model
        • Does handle models with lagged dependent vars as predictors
        • LM (lagrange multiplier) test
        • forecast::checkresiduals can calculate it and display it, but you don’t have access to the values programmatically
          • Defaults for lag is \(\min(10,n/5)\) for nonseasonal and \(\min(2m, n/5)\) for seasonal where the frequency is seasonality, m

            lag <- ifelse(freq > 1, 2 * freq, 10)        # freq = seasonal period
            lag <- min(lag, round(length(residuals)/5))
            lag <- max(df + 3, lag)                      # df = model degrees of freedom
        • The {lmtest} and {DescTools} (active) packages have the function that forecast uses, but it only takes lm objects
      • Durbin-Watson designed for pure regression
        • Error term can’t be correlated with predictor to use this test
          • So no lagged dependent variables can be used as predictors
          • There is a Durbin alternative test mentioned in the Stata literature that can handle lagged variables, but I haven’t seen an R version that specifies that’s the version it is.
        • {lmtest} and {DescTools} takes a lm object and has a small sample size correction available
        • {car::durbinWatsonTest} takes a lm object or residual vector.
          • Only lm returns p-value. Residual vector returns DW statistic
        • p-values \(\lt 0.05\) \(\rightarrow\) Autocorrelation present
        • DW statistic guide (\(0 \lt \text{DW} \lt 4\))
          • Around 2 \(\rightarrow\) No Autocorrelation
          • Signifcantly \(\lt 2\) \(\rightarrow\) Positive Correlation
            • Saw values \(\lt 1\) have p-values = 0
          • Significantly \(\gt 2\) \(\rightarrow\) Negative Correlation
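      • A minimal sketch of the two regression-oriented tests above (assuming fit_lm is an lm object; bgtest and dwtest are from {lmtest}):

        library(lmtest)
        bgtest(fit_lm, order = 10)  # Breusch-Godfrey LM test up to lag 10
        dwtest(fit_lm)              # Durbin-Watson (lag-1 autocorrelation)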
      • Ljung-Box
        • For dynamic regression, arima, ets, etc.
          • There’s a SO post that shows this shouldn’t be used for straight regression
            • For straight AR models, the comments show it should be fine as long as lags \(\gt\) model df (see below)
        • Test is whether a group of lagged residuals has significant autocorrelation, so an acf of the residuals might show individual spikes but the group as a whole may not have significant autocorrelation
          • If you see a spike in the residuals, may be interesting to include that lag number in the group of lags and see if significance of the group changes
        • {feasts::ljung_box}
          • Requires numeric residuals vector, model degrees of freedom, number of lags to check
            • The model df is number of variables used in the regression + intercept + p + q (of ARIMA error model)
              • e.g. Model with predictors: trend + cases and an error model: arima (2,1,1) had df = 2 (predictors: trend, cases) + 1 (intercept) + 2 (p) + 1 (q) = 6 d.f.
              • dof <- length(fit$coef)
            • See Breusch-Godfrey section for number of lags to use
        • p-values \(\lt 0.05\) \(\rightarrow\) autocorrelation present
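        • A minimal sketch (assuming fit is a forecast::Arima dynamic-regression fit; stats::Box.test is used as the base-R equivalent of feasts::ljung_box):

          res <- residuals(fit)      # innovation residuals
          dof <- length(fit$coef)    # regression terms + intercept + p + q
          lag <- ifelse(frequency(res) > 1, 2 * frequency(res), 10)
          lag <- max(dof + 3, min(lag, round(length(res) / 5)))
          Box.test(res, lag = lag, type = "Ljung-Box", fitdf = dof)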
  • Spectral analysis takes the approach of specifying a time series as a function of trigonometric components (i.e. Regression with fourier terms)

    • A smoothed version of the periodogram, called a spectral density, can also be constructed and is generally preferred to the periodogram.
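    • A minimal sketch using base R (assuming y is a ts object; spans controls how much smoothing is applied to the periodogram):

      spectrum(y)                   # raw periodogram
      spectrum(y, spans = c(5, 5))  # smoothed periodogram (spectral density estimate)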

Random Walk

  • A process integrated to order 1, (an I(1) process) is one where its rate of change is stationary. Brownian motion is a canonical I(1) process because its rate of change is Gaussian white noise, which is stationary. But the random walk itself is not stationary. So the \(t+1\) value of a random walk is just the value at \(t\) plus a number sampled from some bell curve.

  • Characteristics

    • Long periods of apparent trends up or down
    • Sudden and unpredictable changes in direction
  • Also see

  • A special case of an autoregressive model

    \[ y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \epsilon_t \]

    • Where \(c=0\), \(p=1\), \(\phi = 1\), and \(\epsilon \sim \mathcal {N}(0, s)\)
  • Drift

    | Feature | Random Walk without Drift | Random Walk with Drift |
    |---|---|---|
    | Steps | Purely random, equal probability left/right | Biased, one direction slightly more likely |
    | Change in value | Average change is zero | Average change includes a constant drift |
    | Path | Zig-zag around starting point | Zig-zag with upward/downward trend |
    | Mean | Stays roughly the same | Increases/decreases over time depending on drift |
    | Variance | Increases with time | Increases with time |
    | Stationarity | Non-stationary | Non-stationary |
  • Examples with and without drift
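    • A quick simulation sketch (the drift value 0.2 and the series length are arbitrary choices):

      set.seed(42)
      n <- 300
      eps <- rnorm(n)
      rw <- cumsum(eps)              # random walk: y_t = y_{t-1} + eps_t
      rw_drift <- cumsum(0.2 + eps)  # random walk with drift: y_t = 0.2 + y_{t-1} + eps_t
      plot.ts(cbind(rw, rw_drift))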



Prophet

  • The basic methodology is an iterative curve-fitting routine: Prophet trains on your data, then trains on a bigger period and predicts again, and this repeats until the end point is reached.
  • The development team of Prophet claim that its strengths are:
    • Working with high-frequency data (hourly, daily, or weekly) with multi-seasonality, such as hour of day, day of week and time of year;
    • Special events and bank holidays that are not fixed in the year;
    • Allowing for the presence of a reasonable number of missing values or large outliers;
    • Accounting for changes in the historical trends and non-linear growth curves in a dataset.
  • Further advantages include the ability to train from a moderate sized dataset, without the need for specialist commercial software, and fast start up times for development.
  • Disadvantages
    • No autoregressive (i.e. lags of target series) features since it’s a curve-fitting algorithm
  • Time series decomposition by prophet:
    • \(g(t)\): Logistic or linear growth trend with optional linear splines (linear in the exponent for the logistic growth). The library calls the knots “change points.”
    • \(s(t)\): Sine and cosine (i.e. Fourier series) for seasonal terms.
    • \(h(t)\): Gaussian functions (bell curves) for holiday effects (instead of dummies, to make the effect smoother).
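  • A minimal {prophet} sketch (assuming df is a data frame with a ds date column and a y value column; the 30-day horizon is arbitrary):

    library(prophet)
    m <- prophet(df)                                  # fits trend + seasonality + holiday components
    future <- make_future_dataframe(m, periods = 30)  # extend ds 30 days past the data
    fcst <- predict(m, future)                        # yhat, yhat_lower, yhat_upper, components
    plot(m, fcst)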

Kalman Filter

Misc

  • Notes from How a Kalman filter works, in pictures
  • If a dynamic system is linear and with Gaussian noise (inaccurate measurements, etc.), the optimal estimator of the hidden states is the Kalman Filter
    • For nonlinear systems, we use the extended Kalman filter, which works by simply linearizing the predictions and measurements about their mean. (I may do a second write-up on the EKF in the future)
    • Good for predictions where the measurements of the outcome variable over time can be noisy
  • Assumptions
    • Gaussian noise
    • Markov property
      • If you know \(x_{t−1}\), then knowledge of \(x_{t−2},\ldots , x_0\) doesn’t give any more information about \(x_t\) (i.e. not much autocorrelation, if at all)
  • tl;dr
    • A predicted value from a physically-determined autoregression-type equation with 1 lag that gets adjusted for measurement error
  • Advantages
    • Light on memory (they don’t need to keep any history other than the previous state)
    • Very fast, making them well suited for real time problems and embedded systems
  • Use cases
    • Engineering: common for reducing noise from sensor signals (i.e. smoothing out measurements)
    • Detection-based object tracking (computer vision)

First set of equations

  • Notes
    • This set of equations deals with the physical part of the system. It’s kinda how we typically forecast.
      • The \(\hat x_k\) equation is pretty much like a typical auto-regression plus explanatory variables except for the F matrix which may require knowledge of system dynamics
    • Wiki shows a term, \(w_k\), added to the end of the \(\hat x_k\) equation. \(w_k\) is the process noise and is assumed to be drawn from a zero-mean multivariate normal distribution.
  • The new best estimate is a prediction made from previous best estimate, plus a correction for known external influences.
    • \(\hat x_k\): The step-ahead predicted “state”; \(\hat x_{k-1}\) is the current “state”
    • \(u_k\) (“control” vector): An explanatory variable(s)
    • \(F_k\) (“prediction” matrix) and \(B_k\) (“control” matrix) are transformation matrices
      • \(F_k\) was based on one of Galileo’s equations of motion in the example so this might be very context specific
      • Might need substantial knowledge of the system to create the system of linear equations (i.e. the \(F_k\) matrix) that can be used to model it.
  • And the new uncertainty is predicted from the old uncertainty, with some additional uncertainty from the environment.
    • \(P_k\) and \(P_{k-1}\) are variance/covariance matrices for the step-ahead predicted state and current state respectively
    • \(Q_k\) is the uncertainty term for the variance/covariance matrix of the predicted state distribution
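  • Written out, the standard prediction-step equations these notes describe are:

    \[ \hat x_k = F_k \hat x_{k-1} + B_k u_k \\ P_k = F_k P_{k-1} F_k^T + Q_k \]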

Second Set of Equations

 

  • Notes
    • These equations refine the prediction of the first set of equations by taking into account various sources of measurement error in the observed outcome variable
    • The equations do this by finding the intersection, which is itself a distribution, of the transformed prediction distribution (mean \(\mu_0\)) and the measurement distribution (mean \(\mu_1\)).
      • The mean, \(\mu'\), of this intersection distribution is the predicted value that is most likely to be the true value
  • \(H_k\) is a transformation matrix that maps the predicted state (result of the first set of equations), \(\hat x_k\) , to the measurement space
    • Where \(H_k \cdot \hat x_k\) is the expected measurement (pink area) (i.e. Mean of the distribution of transformed prediction)
  • \(\vec z_k\) is the mean of the measurement distribution (green area)
  • \(\hat x_k'\) is the mean of the intersection of the transformed prediction distribution and the measurement distribution (i.e. the predicted state that’s most likely to be true)
  • \(R_k\) is the uncertainty term for variance/covariance matrix for the measurement distribution
  • \(K'\) is called the Kalman Gain
    • Didn’t read anything interpretative about the value. It just seems to be a mathematical construct that’s part of the derivation.
    • In the derivation, it starts out as the ratio of the transformed prediction variance/covariance matrix to the sum of the transformed prediction and measurement variance/covariance matrices
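  • Written out, the standard update-step equations these notes describe are:

    \[ K' = P_k H_k^T (H_k P_k H_k^T + R_k)^{-1} \\ \hat x_k' = \hat x_k + K' (\vec z_k - H_k \hat x_k) \\ P_k' = P_k - K' H_k P_k \]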

Process

Hyperparameters

  • \(Q\) is the process noise covariance
    • Controls how sensitive the model will be to process noise.
  • \(R\) is the measurement noise variance
    • Controls how quickly the model adapts to changes in the hidden state.
      • Guessing “std” is the default value?
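  • A minimal 1-D sketch (all values are illustrative; F = H = 1, i.e. a random-walk state observed with noise):

    kalman_1d <- function(z, Q = 0.01, R = 1) {
      n <- length(z)
      xhat <- numeric(n)  # state estimates
      P <- numeric(n)     # state variances
      xhat_prev <- z[1]; P_prev <- 1
      for (k in seq_len(n)) {
        # Predict step: F = 1, no control input
        x_pred <- xhat_prev
        P_pred <- P_prev + Q
        # Update step: H = 1
        K <- P_pred / (P_pred + R)               # Kalman gain
        xhat[k] <- x_pred + K * (z[k] - x_pred)  # blend prediction and measurement
        P[k] <- (1 - K) * P_pred
        xhat_prev <- xhat[k]; P_prev <- P[k]
      }
      xhat
    }

    # Example: smooth a noisy measurement series
    smoothed <- kalman_1d(rnorm(100, mean = 10, sd = 1))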

Exponential Smoothing

  • The general idea is that future values are a weighted average of past values, with the weights decaying exponentially as we go back in time
  • Methods
    • Simple Exponential Smoothing
    • Double Exponential Smoothing or Holt’s Method (for time series with a trend)
    • Triple Exponential Smoothing or Holt-Winters’ method (for time series with a trend and seasonality)
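  • A minimal sketch of the three methods using {forecast} (assuming y is a ts object with a seasonal period set for hw()):

    library(forecast)
    fc_ses  <- ses(y, h = 12)   # simple exponential smoothing (level only)
    fc_holt <- holt(y, h = 12)  # Holt's method (level + trend)
    fc_hw   <- hw(y, h = 12)    # Holt-Winters (level + trend + seasonality)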

TBATS

  • Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend, and Seasonal components
  • Can handle non-linear data, addresses residual autocorrelation since it uses ARMA errors, and can take into account multiple seasonal periods
  • Represents each seasonal period with a trigonometric (Fourier series) representation, which allows the model to fit large and non-integer seasonal periods
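  • A minimal sketch (assuming y is a daily numeric vector; the weekly and yearly periods passed to msts() are illustrative):

    library(forecast)
    y_msts <- msts(y, seasonal.periods = c(7, 365.25))  # multiple (non-integer) seasonalities
    fit <- tbats(y_msts)
    fc <- forecast(fit, h = 30)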

Interval Forecasting

  • Notes from Video: ISF 2021 Keynote
  • Interval data is commonly analyzed by modeling the range (difference between interval points)
    • Range data doesn’t provide information about the variation of the mean (aka level) over time.
    • Range only provides information about the boundaries, whereas interval analysis provides information about the boundary and the interior of the interval.
  • Provides more information than point forecasts.
  • Data examples:
    • Daily Temperature, Stock Prices: Each day a high and low values are recorded
    • Stock Price Volatility, Bid-Ask spread use hi-lo value differences
    • Intra-House Inequality: difference between wife and husband earnings
    • Urban-Rural income gap
    • Interval-Valued Output Growth Rate: China now reports its targeted growth rate as a range.
    • Diastolic and Systolic blood pressure
      • Others: Blood Lipid, White Blood Cell Count, Hemoglobin
  • Examples where (generalized) intervals can be modeled instead of differences:
    • Stock Volatility
      • GARCH models are often used to model volatility, but the Conditional Autoregressive Range (CARR) model gives better forecasts
        • Because the GARCH model is only based on the closing price, while the CARR model uses the range (difference).
      • Dynamic Interval Modeling
        • Use Autoregressive Interval (ARI) model to estimate the parameters using an interval time series (not the range)
        • Then take the forecasted left and right values of the interval to forecast the volatility range in a CARR model
        • The extra information of the interval data over time (instead of a daily range) yields a more efficient estimation of the parameters
    • Capital Asset Pricing Model (CAPM)
      • Also see Finance, Valuation >> Cost of Capital >> WACC >> Cost of Equity

      • Standard Equation

        \[ R_t - R_{ft} = \alpha + β(R_{mt} - R_{ft}) + \epsilon_t \]

        • \(R_t\): Return of Certain Portfolio
        • \(R_{ft}\): Risk-Free Interest Rate
        • \(R_{mt}\): Return of Market Portfolio
        • \(R_t - R_{ft}\): Asset Risk Premium
      • Interval-based version

        \[ Y_t = (\alpha_0 + \beta_0I_0) + \beta X_t + u_t \]

        • \(I_0 = [-0.5, 0.5]\)
        • \(Y_t = [R_{ft}, R_t]\)
        • \(X_t = [R_{ft}, R_{mt}]\)
        • The \(R_t - R_{ft}\) can then be calculated by taking the difference of the interval bounds of the interval-based predictions
  • Model the center of the interval and the range in a bi-variate VAR model (doesn’t use all points in the interval data)
    • Bi-variate Nonlinear Autoregressive Model for center and range
      • Has an indicator variable that captures nonlinearity of interval data
    • Space-time autoregressive model
  • Autoregressive Conditional Interval model
    • The interval version of an ARMA model

      • depends on lags and lagged residuals
    • ACI(p,q):

      \[ Y_t = (\alpha_0 + \beta_0 I_0) + \sum_{j=1}^p \beta_j Y_{t-j} + \sum_{j=1}^q \gamma_j u_{t-j} + u_t \]

      • \(\alpha_0\), \(\beta_0\), \(\beta_j\), \(\gamma_j\) are unknown scalar parameters
      • \(I_0 = [-\frac{1}{2},\; \frac{1}{2}]\) is a unit interval
      • \(\alpha_0 + \beta_0 I_0 = [\alpha_0 - \frac{\beta_0}{2},\; \alpha_0 + \frac{\beta_0}{2}]\) is a constant interval intercept
      • \(u_t\) is the interval residuals that satisfies \(\mathbb{E}(u_t\;|\;I_{t-1}) = [0,0]\)
      • \(Y_t\) is a random interval variable
    • Objective function that gets minimized is called \(D_k\) distance

      • \(D^2_K [u_t(\theta),0]\)
        • \(u_t(\theta)\) is the interval residuals
        • \(K\) refers to some kind of kernel function
          • It’s a wacky quadratic with constants a,b,c
          • Measures the distance between all pairs of points
      • The minimization is a two-stage process
        • Finds the optimal kernel, \(K\), then uses it to minimize the residuals to estimate the parameters
  • Threshold Autoregressive Interval (TARI)
    • Nonlinear ACI model and interval version of TAR(p) model (¯\_(ツ)_/¯)
    • 2-Procedure Model
      • Basically 2 autoregressive equations with an \(i_1u_t\) or \(i_2u_t\) added on to the end.
      • The interval series, \(Y_t\), follows one equation or the other depending on whether the threshold variable \(q_t\) is less than or equal to a threshold parameter, \(\gamma\), or greater than it.
    • Estimation is similar to ACI model
    • For more details, need to research what a TAR model (Terasvirta, Tjostheim, and Granger 2010) is

Count Forecasting

  • Misc
    • Packages
      • {tscount} - Poisson and Negative Binomial likelihood-based methods for model fitting and assessment, prediction, and intervention analysis of count time series following generalized linear models.
      • {ZINARp} - Simulation, exploratory data analysis and Bayesian analysis of the parametric, p-order Integer-valued Autoregressive (INAR(p)) and Zero-inflated p-order Integer-valued Autoregressive (ZINAR(p)) processes with Poisson or zero-inflated Poisson likelihoods.
      • {spINAR} (JOSS) - Procedures for simulation, estimation, and bootstrapping INAR models within parametric and semiparametric setups.
  • Integer-Valued Autoregressive Models (INAR)
    • \(\mbox{INAR}(p)\) is an analog to an \(\mbox{AR}(p)\) model
      \[ \begin{aligned} &Y_t = \alpha_1 \circ Y_{t-1} + \alpha_2 \circ Y_{t-2} + \cdots + \alpha_p \circ Y_{t-p} + \epsilon_t \\ &\begin{aligned} \text{with} \; &\epsilon_t \overset{\text{iid}}{\sim} G, \; \text{a distribution on } \mathbb{N}_0 = \{0,1,2,\ldots\}, \\ &\alpha_i \in (0,1) \;\text{and}\; \sum_{i=1}^p \alpha_i < 1 \end{aligned} \end{aligned} \]
    • Binomial Thinning, where \(\circ\) is the binomial thinning operator
      \[ \alpha_i \circ Y_{t-i} = \sum_{j=1}^{Y_{t-i}} Z_j^{(t,i)} \;\text{and}\; Z_j^{(t,i)} \sim \mbox{Binomial}(1, \alpha_i) \]
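  • A minimal count-forecasting sketch with {tscount} (assuming y is an integer-valued ts; the lag order is illustrative):

    library(tscount)
    fit <- tsglm(y, model = list(past_obs = 1:2), distr = "poisson")  # count GLM with two observation lags
    predict(fit, n.ahead = 10)  # point forecasts plus prediction intervals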