Quantile
Misc
Used to estimate the conditional quantiles of a target variable
- Example: Assume we have a quantile regression model predicting the demand for apples tomorrow. Our model forecasts the 90th quantile as 100, which means that according to the model, there is a 90% probability that the actual demand will be 100 or lower.
Resources
- Handbook of Quantile Regression - Koenker ({quantreg} book) (see R >> Documents >> Regression)
Packages
- {quantregRanger} - uses {ranger} to fit quantile RFs
- In {tidymodels}, setting the engine argument `quantreg = TRUE` tells ranger that you're estimating quantiles rather than averages; quantile predictions are then obtained with `predict(..., type = 'quantiles')` (see the sketch below)
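  A minimal sketch using {ranger} directly (the `airquality` data and the quantile levels are illustrative):

  ```r
  library(ranger)

  aq <- na.omit(airquality)

  # quantreg = TRUE stores each terminal node's outcome values so that
  # conditional quantiles can be estimated after training
  fit <- ranger(Ozone ~ ., data = aq, quantreg = TRUE, seed = 125)

  # 10th, 50th, and 90th percentile predictions for each row
  preds <- predict(fit, data = aq, type = "quantiles",
                   quantiles = c(0.10, 0.50, 0.90))
  head(preds$predictions)
  ```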
- {grf} - generalized random forest
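  A hedged {grf} sketch; `quantile_forest()` is the package's quantile RF interface (simulated data for illustration):

  ```r
  library(grf)

  # Simulated training data
  n <- 500
  X <- matrix(rnorm(n * 2), n, 2)
  Y <- X[, 1] + rnorm(n)

  qf <- quantile_forest(X, Y, quantiles = c(0.1, 0.5, 0.9))

  # Conditional quantile estimates at new points
  X_new <- matrix(rnorm(10 * 2), 10, 2)
  predict(qf, X_new, quantiles = c(0.1, 0.5, 0.9))
  ```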
- {quantreg} - Estimation and inference methods for models for conditional quantile functions: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response.
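  A minimal {quantreg} example (the `engel` data ships with the package):

  ```r
  library(quantreg)
  data(engel)

  # Fit the 25th, 50th, and 75th conditional quantiles
  fit <- rq(foodexp ~ income, tau = c(0.25, 0.50, 0.75), data = engel)
  summary(fit)
  ```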
- {partykit} - conditional inference trees; model-based recursive partitioning trees
- {bonsai} - Model Wrappers for Tree-Based Models; provides tidymodels engines for {partykit} conditional inference trees and forests (successor to {treesnip}); see the sketch below
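  A hedged sketch of a conditional inference forest through {bonsai}'s "partykit" engine (model settings are illustrative):

  ```r
  library(bonsai)  # loads {parsnip} and registers the partykit engines

  cif_spec <- rand_forest(trees = 500) |>
    set_engine("partykit") |>
    set_mode("regression")

  cif_fit <- fit(cif_spec, Ozone ~ ., data = na.omit(airquality))
  ```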
- {{quantile-forest}} - Zillow's sklearn-compatible quantile forest. Compared to other Python implementations, it is optimized for training and inference speed, scaling to millions of samples with a runtime that is orders of magnitude faster than less-optimized solutions. It also allows specifying prediction quantiles after training, so a trained model can be reused to estimate conditional quantiles as needed.
- Out-of-Bag Scoring: OOB scoring can be used to obtain unbiased estimates of prediction errors and quantile-specific metrics without the need for additional validation datasets.
- Quantile Rank Calculation: Provide a measure of relative standing for each data point in the distribution. Allows you to compare and rank observations based on their position within the quantile distribution, providing valuable insights for various applications, such as risk assessment and anomaly detection.
- Proximity and Similarity Estimation: Quantifies the similarity between pairs of observations based on their paths through the forest. Useful for clustering, anomaly detection, and identifying influential observations.
- {{skgarden}} - Extension for sklearn tree and forest models. Provides online-trained models called Mondrian Forests (paper) and has a quantile random forest flavor.
- {qrnn}: Quantile Regression Neural Network
- Fit quantile regression neural network models with optional left censoring, partial monotonicity constraints, generalized additive model constraints, and the ability to fit multiple non-crossing quantile functions.
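  A minimal {qrnn} sketch (single quantile; the data and network size are illustrative):

  ```r
  library(qrnn)

  aq <- na.omit(airquality)
  x <- as.matrix(aq[, c("Temp", "Wind")])
  y <- as.matrix(aq[, "Ozone"])

  # Fit a 90th-percentile QRNN with 2 hidden nodes
  fit <- qrnn.fit(x = x, y = y, n.hidden = 2, tau = 0.9,
                  iter.max = 500, n.trials = 1)
  pred <- qrnn.predict(x, fit)
  ```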
- {qrcm} - A parsimonious parametric approach that directly models the linear regression coefficients as smooth functions of the quantile level, q, which effectively pools information across quantile levels. It also estimates the coefficients at different quantile levels simultaneously (see the sketch after the notes below).
- Note that Quantile RFs simultaneously estimate the entire conditional distribution
- Benefits of Simultaneous Estimation:
- Computational Efficiency: Reduces overall computation time compared to fitting each quantile separately.
- No Quantile Crossing: Crossing violates the basic principle that higher quantiles should always have higher values than lower quantiles for any given set of predictor variables. This also violates the fundamental properties of cumulative distribution functions, which should be monotonically increasing.
- Improved Stability: The joint estimation can lead to more stable estimates, especially in smaller samples or when dealing with extreme quantiles. In regions where data is sparse, borrowing information across quantiles can lead to more robust estimates.
- Enhanced Inference: Simultaneous estimation allows for easier joint hypothesis testing across multiple quantiles.
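  A hedged {qrcm} sketch, assuming the package's `iqr()` interface with coefficients modeled as shifted Legendre polynomials of the quantile level via `slp()`:

  ```r
  library(qrcm)

  aq <- na.omit(airquality)

  # Each regression coefficient is modeled as a cubic function of p,
  # so all quantile levels are estimated simultaneously
  fit <- iqr(Ozone ~ Temp + Wind, formula.p = ~ slp(p, 3), data = aq)
  summary(fit)
  ```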
- {qrcmNP} - Uses the method in {qrcm} for nonlinear and penalized parametric modeling of quantile regression coefficient functions.
- {fastkqr} (paper) - A Fast Algorithm for Kernel Quantile Regression
- Efficient algorithm to fit and tune kernel quantile regression models based on the majorization-minimization (MM) method.
- Fits multiple quantile curves simultaneously without crossing.
- {rquest} - Functions to conduct hypothesis tests and derive confidence intervals for quantiles, linear combinations of quantiles, ratios of dependent linear combinations and differences and ratios of all of the above for comparisons between independent samples. Additionally, quantile-based measures of inequality are also considered.
- {bayesQR} - Bayesian quantile regression using the Asymmetric Laplace (AL) distribution; both continuous and binary dependent variables are supported (see the sketch below)
- CIs have bad coverage for n < 500 because the standard errors are extremely biased (See {IJSE} paper)
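  A minimal {bayesQR} sketch (the draw count and burn-in are illustrative):

  ```r
  library(bayesQR)

  aq <- na.omit(airquality)

  # Median regression with the asymmetric Laplace likelihood
  fit <- bayesQR(Ozone ~ Temp, data = aq, quantile = 0.5, ndraw = 5000)
  summary(fit, burnin = 1000)
  ```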
For quantiles > 0.80, see quantile models in Extreme Value Theory (EVT)
- Quantile Loss is not effective at predicting tail events
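  For reference, a minimal sketch of the quantile (pinball) loss that these models minimize; for \(\tau\) near 1, few observations land above the predicted quantile, so the loss gives a weak training signal in the tails:

  ```r
  # Pinball loss: rho_tau(u) = u * (tau - 1{u < 0}), where u = y - q
  pinball_loss <- function(y, q, tau) {
    u <- y - q
    mean(u * (tau - (u < 0)))
  }

  pinball_loss(y = c(90, 110, 130), q = 100, tau = 0.9)
  ```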
Harrell: To characterize an entire distribution, i.e., to have a high degree of confidence that no estimated quantile will be off by more than 0.01 in probability, n = 18,400 will achieve this.
For example, with n = 18,400, the sample 0.25 quantile (first quartile) may correspond to population quantiles 0.24-0.26.
To achieve a \(\pm\) 0.1 MOE requires n = 180, and to have \(\pm\) 0.05 requires n = 730 (see table)
```
#>        n   MOE
#> 1     20 0.294
#> 2     50 0.188
#> 3    100 0.134
#> 4    180 0.100
#> 5    250 0.085
#> 6    500 0.060
#> 7    730 0.050
#> 8    750 0.049
#> 9   1000 0.043
#> 10  2935 0.025
#> 11  5000 0.019
#> 12 10000 0.014
#> 13 18400 0.010
```
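The table is consistent with the Dvoretzky-Kiefer-Wolfowitz (DKW) bound at 95% confidence, \(\text{MOE} = \sqrt{\ln(2/0.05)/(2n)}\). A sketch that approximately reproduces it (the DKW derivation is an assumption, not necessarily Harrell's exact calculation; the smallest-n rows differ slightly, presumably from an exact binomial computation):

```r
# DKW inequality: P(sup |F_n - F| > eps) <= 2 * exp(-2 * n * eps^2)
# Setting the bound to 0.05 and solving for eps gives the MOE
moe <- function(n, conf = 0.95) sqrt(log(2 / (1 - conf)) / (2 * n))

n <- c(20, 50, 100, 180, 250, 500, 730, 750, 1000, 2935, 5000, 10000, 18400)
round(moe(n), 3)
#> 0.304 0.192 0.136 0.101 0.086 0.061 0.050 0.050 0.043 0.025 0.019 0.014 0.010
```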
Bayesian Quantile Regression
- Notes from Valid standard errors for Bayesian quantile regression with clustered and independent data
- The most commonly used likelihood is the asymmetric Laplace (AL) likelihood
- AL-based quantile regression has been shown to produce good finite-sample Bayesian point estimates and to be consistent. However, if the AL distribution does not correspond to the data-generating distribution, credible intervals based on posterior standard deviations can have poor coverage.
- The Yang, Wang, and He (2016) adjustment to the posterior covariance matrix is sensitive to the choice of the AL likelihood's scale parameter and also leads to CIs with poor coverage in small samples.
- The most common prior for \(\sigma\) is an inverse gamma
- Others are a uniform prior, \(\mathbb{U}(0,10)\), and a half-t distribution with 3 degrees of freedom, which is used by {brms}
- {IJSE} - Infinitesimal jackknife standard errors for {brms} models, with clustered and independent data. Only requires the model object and, if appropriate, the clustering variable.
- Applicable to other models besides quantile regression, whenever frequentist standard errors are required or model-based posterior standard deviations are not valid due to model misspecification.
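  A minimal {brms} sketch of AL-likelihood quantile regression (priors left at defaults; fixing the `quantile` distributional parameter inside `bf()` is the assumed interface):

  ```r
  library(brms)

  aq <- na.omit(airquality)

  # 90th-percentile regression; the AL quantile dpar is fixed at 0.9
  fit <- brm(bf(Ozone ~ Temp, quantile = 0.9),
             family = asym_laplace(), data = aq,
             chains = 4, iter = 2000, refresh = 0)
  summary(fit)
  ```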
Harrell has a pretty cool text effect to display quantile values in {Hmisc}'s `describe()`, which uses {gt} under the hood (See EDA >> Packages >> Hmisc)
- Histogram is a sparkline