ANOVA
Misc
- Notes from
- Resources
- Power Analysis with Superpower
- Shiny Apps for demystifying statistical models and methods
- Factorial ANOVA, Mixed effects ANOVA, Mixed effects with random slopes, Logistic regression, Multiple comparisons, ANCOVA, One-way ANOVA, Odds ratio vs relative risk, Correlation coefficient vs. slope, Repeated sampling app
- Packages
- {car} - `Anova` function that computes all 3 types of ANOVA tables
  - Can also be applied to `glm` models to produce Analysis of Deviance tables (e.g. logistic, poisson, etc.)
  - Think the other packages wrap this function, so they can be used instead in order to take advantage of their plotting and testing conveniences.
- {grafify} - ANOVA wrappers, plotting, wrappers for {emmeans}
- {afex} - Analysis of Factorial EXperiments
  - ANOVA helper functions that fit the `lm`, center variables, apply contrasts, etc. in one line of code
    - Example: `afex::aov_car(Y ~ group * condition + Error(id), data = d)`
      - Type III tests used, factor variables created, sum-to-zero contrasts applied
  - Effect plotting functions
- ANOVA vs. Regression (GPT-3.5)
- Different Research Questions:
- ANOVA is typically used when you want to compare the means of three or more groups to determine if there are statistically significant differences among them. It’s suited for situations where you’re interested in group-level comparisons (e.g., comparing the average test scores of students from different schools).
- Regression, on the other hand, is used to model the relationship between one or more independent variables and a dependent variable. It’s suitable for predicting or explaining a continuous outcome variable.
- Data Type:
- ANOVA is traditionally used with categorical independent variables and a continuous dependent variable. It helps assess whether the categorical variable has a significant impact on the continuous variable.
- There are other variants such as ANCOVA (categorical and continuous IVs) and Analysis of Deviance (discrete outcome)
- Regression can be used with both categorical and continuous independent variables to predict a continuous dependent variable or to examine the relationship between variables.
- Multiple Factors:
- ANOVA is designed to handle situations with multiple categorical independent variables (factors) and their interactions. It is useful when you are interested in understanding the combined effects of several factors.
- Regression can accommodate multiple independent variables as well, but it focuses on predicting the value of the dependent variable rather than comparing groups.
- Hypothesis Testing:
- ANOVA tests for differences in means among groups and provides p-values to determine whether those differences are statistically significant.
- Regression can be used for hypothesis testing, but it’s more often used for estimating the effect size and making predictions.
- Assumptions:
- ANOVA assumes that the groups are independent and that the residuals (the differences between observed values and group means) are normally distributed and have equal variances.
- Regression makes similar assumptions about residuals but also assumes a linear relationship between independent and dependent variables.
General
- Family of procedures which summarizes the relationship between the underlying model and the outcome by partitioning the variation in the outcome into components which can be uniquely attributable to different sources according to the law of total variance.
- Essentially, each of the model’s terms is represented in a line in the ANOVA table which answers the question how much of the variation in Y can be attributed to the variation in X?
- Where applicable, each source of variance has an accompanying test statistic (often an F statistic), sometimes called the omnibus test, which indicates the significance of the variance attributable to that term, often accompanied by some measure of effect size.
- One-Way ANOVA - 1 categorical, independent variable
- Determines whether there is a statistically significant difference in the means of the dependent variable across the different levels of the independent variable.
- Example: A researcher wants to compare the average plant height grown using three different types of fertilizer. They would use a one-way ANOVA to test if there is a significant difference in height between the groups fertilized with each type.
- Two-Way ANOVA - 2 categorical, independent variables
- Example: 3 treatments are given to subjects and the researcher thinks that females and males will have different responses in general.
- Test whether there are treatment differences after accounting for sex effects
- Test whether there are sex differences after accounting for treatment effects
- Test whether the treatment effect is different for females and males if you allow the treatment \(\times\) sex interaction to be in the model
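A minimal sketch of such a two-way model, assuming a hypothetical data frame `d2` with a numeric `response`, a 3-level `treatment` factor, and a `sex` factor:

```r
# Main effects plus the treatment x sex interaction;
# the treatment:sex term tests whether the treatment effect differs for females and males
m2 <- lm(response ~ treatment * sex, data = d2)
car::Anova(m2, type = 2)
```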
- Types
- TL;DR;
- I don’t see a reason not to run type III every time.
- Type I: Sequential Attribution of Variation
- Type II: Simultaneous Attribution of Variation
- For interactions: Sequential-Simultaneous Attribution of Variation
- Type III: Simultaneous Attribution of Variation for Main Effects and Interactions
- If the categorical explanatory variables in the analysis are balanced, then all 3 types will give the same results. The results for each variable will be its unique contribution.
Example:
```r
# balanced
table(d$Rx, d$condition)
#>           Ca Cb
#>   Placebo   5  5
#>   Dose100   5  5
#>   Dose250   5  5

# imbalanced
table(d$group, d$condition)
#>      Ca Cb
#>   Gb  6  6
#>   Ga  5  6
#>   Gc  4  3
```
- Type I: Sequential Sum of Squares
Variance attribution is calculated sequentially so the order of variables in the model matters. Each term is attributed with a portion of the variation (represented by its SS) that has not yet been attributed to any of the previous terms.
Rarely used in practice because the order in which variation is attributed isn’t usually important
Example: Order of terms matters
```r
anova(lm(Y ~ group + X, data = d))
#> Analysis of Variance Table
#>
#> Response: Y
#>           Df  Sum Sq Mean Sq F value   Pr(>F)
#> group      2    8783    4391  0.0918 0.912617
#> X          1  380471  380471  7.9503 0.009077 **
#> Residuals 26 1244265   47856
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lm(Y ~ X + group, data = d))
#> Analysis of Variance Table
#>
#> Response: Y
#>           Df  Sum Sq Mean Sq F value  Pr(>F)
#> X          1  325745  325745  6.8067 0.01486 *
#> group      2   63509   31754  0.6635 0.52353
#> Residuals 26 1244265   47856
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
- Sum of Squares values change based on the order of the terms in the model
- In the first model:
- The effect of group does not represent its unique contribution to Y’s variance, but instead its total contribution.
- This reminds me of a dual path DAG where group is influenced by X. Here X’s variance contribution is included in group’s contribution since X is not conditioned upon. (See Causal Inference >> Dual Path DAGs)
- The effect of X represents only what X explains after removing the contribution of group — the variance attributed to X is strictly the variance that can be uniquely attributed to X, controlling for group
- Type II: Simultaneous Sum of Squares
The variance attributed to each variable is its unique contribution — variance after controlling for the other variables. Order of terms does not matter.
Example
```r
# m is the additive model from the Type I example, e.g. m <- lm(Y ~ group + X, data = d)
car::Anova(m, type = 2)
#> Anova Table (Type II tests)
#>
#> Response: Y
#>            Sum Sq Df F value   Pr(>F)
#> group       63509  2  0.6635 0.523533
#> X          380471  1  7.9503 0.009077 **
#> Residuals 1244265 26
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
- Sum of Squares values are equal to values of the Type 1 results when each variable is last.
- Note that factor variables, e.g. group, are treated as 1 term and not broken down into dummy variables for each level.
With interactions, the method of calculation could be called sequential-simultaneous.
- Terms are evaluated simultaneously in groups based on type of term, e.g. main effects, 2-way interactions, 3-way interactions, etc., but sequentially according to the order of that term where the order of main effects < 2-way interactions < 3-way interactions, etc.
- All main effects (1st order) are tested simultaneously (accounting for one another), then all 2-way interactions (2nd order) are tested simultaneously (accounting for the main effects and one another), and finally the 3-way interaction is tested (accounting for all main effects and 2-way interactions).
- So, if you use this way to test a model with interactions, only the highest order term’s Sum of Squares represents a unique variance contribution.
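A minimal sketch of this sequential-simultaneous behavior, reusing `Y`, `group`, `X`, and `d` from the examples above (output omitted):

```r
# Type II with an interaction: each main effect is adjusted for the other main effect
# (but not for group:X); group:X is adjusted for both main effects
car::Anova(lm(Y ~ group * X, data = d), type = 2)
```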
- Type III: Simultaneous-Simultaneous Sum of Squares
- The Sum-of-Squares for each main effect and interaction is calculated as its unique contribution (i.e. takes into account all other terms of the model).
- Unlike Type II, it allows you compare variance contributions for every term in your model.
- Without centering continuous variables and applying sum-to-zero contrasts to categorical variables, test results can change depending on the reference level of the categorical moderator. (Also see Regression, Linear >> Contrasts >> Sum-to-Zero)
- Example
No Centering, No Sum-to-Zero Contrasts
```r
m_int <- lm(Y ~ group * X, data = d)

d$group <- relevel(d$group, ref = "Gb")
m_int2 <- lm(Y ~ group * X, data = d)

car::Anova(m_int, type = 3)
#>             Sum Sq Df F value    Pr(>F)
#> (Intercept) 538630  1 22.9922 6.994e-05 ***
#> group       738108  2 15.7536 4.269e-05 ***
#> X           101495  1  4.3325   0.04823 *
#> group:X     682026  2 14.5566 7.246e-05 ***
#> Residuals   562240 24

car::Anova(m_int2, type = 3)
#>             Sum Sq Df F value    Pr(>F)
#> (Intercept) 219106  1  9.3528  0.005402 **
#> group       738108  2 15.7536 4.269e-05 ***
#> X           910646  1 38.8722 1.918e-06 ***
#> group:X     682026  2 14.5566 7.246e-05 ***
#> Residuals   562240 24
```
- The sum of squares and p-value change for X when the categorical variable’s reference level changed which shouldn’t matter given this is an omnibus test (i.e. the categorical variable is treated as 1 entity and not set of dummy variables).
Centered, Sum-to-Zero Contrasts Applied
```r
library(dplyr)

# center, contr.sum
d_contr_sum <- d |>
  mutate(X_c = scale(X, scale = FALSE))
contrasts(d_contr_sum$group) <- contr.sum

m_int_cont_sum <- lm(Y ~ group * X_c, data = d_contr_sum)
car::Anova(m_int_cont_sum, type = 3)
#>              Sum Sq Df  F value    Pr(>F)
#> (Intercept) 4743668  1 202.4902 3.401e-13 ***
#> group         19640  2   0.4192   0.66231
#> X_c          143772  1   6.1371   0.02067 *
#> group:X_c    682026  2  14.5566 7.246e-05 ***
#> Residuals    562240 24

# change reference level
d_rl <- d_contr_sum |>
  mutate(group_rl = relevel(group, ref = "Gb"))
contrasts(d_rl$group_rl) <- contr.sum

car::Anova(lm(Y ~ group_rl * X_c, data = d_rl), type = 3)
#>               Sum Sq Df  F value    Pr(>F)
#> (Intercept)  4743668  1 202.4902 3.401e-13 ***
#> group_rl       19640  2   0.4192   0.66231
#> X_c           143772  1   6.1371   0.02067 *
#> group_rl:X_c  682026  2  14.5566 7.246e-05 ***
#> Residuals     562240 24
```
- Now when the reference level is changed, the sum-of-squares and p-value for X remain the same.
Assumptions
- Each group category has a normal distribution.
- Group categories are independent of each other and identically distributed (iid)
- Group categories have similar variance (i.e., homoskedastic variance)
If this is violated
- If the ratio of the largest variance to the smallest variance is less than 4, then proceed with one-way ANOVA (robust to small differences)
- If the ratio of the largest variance to the smallest variance is greater than 4, perform a Kruskal-Wallis test. This is considered the non-parametric equivalent to the one-way ANOVA. (example)
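A minimal Kruskal-Wallis call, assuming a hypothetical data frame `d` with a numeric response `y` and a grouping factor `g`:

```r
# Non-parametric alternative to one-way ANOVA (compares distributions via ranks)
kruskal.test(y ~ g, data = d)
```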
EDA
```r
library(dplyr)

data %>%
  group_by(program) %>%
  summarize(var = var(weight_loss))
#> # A tibble: 3 x 2
#>   program   var
#> 1 A       0.819
#> 2 B       1.53
#> 3 C       2.46
```
Perform a statistical test to see if these variances are significantly different (See Post-Hoc Analysis, Difference-in-Means >> EDA >> Tests for Equal Variances)
Mathematics
Asides:
- This looks like the variance formula, except it doesn't divide by the sample size to get the "average" squared distance
- SSA formula: the second summation just translates to multiplying by \(n_i\), the group category sample size, since there is no \(j\) in that term
Calculate SSA and SSE
\[
\begin{align}
\text{SST} &= \text{SSA} + \text{SSE} \\
\sum_{i = 1}^a \sum_{j=1}^{n_i} (x_{ij} - \mu)^2 &= \sum_{i = 1}^a \sum_{j=1}^{n_i} (\bar x_i - \mu)^2 + \sum_{i = 1}^a \sum_{j=1}^{n_i} (x_{ij} - \bar x_i)^2
\end{align}
\]
- \(\text{SST}\): Sum of Squares Total
- \(\text{SSA}\): Sum of Squares between categories, treatments, or factors
- “A” stands for attributes (i.e. categories)
- \(\text{SSE}\): Sum of Squares of Errors; randomness within categories, treatments, or factors
- \(x_{ij}\): The jth observation of the ith category
- \(\bar x_i\): The sample mean of category i
- \(\mu\): The overall sample mean
- \(n_i\): The group category sample size
- \(a\): The number of group categories
Calculate MSA and MSE
\[ \begin{align} \text{MSE} &= \frac{\text{SSE}}{N-a} \\ \text{MSA} &= \frac{\text{SSA}}{a-1} \end{align} \]
- Where \(N\) is the total sample size
Calculate the F statistic and P-Value
\[ F = \frac{\text{MSA}}{\text{MSE}} \]
- Find the p-value (look it up in an F table or compute it directly with `pf()` in R)
- If our F statistic is less than the critical F value for \(\alpha = 0.05\), then we cannot reject the null hypothesis (no statistically significant difference between categories); see the by-hand sketch below
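A by-hand sketch of these calculations, assuming a hypothetical data frame `d` with a numeric response `y` and a grouping factor `g`:

```r
a  <- nlevels(d$g)   # number of group categories
N  <- nrow(d)        # total sample size
mu <- mean(d$y)      # overall sample mean

# SSA: between-group sum of squares; SSE: within-group (error) sum of squares
ssa <- sum(tapply(d$y, d$g, function(x) length(x) * (mean(x) - mu)^2))
sse <- sum(tapply(d$y, d$g, function(x) sum((x - mean(x))^2)))

msa <- ssa / (a - 1)
mse <- sse / (N - a)
f_stat <- msa / mse

# p-value from the F distribution, equivalent to comparing against a critical value
pf(f_stat, df1 = a - 1, df2 = N - a, lower.tail = FALSE)
```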
Discussion
- If there is a group category that has more variance than the others’ attribute error (SSA), we should then pick that up when we compare it to the random error (SSE)
- If a group is further away from the overall mean, then it will increase SSA and thus influence the overall variance but might not always increase random error
Diagnostics
- Eta Squared
- Metric to describe the effect size of a variable
- Range: [0, 1]; values closer to 1 indicating that a specific variable in the model can explain a greater fraction of the variation
- `lsr::etaSquared(anova_model)` (use first column of output)
- Guidelines
- 0.01: Effect size is small
- 0.06: Effect size is medium
- 0.14: Effect size is large
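For a one-way model, eta squared is just \(\text{SS}_{\text{effect}} / \text{SS}_{\text{total}}\); a minimal by-hand check, assuming `anova_model` is a one-way fit from `aov()` or `lm()`:

```r
# Eta squared by hand: SS_effect / (SS_effect + SS_residual)
a_tab <- anova(anova_model)
a_tab[["Sum Sq"]][1] / sum(a_tab[["Sum Sq"]])
```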
- Post-ANOVA Tests
- Assume approximately Normal distributions
- For links to more details about each test, see https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/post-hoc/
- Duncan’s new multiple range test (MRT)
- When you run Analysis of Variance (ANOVA), the results will tell you if there is a difference in means. However, it won’t pinpoint the pairs of means that are different. Duncan’s Multiple Range Test will identify the pairs of means (from at least three) that differ. The MRT is similar to the LSD, but instead of a t-value, a Q Value is used.
- Fisher’s Least Significant Difference (LSD)
- A tool to identify which pairs of means are statistically different. Essentially the same as Duncan’s MRT, but with t-values instead of Q values.
- Newman-Keuls
- Like Tukey’s, this post-hoc test identifies sample means that are different from each other. Newman-Keuls uses different critical values for comparing pairs of means. Therefore, it is more likely to find significant differences.
- Rodger’s Method
- Considered by some to be the most powerful post-hoc test for detecting differences among groups. This test protects against loss of statistical power as the degrees of freedom increase.
- Scheffé’s Method
- Used when you want to look at post-hoc comparisons in general (as opposed to just pairwise comparisons). Scheffe’s controls for the overall confidence level. It is customarily used with unequal sample sizes.
- Tukey’s Test
- The purpose of Tukey’s test is to figure out which groups in your sample differ. It uses the “Honest Significant Difference,” a number that represents the distance between groups, to compare every mean with every other mean.
- Dunnett’s Test
- Like Tukey’s this post-hoc test is used to compare means. Unlike Tukey’s, it compares every mean to a control mean.
- {DescTools::DunnettTest}
- Benjamini-Hochberg (BH) Procedure
- If you perform a very large amount of tests, one or more of the tests will have a significant result purely by chance alone. This post-hoc test accounts for that false discovery rate.
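A minimal sketch of the BH adjustment with base R's `p.adjust()`, using hypothetical unadjusted p-values:

```r
# Benjamini-Hochberg adjustment controls the false discovery rate
pvals <- c(0.001, 0.012, 0.034, 0.210)  # hypothetical pairwise p-values
p.adjust(pvals, method = "BH")
```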
One-Way
- Measures if there’s a difference in means between any group category
- Example: 1 control, 2 Test groups
Data
```r
data <- data.frame(Group = rep(c("control", "Test1", "Test2"), each = 10),
                   value = c(rnorm(10), rnorm(10), rnorm(10)))
data$Group <- as.factor(data$Group)

head(data)
#>     Group      value
#> 1 control  0.1932123
#> 2 control -0.4346821
#> 3 control  0.9132671
#> 4 control  1.7933881
#> 5 control  0.9966051
#> 6 control  1.1074905
```
Fit model
```r
model <- aov(value ~ Group, data = data)
summary(model)
#>             Df Sum Sq Mean Sq F value Pr(>F)
#> Group        2  4.407  2.2036    3.71 0.0377 *
#> Residuals   27 16.035  0.5939

# or
lm_mod <- lm(value ~ Group, data = data)
anova(lm_mod)
```
- P-Value < 0.05 says at least 1 group category has a statistically significant different mean from another category
Dunnett’s Test
```r
DescTools::DunnettTest(x = data$value, g = data$Group)
#> Dunnett's test for comparing several treatments with a control :
#>   95% family-wise confidence level
#>
#> $control
#>                     diff    lwr.ci      upr.ci   pval
#> Test1-control -0.8742469 -1.678514 -0.06998022 0.0320 *
#> Test2-control -0.7335283 -1.537795  0.07073836 0.0768 .
```
- Measures if there is any difference between treatments and the control
- The mean of the Test1 group was significantly different from (lower than) the control group; the mean of the Test2 group was not significantly different from the control group.
Tukey’s HSD
```r
stats::TukeyHSD(model, conf.level = .95)
```
- Measures difference in means between all categories and each other
ANCOVA
Analysis of Covariance is used to measure the main effect and interaction effects of categorical variables on a continuous dependent variable while controlling for the effects of selected other continuous variables that co-vary with the dependent variable.
Misc
- Analysis of covariance is classical terminology for linear models but we often use the term for nonlinear models (Harrell)
- See also
Assumptions
Independent observations (i.e. random assignment; avoid having known relationships among participants in the study)
Linearity: the relation between the covariate(s) and the dependent variable must be linear.
Normality: the dependent variable must be normally distributed within each subpopulation. (only needed for small samples of n < 20 or so)
Homogeneity of regression slopes: the beta-coefficient(s) for the covariate(s) must be equal among all subpopulations. (regression lines for these individual groups are assumed to be parallel)
- Failure to meet this assumption implies that there is an interaction between the covariate and the treatment.
- This assumption can be checked with an F test on the interaction of the independent variable(s) with the covariate(s).
- If the F test is significant (i.e., significant interaction) then this assumption has been violated and the covariate should not be used as is.
- A possible solution is converting the continuous scale of the covariate to a categorical (discrete) variable and making it a subsequent independent variable, and then use a factorial ANOVA to analyze the data.
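A minimal sketch of that interaction F test, assuming the `exam`/`technique`/`grade` data used in the Fit example below:

```r
# A significant technique:grade term indicates heterogeneous regression slopes
slopes_check <- aov(exam ~ technique * grade, data = data)
car::Anova(slopes_check, type = "III")
```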
The covariate (adjustment variable) and the treatment are independent
```r
model <- aov(grade ~ technique, data = data)
summary(model)
#>             Df Sum Sq Mean Sq F value Pr(>F)
#> technique    2    9.8    4.92    0.14  0.869
#> Residuals   87 3047.7   35.03
```
- H0: variables are independent
Homogeneity of variance: variance of the dependent variable must be equal over all subpopulations (only needed for sharply unequal sample sizes)
```r
# response ~ treatment
car::leveneTest(exam ~ technique, data = data)
#>       Df F value    Pr(>F)
#> group  2  13.752 6.464e-06 ***
#>       87

# alt test
fligner.test(size ~ location, my.dataframe)
```
- H0: Homogeneous variance
- This one fails
Fit
```r
ancova_model <- aov(exam ~ technique + grade, data = data)
car::Anova(ancova_model, type = "III")
#>             Sum Sq Df F value    Pr(>F)
#> (Intercept) 3492.4  1 57.1325 4.096e-11 ***
#> technique   1085.8  2  8.8814 0.0003116 ***
#> grade          4.0  1  0.0657 0.7982685
#> Residuals   5257.0 86
```
- When adjusting for current grade (covariate), study technique (treatment) has a significant effect on the final exam score (response).
Does the effect differ by treatment
```r
library(multcomp)  # package is {multcomp}

postHocs <- glht(ancova_model, linfct = mcp(technique = "Tukey"))
summary(postHocs)
#>            Estimate Std. Error t value Pr(>|t|)
#> B - A == 0   -5.279      2.021  -2.613   0.0284 *
#> C - A == 0    3.138      2.022   1.552   0.2719
#> C - B == 0    8.418      2.019   4.170   <0.001 ***
```
- Also see Post-Hoc Analysis, Multilevel >> Tukey’s Test
- \(A\), \(B\), and \(C\) are the study techniques (treatment)
- Significant differences between \(B\) and \(A\) (p = 0.028) and an even larger difference between \(C\) and \(B\) (p < 0.001); \(C\) vs. \(A\) was not significant.
Example: RCT
\[ \begin{align} \text{post}_i &\sim \mathcal{N}(\mu_i, \sigma_\epsilon)\\ \mu_i &= \beta_0 + \beta_1 \text{tx}_i + \beta_2 \text{pre}_i \end{align} \]
```r
w2 <- glm(
  post ~ 1 + tx + pre,
  data = dw,
  family = gaussian)
```
- Specification
- post, pre: The post-treatment and pre-treatment measurement of the outcome variable
- tx: The treatment indicator variable
- \(\beta_0\): Population mean for the outcome variable in the control group
- \(\beta_1\): The population-level difference in pre/post change in the treatment group compared to the control group
- Also a causal estimate for the average treatment effect (ATE) in the population, τ
- Because pre is added as a covariate, both \(\beta_0\) and \(\beta_1\) are conditional on the outcome variable, as collected at baseline before random assignment.
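Given the fitted model above, the ATE estimate and its uncertainty can be read off the `tx` coefficient directly:

```r
coef(w2)["tx"]     # point estimate of the ATE (beta_1)
confint(w2, "tx")  # its confidence interval
```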
MANOVA
- Multivariate Analysis of Variance is used to analyze how one or more factor variables affect a multivariate response (multiple response variables). (A minimal fitting sketch appears at the end of this section.)
- Assumptions
- Multivariate Normality – Response variables are multivariate normally distributed within each group of the factor variable(s).
- Independence – Each observation is randomly and independently sampled from the population.
- Equal Variance – The population covariance matrices of each group are equal.
- No Multivariate Outliers – There are no extreme multivariate outliers.
- {MVN} (Vignette) - Tests for multivariate normality
- Contains the three most widely used multivariate normality tests: Mardia's, Henze-Zirkler's, and Royston's
- Graphical approaches, including chi-square Q-Q, perspective and contour plots.
- Two multivariate outlier detection methods, which are based on robust Mahalanobis distances
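A minimal fitting sketch, assuming a hypothetical data frame `d` with numeric responses `y1`, `y2` and a factor `group`:

```r
# One-way MANOVA on two response variables
fit <- manova(cbind(y1, y2) ~ group, data = d)
summary(fit)      # multivariate test (Pillai's trace by default)
summary.aov(fit)  # follow-up univariate ANOVAs, one per response
```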