ANOVA
Misc
- Notes from
- Resources
- Power Analysis with Superpower
- Shiny Apps for demystifying statistical models and methods
- Factorial ANOVA, Mixed effects ANOVA, Mixed effects with random slopes, Logistic regression, Multiple comparisons, ANCOVA, One-way ANOVA, Odds ratio vs relative risk, Correlation coefficient vs. slope, Repeated sampling app
- Packages
- {car} - `Anova` function that computes all 3 types of ANOVA tables
  - Can also be applied to `glm` models to produce Analysis of Deviance tables (e.g. logistic, poisson, etc.)
  - Think the other packages wrap this function, so they can be used instead in order to take advantage of their plotting and testing conveniences.
- {grafify} - ANOVA wrappers, plotting, wrappers for {emmeans}
- {afex} - Analysis of Factorial EXperiments
  - ANOVA helper functions that fit the `lm`, center variables, apply contrasts, etc. in one line of code
    - Example: `afex::aov_car(Y ~ group * condition + Error(id), data = d)`
      - Type III tests used, factor variables created, sum-to-zero contrasts applied
  - Effect plotting functions
- ANOVA vs. Regression (GPT-3.5)
- Different Research Questions:
- ANOVA is typically used when you want to compare the means of three or more groups to determine if there are statistically significant differences among them. It’s suited for situations where you’re interested in group-level comparisons (e.g., comparing the average test scores of students from different schools).
- Regression, on the other hand, is used to model the relationship between one or more independent variables and a dependent variable. It’s suitable for predicting or explaining a continuous outcome variable.
- Data Type:
- ANOVA is traditionally used with categorical independent variables and a continuous dependent variable. It helps assess whether the categorical variable has a significant impact on the continuous variable.
- There are other variants such as ANCOVA (categorical and continuous IVs) and Analysis of Deviance (discrete outcome)
- Regression can be used with both categorical and continuous independent variables to predict a continuous dependent variable or to examine the relationship between variables.
- Multiple Factors:
- ANOVA is designed to handle situations with multiple categorical independent variables (factors) and their interactions. It is useful when you are interested in understanding the combined effects of several factors.
- Regression can accommodate multiple independent variables as well, but it focuses on predicting the value of the dependent variable rather than comparing groups.
- Hypothesis Testing:
- ANOVA tests for differences in means among groups and provides p-values to determine whether those differences are statistically significant.
- Regression can be used for hypothesis testing, but it’s more often used for estimating the effect size and making predictions.
- Assumptions:
- ANOVA assumes that the groups are independent and that the residuals (the differences between observed values and group means) are normally distributed and have equal variances.
- Regression makes similar assumptions about residuals but also assumes a linear relationship between independent and dependent variables.
General
- Family of procedures which summarizes the relationship between the underlying model and the outcome by partitioning the variation in the outcome into components which can be uniquely attributable to different sources according to the law of total variance.
- Essentially, each of the model’s terms is represented in a line in the ANOVA table which answers the question how much of the variation in Y can be attributed to the variation in X?
- Where applicable, each source of variance has an accompanying test statistic (often an F statistic), sometimes called the omnibus test, which indicates the significance of the variance attributable to that term, often accompanied by some measure of effect size.
- One-Way ANOVA - 1 categorical, independent variable
- Determines whether there is a statistically significant difference in the means of the dependent variable across the different levels of the independent variable.
- Example: A researcher wants to compare the average plant height grown using three different types of fertilizer. They would use a one-way ANOVA to test if there is a significant difference in height between the groups fertilized with each type.
- Two-Way ANOVA - 2 categorical, independent variables
- Example: 3 treatments are given to subjects and the researcher thinks that females and males will have different responses in general.
- Test whether there are treatment differences after accounting for sex effects
- Test whether there are sex differences after accounting for treatment effects
- Test whether the treatment effect is different for females and males if you allow the treatment \(\times\) sex interaction to be in the model
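A minimal sketch of such a two-way model, assuming a hypothetical data frame `d2` with a numeric `response`, a 3-level `treatment` factor, and a `sex` factor:

```r
# Main effects plus the treatment x sex interaction;
# the treatment:sex term tests whether the treatment effect differs for females and males
m2 <- lm(response ~ treatment * sex, data = d2)
car::Anova(m2, type = 2)
```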
- Types
- TL;DR;
- I don’t see a reason not to run type III every time.
- Type I: Sequential Attribution of Variation
- Type II: Simultaneous Attribution of Variation
- For interactions: Sequential-Simultaneous Attribution of Variation
- Type III: Simultaneous Attribution of Variation for Main Effects and Interactions
- If the categorical explanatory variables in the analysis are balanced, then all 3 types will give the same results. The results for each variable will be its unique contribution.
Example:
```r
# balanced
table(d$Rx, d$condition)
#>           Ca Cb
#>   Placebo   5  5
#>   Dose100   5  5
#>   Dose250   5  5

# imbalanced
table(d$group, d$condition)
#>      Ca Cb
#>   Gb  6  6
#>   Ga  5  6
#>   Gc  4  3
```
- Type I: Sequential Sum of Squares
Variance attribution is calculated sequentially so the order of variables in the model matters. Each term is attributed with a portion of the variation (represented by its SS) that has not yet been attributed to any of the previous terms.
Rarely used in practice because the order in which variation is attributed isn’t usually important
Example: Order of terms matters
```r
anova(lm(Y ~ group + X, data = d))
#> Analysis of Variance Table
#>
#> Response: Y
#>           Df  Sum Sq Mean Sq F value   Pr(>F)
#> group      2    8783    4391  0.0918 0.912617
#> X          1  380471  380471  7.9503 0.009077 **
#> Residuals 26 1244265   47856
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova(lm(Y ~ X + group, data = d))
#> Analysis of Variance Table
#>
#> Response: Y
#>           Df  Sum Sq Mean Sq F value  Pr(>F)
#> X          1  325745  325745  6.8067 0.01486 *
#> group      2   63509   31754  0.6635 0.52353
#> Residuals 26 1244265   47856
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
- Sum of Squares values change based on the order of the terms in the model
- In the first model:
- The effect of group does not represent its unique contribution to Y’s variance, but instead its total contribution.
- This reminds me of a dual path DAG where group is influenced by X. Here X’s variance contribution is included in group’s contribution since X is not conditioned upon. (See Causal Inference >> Dual Path DAGs)
- The effect of X represents only what X explains after removing the contribution of group — the variance attributed to X is strictly the variance that can be uniquely attributed to X, controlling for group
- Type II: Simultaneous Sum of Squares
The variance attributed to each variable is its unique contribution — variance after controlling for the other variables. Order of terms does not matter.
Example
```r
# m is the additive model from the Type I example, e.g. m <- lm(Y ~ group + X, data = d)
car::Anova(m, type = 2)
#> Anova Table (Type II tests)
#>
#> Response: Y
#>            Sum Sq Df F value   Pr(>F)
#> group       63509  2  0.6635 0.523533
#> X          380471  1  7.9503 0.009077 **
#> Residuals 1244265 26
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
- Sum of Squares values are equal to values of the Type 1 results when each variable is last.
- Note that factor variables, e.g. group, are treated as 1 term and not broken down into dummy variables for each level.
With interactions, the method of calculation could be called sequential-simultaneous.
- Terms are evaluated simultaneously in groups based on type of term, e.g. main effects, 2-way interactions, 3-way interactions, etc., but sequentially according to the order of that term where the order of main effects < 2-way interactions < 3-way interactions, etc.
- All main effects (1st order) are tested simultaneously (accounting for one another), then all 2-way interactions (2nd order) are tested simultaneously (accounting for the main effects and one another), and finally the 3-way interaction is tested (accounting for all main effects and 2-way interactions).
- So, if you use this way to test a model with interactions, only the highest order term’s Sum of Squares represents a unique variance contribution.
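A minimal sketch of this sequential-simultaneous behavior, reusing `Y`, `group`, `X`, and `d` from the examples above (output omitted):

```r
# Type II with an interaction: each main effect is adjusted for the other main effect
# (but not for group:X); group:X is adjusted for both main effects
car::Anova(lm(Y ~ group * X, data = d), type = 2)
```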
- Type III: Simultaneous-Simultaneous Sum of Squares
- The Sum-of-Squares for each main effect and interaction is calculated as its unique contribution (i.e. takes into account all other terms of the model).
- Unlike Type II, it allows you compare variance contributions for every term in your model.
- Without centering continuous variables and applying sum-to-zero contrasts to categorical variables, test results can change depending on the reference level of the categorical moderator. (Also see Regression, Linear >> Contrasts >> Sum-to-Zero)
- Example
No Centering, No Sum-to-Zero Contrasts
```r
m_int <- lm(Y ~ group * X, data = d)

d$group <- relevel(d$group, ref = "Gb")
m_int2 <- lm(Y ~ group * X, data = d)

car::Anova(m_int, type = 3)
#>             Sum Sq Df F value    Pr(>F)
#> (Intercept) 538630  1 22.9922 6.994e-05 ***
#> group       738108  2 15.7536 4.269e-05 ***
#> X           101495  1  4.3325   0.04823 *
#> group:X     682026  2 14.5566 7.246e-05 ***
#> Residuals   562240 24

car::Anova(m_int2, type = 3)
#>             Sum Sq Df F value    Pr(>F)
#> (Intercept) 219106  1  9.3528  0.005402 **
#> group       738108  2 15.7536 4.269e-05 ***
#> X           910646  1 38.8722 1.918e-06 ***
#> group:X     682026  2 14.5566 7.246e-05 ***
#> Residuals   562240 24
```
- The sum of squares and p-value change for X when the categorical variable’s reference level changed which shouldn’t matter given this is an omnibus test (i.e. the categorical variable is treated as 1 entity and not set of dummy variables).
Centered, Sum-to-Zero Contrasts Applied
```r
library(dplyr)

# center, contr.sum
d_contr_sum <- d |>
  mutate(X_c = scale(X, scale = FALSE))
contrasts(d_contr_sum$group) <- contr.sum

m_int_cont_sum <- lm(Y ~ group * X_c, data = d_contr_sum)
car::Anova(m_int_cont_sum, type = 3)
#>              Sum Sq Df  F value    Pr(>F)
#> (Intercept) 4743668  1 202.4902 3.401e-13 ***
#> group         19640  2   0.4192   0.66231
#> X_c          143772  1   6.1371   0.02067 *
#> group:X_c    682026  2  14.5566 7.246e-05 ***
#> Residuals    562240 24

# change reference level
d_rl <- d_contr_sum |>
  mutate(group_rl = relevel(group, ref = "Gb"))
contrasts(d_rl$group_rl) <- contr.sum

car::Anova(lm(Y ~ group_rl * X_c, data = d_rl), type = 3)
#>               Sum Sq Df  F value    Pr(>F)
#> (Intercept)  4743668  1 202.4902 3.401e-13 ***
#> group_rl       19640  2   0.4192   0.66231
#> X_c           143772  1   6.1371   0.02067 *
#> group_rl:X_c  682026  2  14.5566 7.246e-05 ***
#> Residuals     562240 24
```
- Now when the reference level is changed, the sum-of-squares and p-value for X remain the same.
Assumptions
- Each group category has a normal distribution.
- Group categories are independent of each other and identically distributed (iid)
- Group categories have similar variance (i.e., homoskedastic variance)
If this is violated
- If the ratio of the largest variance to the smallest variance is less than 4, then proceed with one-way ANOVA (robust to small differences)
- If the ratio of the largest variance to the smallest variance is greater than 4, perform a Kruskal-Wallis test. This is considered the non-parametric equivalent to the one-way ANOVA. (example)
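A minimal Kruskal-Wallis call, assuming a hypothetical data frame `d` with a numeric response `y` and a grouping factor `g`:

```r
# Non-parametric alternative to one-way ANOVA (compares distributions via ranks)
kruskal.test(y ~ g, data = d)
```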
EDA
```r
library(dplyr)

data %>%
  group_by(program) %>%
  summarize(var = var(weight_loss))
#> # A tibble: 3 x 2
#>   program   var
#> 1 A       0.819
#> 2 B       1.53
#> 3 C       2.46
```
Perform a statistical test to see if these variances are significantly different (See Post-Hoc Analysis, Difference-in-Means >> EDA >> Tests for Equal Variances)
Mathematics
Asides:
- This looks like the variance formula, except it doesn't divide by the sample size to get the "average" squared distance
- SSA formula: the second summation just translates to multiplying by \(n_i\), the group category sample size, since there is no \(j\) in that term
Calculate SSA and SSE
\[
\begin{align}
\text{SST} &= \text{SSA} + \text{SSE} \\
\sum_{i = 1}^a \sum_{j=1}^{n_i} (x_{ij} - \mu)^2 &= \sum_{i = 1}^a \sum_{j=1}^{n_i} (\bar x_i - \mu)^2 + \sum_{i = 1}^a \sum_{j=1}^{n_i} (x_{ij} - \bar x_i)^2
\end{align}
\]
- \(\text{SST}\): Sum of Squares Total
- \(\text{SSA}\): Sum of Squares between categories, treatments, or factors
- “A” stands for attributes (i.e. categories)
- \(\text{SSE}\): Sum of Squares of Errors; randomness within categories, treatments, or factors
- \(x_{ij}\): The jth observation of the ith category
- \(\bar x_i\): The sample mean of category i
- \(\mu\): The overall sample mean
- \(n_i\): The group category sample size
- \(a\): The number of group categories
Calculate MSA and MSE
\[ \begin{align} \text{MSE} &= \frac{\text{SSE}}{N-a} \\ \text{MSA} &= \frac{\text{SSA}}{a-1} \end{align} \]
- Where \(N\) is the total sample size
Calculate the F statistic and P-Value
\[ F = \frac{\text{MSA}}{\text{MSE}} \]
- Find the p-value (look it up in an F table or compute it directly with `pf()` in R)
- If our F statistic is less than the critical F value for \(\alpha = 0.05\), then we cannot reject the null hypothesis (no statistically significant difference between categories); see the by-hand sketch below
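A by-hand sketch of these calculations, assuming a hypothetical data frame `d` with a numeric response `y` and a grouping factor `g`:

```r
a  <- nlevels(d$g)   # number of group categories
N  <- nrow(d)        # total sample size
mu <- mean(d$y)      # overall sample mean

# SSA: between-group sum of squares; SSE: within-group (error) sum of squares
ssa <- sum(tapply(d$y, d$g, function(x) length(x) * (mean(x) - mu)^2))
sse <- sum(tapply(d$y, d$g, function(x) sum((x - mean(x))^2)))

msa <- ssa / (a - 1)
mse <- sse / (N - a)
f_stat <- msa / mse

# p-value from the F distribution, equivalent to comparing against a critical value
pf(f_stat, df1 = a - 1, df2 = N - a, lower.tail = FALSE)
```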
Discussion
- If there is a group category that has more variance than the others’ attribute error (SSA), we should then pick that up when we compare it to the random error (SSE)
- If a group is further away from the overall mean, then it will increase SSA and thus influence the overall variance but might not always increase random error
Diagnostics
- Eta Squared
- Metric to describe the effect size of a variable
- Range: [0, 1]; values closer to 1 indicating that a specific variable in the model can explain a greater fraction of the variation
- `lsr::etaSquared(anova_model)` (use first column of output)
- Guidelines
- 0.01: Effect size is small
- 0.06: Effect size is medium
- 0.14: Effect size is large
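For a one-way model, eta squared is just \(\text{SS}_{\text{effect}} / \text{SS}_{\text{total}}\); a minimal by-hand check, assuming `anova_model` is a one-way fit from `aov()` or `lm()`:

```r
# Eta squared by hand: SS_effect / (SS_effect + SS_residual)
a_tab <- anova(anova_model)
a_tab[["Sum Sq"]][1] / sum(a_tab[["Sum Sq"]])
```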
- Post-ANOVA Tests
- Assume approximately Normal distributions
- For links to more details about each test, see https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/post-hoc/
- Duncan’s new multiple range test (MRT)
- When you run Analysis of Variance (ANOVA), the results will tell you if there is a difference in means. However, it won’t pinpoint the pairs of means that are different. Duncan’s Multiple Range Test will identify the pairs of means (from at least three) that differ. The MRT is similar to the LSD, but instead of a t-value, a Q Value is used.
- Fisher’s Least Significant Difference (LSD)
- A tool to identify which pairs of means are statistically different. Essentially the same as Duncan’s MRT, but with t-values instead of Q values.
- Newman-Keuls
- Like Tukey’s, this post-hoc test identifies sample means that are different from each other. Newman-Keuls uses different critical values for comparing pairs of means. Therefore, it is more likely to find significant differences.
- Rodger’s Method
- Considered by some to be the most powerful post-hoc test for detecting differences among groups. This test protects against loss of statistical power as the degrees of freedom increase.
- Scheffé’s Method
- Used when you want to look at post-hoc comparisons in general (as opposed to just pairwise comparisons). Scheffe’s controls for the overall confidence level. It is customarily used with unequal sample sizes.
- Tukey’s Test
- The purpose of Tukey’s test is to figure out which groups in your sample differ. It uses the “Honest Significant Difference,” a number that represents the distance between groups, to compare every mean with every other mean.
- Dunnett’s Test
- Like Tukey’s this post-hoc test is used to compare means. Unlike Tukey’s, it compares every mean to a control mean.
- {DescTools::DunnettTest}
- Benjamini-Hochberg (BH) Procedure
- If you perform a very large amount of tests, one or more of the tests will have a significant result purely by chance alone. This post-hoc test accounts for that false discovery rate.
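A minimal sketch of the BH adjustment with base R's `p.adjust()`, using hypothetical unadjusted p-values:

```r
# Benjamini-Hochberg adjustment controls the false discovery rate
pvals <- c(0.001, 0.012, 0.034, 0.210)  # hypothetical pairwise p-values
p.adjust(pvals, method = "BH")
```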
One-Way
- Measures if there’s a difference in means between any group category
- Example: 1 control, 2 Test groups
Data
```r
data <- data.frame(Group = rep(c("control", "Test1", "Test2"), each = 10),
                   value = c(rnorm(10), rnorm(10), rnorm(10)))
data$Group <- as.factor(data$Group)

head(data)
#>     Group      value
#> 1 control  0.1932123
#> 2 control -0.4346821
#> 3 control  0.9132671
#> 4 control  1.7933881
#> 5 control  0.9966051
#> 6 control  1.1074905
```
Fit model
```r
model <- aov(value ~ Group, data = data)
summary(model)
#>             Df Sum Sq Mean Sq F value Pr(>F)
#> Group        2  4.407  2.2036    3.71 0.0377 *
#> Residuals   27 16.035  0.5939

# or
lm_mod <- lm(value ~ Group, data = data)
anova(lm_mod)
```
- P-Value < 0.05 says at least 1 group category has a statistically significant different mean from another category
Dunnett’s Test
```r
DescTools::DunnettTest(x = data$value, g = data$Group)
#> Dunnett's test for comparing several treatments with a control :
#>   95% family-wise confidence level
#>
#> $control
#>                     diff    lwr.ci      upr.ci   pval
#> Test1-control -0.8742469 -1.678514 -0.06998022 0.0320 *
#> Test2-control -0.7335283 -1.537795  0.07073836 0.0768 .
```
- Measures if there is any difference between treatments and the control
- The mean of the Test1 group was significantly different from (lower than) the control group; the mean of the Test2 group was not significantly different from the control group.
Tukey’s HSD
```r
stats::TukeyHSD(model, conf.level = .95)
```
- Measures difference in means between all categories and each other
ANCOVA
Analysis of Covariance is used to measure the main effect and interaction effects of categorical variables on a continuous dependent variable while controlling for the effects of selected other continuous variables that co-vary with the dependent variable.
Misc
- Analysis of covariance is classical terminology for linear models but we often use the term for nonlinear models (Harrell)
- See also
Assumptions
Independent observations (i.e. random assignment; avoid having known relationships among participants in the study)
Linearity: the relation between the covariate(s) and the dependent variable must be linear.
Normality: the dependent variable must be normally distributed within each subpopulation. (only needed for small samples of n < 20 or so)
Homogeneity of regression slopes: the beta-coefficient(s) for the covariate(s) must be equal among all subpopulations. (regression lines for these individual groups are assumed to be parallel)
- Failure to meet this assumption implies that there is an interaction between the covariate and the treatment.
- This assumption can be checked with an F test on the interaction of the independent variable(s) with the covariate(s).
- If the F test is significant (i.e., significant interaction) then this assumption has been violated and the covariate should not be used as is.
- A possible solution is converting the continuous scale of the covariate to a categorical (discrete) variable and making it a subsequent independent variable, and then use a factorial ANOVA to analyze the data.
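A minimal sketch of that interaction F test, assuming the `exam`/`technique`/`grade` data used in the Fit example below:

```r
# A significant technique:grade term indicates heterogeneous regression slopes
slopes_check <- aov(exam ~ technique * grade, data = data)
car::Anova(slopes_check, type = "III")
```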
The covariate (adjustment variable) and the treatment are independent
```r
model <- aov(grade ~ technique, data = data)
summary(model)
#>             Df Sum Sq Mean Sq F value Pr(>F)
#> technique    2    9.8    4.92    0.14  0.869
#> Residuals   87 3047.7   35.03
```
- H0: variables are independent
Homogeneity of variance: variance of the dependent variable must be equal over all subpopulations (only needed for sharply unequal sample sizes)
```r
# response ~ treatment
car::leveneTest(exam ~ technique, data = data)
#>       Df F value    Pr(>F)
#> group  2  13.752 6.464e-06 ***
#>       87

# alt test
fligner.test(size ~ location, my.dataframe)
```
- H0: Homogeneous variance
- This one fails
Fit
```r
ancova_model <- aov(exam ~ technique + grade, data = data)
car::Anova(ancova_model, type = "III")
#>             Sum Sq Df F value    Pr(>F)
#> (Intercept) 3492.4  1 57.1325 4.096e-11 ***
#> technique   1085.8  2  8.8814 0.0003116 ***
#> grade          4.0  1  0.0657 0.7982685
#> Residuals   5257.0 86
```
- When adjusting for current grade (covariate), study technique (treatment) has a significant effect on the final exam score (response).
Does the effect differ by treatment
```r
library(multcomp)  # package is {multcomp}

postHocs <- glht(ancova_model, linfct = mcp(technique = "Tukey"))
summary(postHocs)
#>            Estimate Std. Error t value Pr(>|t|)
#> B - A == 0   -5.279      2.021  -2.613   0.0284 *
#> C - A == 0    3.138      2.022   1.552   0.2719
#> C - B == 0    8.418      2.019   4.170   <0.001 ***
```
- Also see Post-Hoc Analysis, Multilevel >> Tukey’s Test
- \(A\), \(B\), and \(C\) are the study techniques (treatment)
- Significant differences between \(B\) and \(A\) (p = 0.028) and an even larger difference between \(C\) and \(B\) (p < 0.001); \(C\) vs. \(A\) was not significant.
Example: RCT
\[ \begin{align} \text{post}_i &\sim \mathcal{N}(\mu_i, \sigma_\epsilon)\\ \mu_i &= \beta_0 + \beta_1 \text{tx}_i + \beta_2 \text{pre}_i \end{align} \]
```r
w2 <- glm(
  post ~ 1 + tx + pre,
  data = dw,
  family = gaussian)
```
- Specification
- post, pre: The post-treatment and pre-treatment measurement of the outcome variable
- tx: The treatment indicator variable
- \(\beta_0\): Population mean for the outcome variable in the control group
- \(\beta_1\): The population-level difference in pre/post change in the treatment group compared to the control group
- Also a causal estimate for the average treatment effect (ATE) in the population, τ
- Because pre is added as a covariate, both \(\beta_0\) and \(\beta_1\) are conditional on the outcome variable, as collected at baseline before random assignment.
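Given the fitted model above, the ATE estimate and its uncertainty can be read off the `tx` coefficient directly:

```r
coef(w2)["tx"]     # point estimate of the ATE (beta_1)
confint(w2, "tx")  # its confidence interval
```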
MANOVA
- Multivariate Analysis of Variance is used to analyze how one or more factor variables affect a multivariate response (multiple response variables). (A minimal fitting sketch appears at the end of this section.)
- Assumptions
- Multivariate Normality – Response variables are multivariate normally distributed within each group of the factor variable(s).
- Independence – Each observation is randomly and independently sampled from the population.
- Equal Variance – The population covariance matrices of each group are equal.
- No Multivariate Outliers – There are no extreme multivariate outliers.
- {MVN} (Vignette) - Tests for multivariate normality
- Contains the three most widely used multivariate normality tests: Mardia's, Henze-Zirkler's, and Royston's
- Graphical approaches, including chi-square Q-Q, perspective and contour plots.
- Two multivariate outlier detection methods, which are based on robust Mahalanobis distances
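A minimal fitting sketch, assuming a hypothetical data frame `d` with numeric responses `y1`, `y2` and a factor `group`:

```r
# One-way MANOVA on two response variables
fit <- manova(cbind(y1, y2) ~ group, data = d)
summary(fit)      # multivariate test (Pillai's trace by default)
summary.aov(fit)  # follow-up univariate ANOVAs, one per response
```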