ANOVA

Misc

  • Notes from
  • Resources
  • Packages
    • {car} - Anova function that computes all 3 types of ANOVA table
      • Can also be applied to glm models to produce Analysis of Deviance tables (e.g. logistic, poisson, etc.)
      • Think the other packages wrap this function, so they can be used instead in order to advantage of their plotting, testing conveniences.
    • {grafify} - ANOVA wrappers, plotting, wrappers for {emmeans}
    • {afex} - Analysis of Factorial EXperiments
      • ANOVA helper functions that fit the lm, center, apply contrasts, etc. in one line of code
        • Example: afex::aov_car(Y ~ group * condition + Error(id), data = d)
        • Type III used, Factor variables created, Sum-to-Zero contrast is applied
      • Effect plotting functions
    • {BayesFactor} - A suite of functions for computing various Bayes factors for simple designs, including contingency tables, one- and two-sample designs, one-way designs, general ANOVA designs, and linear regression
  • ANOVA vs. Regression (GPT-3.5)
    • Different Research Questions:
      • ANOVA is typically used when you want to compare the means of three or more groups to determine if there are statistically significant differences among them. It’s suited for situations where you’re interested in group-level comparisons (e.g., comparing the average test scores of students from different schools).
      • Regression, on the other hand, is used to model the relationship between one or more independent variables and a dependent variable. It’s suitable for predicting or explaining a continuous outcome variable.
    • Data Type:
      • ANOVA is traditionally used with categorical independent variables and a continuous dependent variable. It helps assess whether the categorical variable has a significant impact on the continuous variable.
        • There are other variants such as ANCOVA (categorical and continuous IVs) and Analysis of Deviance (discrete outcome)
      • Regression can be used with both categorical and continuous independent variables to predict a continuous dependent variable or to examine the relationship between variables.
    • Multiple Factors:
      • ANOVA is designed to handle situations with multiple categorical independent variables (factors) and their interactions. It is useful when you are interested in understanding the combined effects of several factors.
      • Regression can accommodate multiple independent variables as well, but it focuses on predicting the value of the dependent variable rather than comparing groups.
    • Hypothesis Testing:
      • ANOVA tests for differences in means among groups and provides p-values to determine whether those differences are statistically significant.
      • Regression can be used for hypothesis testing, but it’s more often used for estimating the effect size and making predictions.
    • Assumptions:
      • ANOVA assumes that the groups are independent and that the residuals (the differences between observed values and group means) are normally distributed and have equal variances.
      • Regression makes similar assumptions about residuals but also assumes a linear relationship between independent and dependent variables.

General

  • Family of procedures which summarizes the relationship between the underlying model and the outcome by partitioning the variation in the outcome into components which can be uniquely attributable to different sources according to the law of total variance.
  • Essentially, each of the model’s terms is represented in a line in the ANOVA table which answers the question how much of the variation in Y can be attributed to the variation in X?
    • Where applicable, each source of variance has an accompanying test statistic (oftenF), sometimes called the omnibus test, which indicates the significance of the variance attributable to that term, often accompanied by some measure of effect size.
  • One-Way ANOVA - 1 categorical, independent variable
    • Determines whether there is a statistically significant difference in the means of the dependent variable across the different levels of the independent variable.
    • Example: A researcher wants to compare the average plant height grown using three different types of fertilizer. They would use a one-way ANOVA to test if there is a significant difference in height between the groups fertilized with each type.
  • Two-Way ANOVA - 2 categorical, independent variables
    • Example: 3 treatments are given to subjects and the researcher thinks that females and males will have different responses in general.
      • Test whether there are treatment differences after accounting for sex effects
      • Test whether there are sex differences after accounting for treatment effects
      • Test whether the treatment effect is different for females and males if you allow the treatment \(\times\) sex interaction to be in the model
  • Types
    • TL;DR;
      • I don’t see a reason not to run type III every time.
      • Type I: Sequential Attribution of Variation
      • Type II: Simultaneous Attribution of Variation
        • For interactions: Sequential-Simultaneous Attribution of Variation
      • Type III: Simultaneous Attribution of Variation for Main Effects and Interactions
      • If the categorical explanatory variables in the analysis are balanced, then all 3 types will give the same results. The results for each variable will be it’s unique contribution.
        • Example:

          # balanced
          table(d$Rx, d$condition)
          #>           Ca Cb
          #>   Placebo  5  5
          #>   Dose100  5  5
          #>   Dose250  5  5
          
          # imbalanced
          table(d$group, d$condition)
          #>      Ca Cb
          #>   Gb  6  6
          #>   Ga  5  6
          #>   Gc  4  3
    • Type I: Sequential Sum of Squares
      • Variance attribution is calculated sequentially so the order of variables in the model matters. Each term is attributed with a portion of the variation (represented by its SS) that has not yet been attributed to any of the previous terms.

      • Rarely used in practice because the order in which variation is attributed isn’t usually important

      • Example: Order of terms matters

        anova(lm(Y ~ group + X, data = d))
        #> Analysis of Variance Table
        #> 
        #> Response: Y
        #>           Df  Sum Sq Mean Sq F value   Pr(>F)   
        #> group      2    8783    4391  0.0918 0.912617   
        #> X          1  380471  380471  7.9503 0.009077 **
        #> Residuals 26 1244265   47856                    
        #> ---
        #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
        
        anova(lm(Y ~ X + group, data = d))
        #> Analysis of Variance Table
        #> 
        #> Response: Y
        #>           Df  Sum Sq Mean Sq F value  Pr(>F)  
        #> X          1  325745  325745  6.8067 0.01486 *
        #> group      2   63509   31754  0.6635 0.52353  
        #> Residuals 26 1244265   47856                  
        #> ---
        #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
        • Sum of Squares values change based on the order of the terms in the model
        • In the first model,
          • The effect of group does not represent its unique contribution to Y’s variance, but instead its total contribution.
            • This reminds me of a dual path DAG where group is influenced by X. Here X’s variance contribution is included in group’s contribution since X is not conditioned upon. (See Causal Inference >> Dual Path DAGs)
          • The effect of X represents only what X explains after removing the contribution of group — the variance attributed to X is strictly the variance that can be uniquely attributed to X, controlling for group
    • Type II: Simultaneous Sum of Squares
      • The variance attributed to each variable is its unique contribution — variance after controlling for the other variables. Order of terms does not matter.

      • Example

        car::Anova(m, type = 2)
        #> Anova Table (Type II tests)
        #> 
        #> Response: Y
        #>            Sum Sq Df F value   Pr(>F)   
        #> group       63509  2  0.6635 0.523533   
        #> X          380471  1  7.9503 0.009077 **
        #> Residuals 1244265 26                    
        #> ---
        #> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
        • Sum of Squares values are equal to values of the Type 1 results when each variable is last.
        • Note that factor variables, e.g. group, are treated as 1 term and not broken down into dummy variables for each level.
      • With interactions, the method of calculation could be called, Sequential-Simultaneous.

        • Terms are evaluated simultaneously in groups based on type of term, e.g. main effects, 2-way interactions, 3-way interactions, etc., but sequentially according to the order of that term where the order of main effects < 2-way interactions < 3-way interactions, etc.
        • All main effects (1st order) are tested simultaneously (accounting for one another), then all 2-way interactions (2nd order) are tested simultaneously (accounting for the main effects and one another), and finally the 3-way interaction is tested (accounting for all main effects and 2-way interactions).
        • So, if you use this way to test a model with interactions, only the highest order term’s Sum of Squares represents a unique variance contribution.
    • Type III: Simultaneous-Simultaneous Sum of Squares
      • The Sum-of-Squares for each main effect and interaction is calculated as its unique contribution (i.e. takes into account all other terms of the model).
      • Unlike Type II, it allows you compare variance contributions for every term in your model.
      • Without centering continuous variables and applying sum-to-zero contrasts to categorical variables, tests results can change depending on the categorical level of the moderator. (Also see Regression, Linear >> Contrasts >> Sum-to-Zero)
        • Example
          • No Centering, No Sum-to-Zero Contrasts

            m_int <- lm(Y ~ group * X, data = d)
            
            d$group <- relevel(d$group, ref = "Gb")
            m_int2 <- lm(Y ~ group * X, data = d)
            
            car::Anova(m_int, type = 3)
            #>             Sum Sq Df F value    Pr(>F)    
            #> (Intercept) 538630  1 22.9922 6.994e-05 ***
            #> group       738108  2 15.7536 4.269e-05 ***
            #> X           101495  1  4.3325   0.04823 *  
            #> group:X     682026  2 14.5566 7.246e-05 ***
            #> Residuals   562240 24     
            
            car::Anova(m_int2, type = 3)
            #>             Sum Sq Df F value    Pr(>F)    
            #> (Intercept) 219106  1  9.3528  0.005402 ** 
            #> group       738108  2 15.7536 4.269e-05 ***
            #> X           910646  1 38.8722 1.918e-06 ***
            #> group:X     682026  2 14.5566 7.246e-05 ***
            #> Residuals   562240 24  
            • The sum of squares and p-value change for X when the categorical variable’s reference level changed which shouldn’t matter given this is an omnibus test (i.e. the categorical variable is treated as 1 entity and not set of dummy variables).
          • Centered, Sum-to-Zero Contrasts Applied

            # center, contr.sum
            d_contr_sum <- d |> 
              mutate(X_c = scale(X, scale = FALSE))
            contrasts(d_contr_sum$group) <- contr.sum
            m_int_cont_sum <- lm(Y ~ group * X_c, data = d_contr_sum)
            car::Anova(m_int_cont_sum, type = 3)
            #>              Sum Sq Df  F value    Pr(>F)    
            #> (Intercept) 4743668  1 202.4902 3.401e-13 ***
            #> group         19640  2   0.4192   0.66231    
            #> X_c          143772  1   6.1371   0.02067 *  
            #> group:X_c    682026  2  14.5566 7.246e-05 ***
            #> Residuals    562240 24
            
            # change reference level
            d_rl <- d_contr_sum |> 
              mutate(group_rl = relevel(group, ref = "Gb"))
            contrasts(d_rl$group_rl) <- contr.sum
            car::Anova(lm(Y ~ group_rl * X_c, data = d_rl),
                       type = 3)
            #>               Sum Sq Df  F value    Pr(>F)    
            #> (Intercept)  4743668  1 202.4902 3.401e-13 ***
            #> group_rl       19640  2   0.4192   0.66231    
            #> X_c           143772  1   6.1371   0.02067 *  
            #> group_rl:X_c  682026  2  14.5566 7.246e-05 ***
            #> Residuals     562240 24   
            • Now when the reference level is changed, the sum-of-squares and p-value for X remain the same.

Assumptions

  • Each group category has a normal distribution.
  • Each group category is independent of each other and identically distributed (iid)
  • Group categories have of similar variance (i.e. homoskedastic variance)
    • If this is violated

      • If the ratio of the largest variance to the smallest variance is less than 4, then proceed with one-way ANOVA (robust to small differences)
      • If the ratio of the largest variance to the smallest variance is greater than 4, perform a Kruskal-Wallis test. This is considered the non-parametric equivalent to the one-way ANOVA. (example)
    • EDA

      data %>%
        group_by(program) %>%
        summarize(var=var(weight_loss))
      #> A tibble: 3 x 2
      #>   program  var   
      #> 1 A      0.819
      #> 2 B      1.53 
      #> 3 C      2.46
    • Perform a statisical test to see if these variables are statistically significant (See Post-Hoc Analysis, Difference-in-Means >> EDA >> Tests for Equal Variances)

Mathematics

  • Asides:

    • This lookd like the variance formula except for not dividing by the sample size to get the “average” squared distance
    • SSA formula - the second summation just translates to multiplying by ni, the group category sample size, since there is no j in that formula
  • Calculate SSA and SSE

    \[ \begin{align} \text{SST} &= \text{SSA} + \text{SSE} \\ &= \sum_{i = 1}^a \sum_{j=i}^{n_i} (x_{i,j} - \mu)^2 \\ &= \sum_{i = 1}^a \sum_{j=i}^{n_i} (\bar x_i - \mu)^2 + \sum_{i = 1}^a \sum_{j=i}^{n_i} (x_{i,j} - \bar x_i)^2 \end{align} \]

    • \(\text{SST}\): Sum of Squares Total
    • \(\text{SSA}\): Sum of Squares between categories, treatments, or factors
      • “A” stands for attributes (i.e. categories)
    • \(\text{SSE}\): Sum of Squares of Errors; randomness within categories, treatments, or factors
    • \(x_{ij}\): The jth observation of the ith category
    • \(\bar x_i\): The sample mean of category i
    • \(\mu\): The overall sample mean
    • \(n_i\): The group category sample size
    • \(a\): The number of group categories
  • Calculate MSA and MSE

    \[ \begin{align} \text{MSE} &= \frac{\text{SSE}}{N-a} \\ \text{MSA} &= \frac{\text{SSA}}{a-1} \end{align} \]

    • Where N is the total sample size
  • Calculate the F statistic and P-Value

    \[ F = \frac{\text{MSA}}{\text{MSE}} \]

    • Find the p-value (need a table to look it up)
    • If our F statistic is less than the critical value F statistic for a \(\alpha = 0.05\) than we cannot reject the null hypothesis (no statistical difference between categories)
  • Discussion

    • If there is a group category that has more variance than the others’ attribute error (SSA), we should then pick that up when we compare it to the random error (SSE)
      • If a group is further away from the overall mean, then it will increase SSA and thus influence the overall variance but might not always increase random error

Diagnostics

  • Eta Squared
    • Metric to describe the effect size of a variable
    • Range: [0, 1]; values closer to 1 indicating that a specific variable in the model can explain a greater fraction of the variation
    • lsr::etaSquared(anova_model) (use first column of output)
    • Guidelines
      • 0.01: Effect size is small.
      • 0.06: Effect size is medium.
      • Large effect size if the number is 0.14 or above
  • Post-ANOVA Tests
    • Assume approximately Normal distributions
    • For links to more details about each test, https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/post-hoc/
    • Duncan’s new multiple range test (MRT)
      • When you run Analysis of Variance (ANOVA), the results will tell you if there is a difference in means. However, it won’t pinpoint the pairs of means that are different. Duncan’s Multiple Range Test will identify the pairs of means (from at least three) that differ. The MRT is similar to the LSD, but instead of a t-value, a Q Value is used.
    • Fisher’s Least Significant Difference (LSD)
      • A tool to identify which pairs of means are statistically different. Essentially the same as Duncan’s MRT, but with t-values instead of Q values.
    • Newman-Keuls
      • Like Tukey’s, this post-hoc test identifies sample means that are different from each other. Newman-Keuls uses different critical values for comparing pairs of means. Therefore, it is more likely to find significant differences.
    • Rodger’s Method
      • Considered by some to be the most powerful post-hoc test for detecting differences among groups. This test protects against loss of statistical power as the degrees of freedom increase.
    • Scheffé’s Method
      • Used when you want to look at post-hoc comparisons in general (as opposed to just pairwise comparisons). Scheffe’s controls for the overall confidence level. It is customarily used with unequal sample sizes.
    • Tukey’s Test
      • The purpose of Tukey’s test is to figure out which groups in your sample differ. It uses the “Honest Significant Difference,” a number that represents the distance between groups, to compare every mean with every other mean.
    • Dunnett’s Test
      • Like Tukey’s this post-hoc test is used to compare means. Unlike Tukey’s, it compares every mean to a control mean.
      • {DescTools::DunnettTest}
    • Benjamin-Hochberg (BH) Procedure
      • If you perform a very large amount of tests, one or more of the tests will have a significant result purely by chance alone. This post-hoc test accounts for that false discovery rate.

One-Way

  • Measures if there’s a difference in means between any group category
  • Example: 1 control, 2 Test groups
    • Data

      data <- data.frame(Group = rep(c("control", "Test1", "Test2"), each = 10),
      value = c(rnorm(10), rnorm(10),rnorm(10)))
      data$Group<-as.factor(data$Group)
      head(data)
      #>   Group      value
      #> 1 control  0.1932123
      #> 2 control -0.4346821
      #> 3 control  0.9132671
      #> 4 control  1.7933881
      #> 5 control  0.9966051
      #> 6 control  1.1074905
    • Fit model

      model <- aov(value ~ Group, data = data)
      summary(model)
      #>             Df    Sum Sq   Mean Sq  F value  Pr(>F) 
      #> Group        2     4.407    2.2036     3.71  0.0377 *
      #> Residuals   27    16.035    0.5939
      
      # or
      lm_mod <- lm(value ~ Group, data = data)
      anova(lm_mod)
      • P-Value < 0.05 says at least 1 group category has a statistically significant different mean from another category
    • Dunnett’s Test

      DescTools::DunnettTest(x=data$value, g=data$Group)
      
      #> Dunnett's test for comparing several treatments with a control : 
      #>     95% family-wise confidence level
      #> $control
      #>                     diff    lwr.ci      upr.ci  pval   
      #> Test1-control -0.8742469 -1.678514 -0.06998022 0.0320 * 
      #> Test2-control -0.7335283 -1.537795  0.07073836 0.0768 .
      • Measures if there is any difference between treatments and the control
      • The mean score of the test1 group was significantly higher than the control group. The mean score of the test2 group was not significantly higher than the control group.
    • Tukey’s HSD

      stats::TukeyHSD(model, conf.level=.95)
      • Measures difference in means between all categories and each other

ANCOVA

  • Analysis of Covariance is used to measure the main effect and interaction effects of categorical variables on a continuous dependent variable while controlling the effects of selected other continuous variables which co-vary with the dependent variable.

  • Misc

  • Assumptions

    • Independent observations (i.e. random assignment, avoid is having known relationships among participants in the study)

    • Linearity: the relation between the covariate(s) and the dependent variable must be linear.

    • Normality: the dependent variable must be normally distributed within each subpopulation. (only needed for small samples of n < 20 or so)

    • Homogeneity of regression slopes: the beta-coefficient(s) for the covariate(s) must be equal among all subpopulations. (regression lines for these individual groups are assumed to be parallel)

      • Failure to meet this assumption implies that there is an interaction between the covariate and the treatment.
      • This assumption can be checked with an F test on the interaction of the independent variable(s) with the covariate(s).
        • If the F test is significant (i.e., significant interaction) then this assumption has been violated and the covariate should not be used as is.
        • A possible solution is converting the continuous scale of the covariate to a categorical (discrete) variable and making it a subsequent independent variable, and then use a factorial ANOVA to analyze the data.
    • The covariate (adjustment variable) and the treatment are independent

      model <- aov(grade ~ technique, data = data)
      summary(model)
      
      #>             Df Sum Sq Mean Sq F value Pr(>F)
      #> technique    2    9.8    4.92    0.14  0.869
      #> Residuals  87 3047.7  35.03
      • H0: variables are independent
  • Homogeneity of variance: variance of the dependent variable must be equal over all subpopulations (only needed for sharply unequal sample sizes)

    # response ~ treatment
    leveneTest(exam ~ technique, data = data)
    
    #>       Df F value    Pr(>F)   
    #> group  2  13.752 6.464e-06 ***
    #>       87
    
    # alt test
    fligner.test(size ~ location, my.dataframe)
    • H0: Homogeneous variance
    • This one fails
  • Fit

    ancova_model <- aov(exam ~ technique + grade, data = data)
    car::Anova(ancova_model, type="III")
    
    #>                 Sum Sq Df F value    Pr(>F)   
    #>     (Intercept) 3492.4  1 57.1325 4.096e-11 ***
    #>     technique  1085.8  2  8.8814 0.0003116 ***
    #>     grade          4.0  1  0.0657 0.7982685   
    #>     Residuals  5257.0 86
    • When adjusting for current grade (covariate), study technique (treatment) has a significant effect on the final exam score (response).
  • Does the effect differ by treatment

    postHocs <- multicomp::glht(ancova_model, linfct = mcp(technique = "Tukey"))
    summary(postHocs)
    
    #>             Estimate Std. Error t value Pr(>|t|)   
    #> B - A == 0   -5.279      2.021  -2.613  0.0284 * 
    #> C - A == 0    3.138      2.022   1.552  0.2719   
    #> C - B == 0    8.418      2.019   4.170  <0.001 ***
  • Example: RCT

    \[ \begin{align} \text{post}_i &\sim \mathcal{N}(\mu_i, \sigma_\epsilon)\\ \mu_i &= \beta_0 + \beta_1 \text{tx}_i + \beta_2 \text{pre}_i \end{align} \]

    w2 <- glm(
      data = dw,
      family = gaussian,
      post ~ 1 + tx + pre)
    • Specification
      • post, pre: The post-treatment and pre-treatment measurement of the outcome variable
      • tx: The treatment indicator variable
      • \(\beta_0\): Population mean for the outcome variable in the control group
      • \(\beta_1\): Parameter is the population level difference in pre/post change in the treatment group, compared to the control group.
        • Also a causal estimate for the average treatment effect (ATE) in the population, τ
      • Because pre is added as a covariate, both \(\beta_0\) and \(\beta_1\) are conditional on the outcome variable, as collected at baseline before random assignment.

MANOVA

  • Multivariate Analysis of Variance is used to analyze how one or more factor variables affects a multivariate response (multiple response variables).
  • Assumptions
    • Multivariate Normality – Response variables are multivariate normally distributed within each group of the factor variable(s).
    • Independence – Each observation is randomly and independently sampled from the population.
    • Equal Variance – The population covariance matrices of each group are equal.
    • No Multivariate Outliers – There are no extreme multivariate outliers.
  • {MVN} (Vignette) - Tests for multivariate normality
    • Contains the three most widely used multivariate normality: Mardia’s, Henze-Zirkler’s and Royston’s
    • Graphical approaches, including chi-square Q-Q, perspective and contour plots.
    • Two multivariate outlier detection methods, which are based on robust Mahalanobis distances

Repeated Measures

  • Example; Relationship to Mixed Effects Models (source)

    data("stroop", package = "afex")
    stroop_sub <- subset(stroop, !is.na(rt))
    
    # repeated measures ANOVA
    A <- afex::aov_ez(id = "pno", 
                      dv = "rt", 
                      data = stroop_sub, 
                      within = c("congruency", "condition"),
                      fun_aggregate = mean)
    
    # the aggregation that's occurring
    # stroop_agg <- 
    #   stroop_sub |> 
    #   summarize(mean_rt = mean(rt),
    #             .by = c("pno", "congruency", "condition"))
    
    
    # fit models on aggregated data 
    # varying intercepts
    MLM1 <- lmerTest::lmer(rt ~ congruency * condition + (1 | pno),
                           data = A$data$long)
    
    # varying slopes and intercepts
    MLM2 <- lmerTest::lmer(rt ~ congruency * condition + 
                             (congruency + condition | pno),
                           data = A$data$long)
    
    cbind(
      A$anova_table[,"Pr(>F)", drop = FALSE],
      "p (vary_slop_intrc)" = anova(MLM2)[,"Pr(>F)"],
      "p (vary_intrc)" = anova(MLM1)[,"Pr(>F)"]  
    ) |> 
      apply(2, scales::label_pvalue())
    
    #>                      Pr(>F)   p (vary_slop_intrc) p (vary_intrc)
    #> congruency           "<0.001" "<0.001"            "<0.001"      
    #> condition            "0.038"  "0.038"             "0.005"       
    #> congruency:condition "0.002"  "0.002"             "0.160"  
    • Shows the equivalency of the repeated anova to the varying slopes and intercepts mixed effects model
      • The aggregation isn’t necessary, it was just done in order to present the edge case of “repeated” measure = 1 and showing that including varying slopes is the better fit even in this extreme case.
    • Omitting the varying slopes results in a model that doesn’t detect the significance of the interaction even using aggregated data (see source)