Testing

Misc

  • If the code is for a specific dataset/pipeline, then assertive testing makes more sense than traditional software testing.
  • If the code is general purpose, it should be in a package and undergo traditional software testing.
  • Packages
    • {ensure} - An RStudio addin for drafting {testthat} unit testing code using LLMs
      • Triggering the addin will open a corresponding test file and begin writing tests into it.
    • {assertthat}
    • {testthat}
    • {{pytest}} - It scales down, being super easy to use, but scales up, with mighty features and a rich ecosystem of plugins.
    • {{tox}} and {{nox}} - Useful for running tests against different versions of Python to confirm the code works with all of them. Both are good, but nox is recommended.
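
For the general-purpose case, a traditional unit test with {testthat} might look like the sketch below. The helper rescale01() and the file locations named in the comments are hypothetical and only there for illustration.

```r
library(testthat)

# Hypothetical general-purpose helper; in a package it would live under R/
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# In a package this test would typically sit in tests/testthat/test-rescale01.R
test_that("rescale01 maps values onto [0, 1]", {
  out <- rescale01(c(2, 4, 6))
  expect_equal(out, c(0, 0.5, 1))
  expect_true(all(out >= 0 & out <= 1))
})
```

The test lives outside the function and runs only when the test suite is run, which is the main contrast with assertive testing.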

Assertive Testing

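  • Testing that happens within the function: the checks run every time the function is called and stop execution when an assumption about the data is violated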

  • Check for NAs in column

    if (anyNA(dataset$body_mass_g)) {
      rlang::abort("NAs are present in 'body_mass_g' column")
    }
  • Assert that I have not inadvertently changed the length of the output dataset either by accidentally dropping rows or accidentally introducing duplicates

    library(testthat)
    make_my_rectangle <- function(dataset_a, dataset_b, dataset_c) {
    
      # ... Do stuff (builds output_dataset)
    
      expect_equal(nrow(output_dataset), nrow(dataset_a))
      expect_false(any(duplicated(output_dataset$id)))
    
      output_dataset
    }
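
A dependency-free variant of the same row-count and duplicate checks can be sketched with base R. The data frames and the function name join_with_checks() are made up for illustration, and the named-message form of stopifnot() assumes R 4.0 or later.

```r
# Hypothetical inputs: one row per id, plus two lookup tables to join on
dataset_a    <- data.frame(id = 1:3, body_mass_g = c(3750, 3800, 3250))
lookup_clean <- data.frame(id = 1:3, island = c("Biscoe", "Dream", "Torgersen"))
lookup_dup   <- data.frame(id = c(1L, 2L, 2L, 3L),
                           island = c("Biscoe", "Dream", "Dream", "Torgersen"))

join_with_checks <- function(a, b) {
  out <- merge(a, b, by = "id", all.x = TRUE)

  # Assertions run on every call; the name of a failing check becomes the error message
  stopifnot(
    "join changed the number of rows" = nrow(out) == nrow(a),
    "join introduced duplicate ids"   = anyDuplicated(out$id) == 0
  )

  out
}

# Passes: the lookup has exactly one row per id
joined <- join_with_checks(dataset_a, lookup_clean)

# Fails: id 2 appears twice in the lookup, so the join adds a row
result <- tryCatch(
  join_with_checks(dataset_a, lookup_dup),
  error = function(e) conditionMessage(e)
)
print(result)
```

Wrapping the failing call in tryCatch() is only there so the example runs to completion; in a real pipeline the error would be left to halt the run.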
  • Error functions

    • Too much error code within a function reduces readability. Moving the checks into helper functions shortens the main function and makes the checks reusable.

    • Tidyverse recommends using cli::cli_abort

    • Example: From link

      check_if_squarable <- function(x,
                                     arg = rlang::caller_arg(x),
                                     call = rlang::caller_env()) {
        x_len <- length(x)
        dims <- sqrt(x_len)
      
        squarable_length <- floor(dims) == dims
      
        if (!squarable_length) {
          cli::cli_abort(
            message = c(
              "Provided vector is not of a squarable length",
              "{.arg {arg}} is of length {.num {x_len}}",
              "This cannot be represented as a perfect square"
            ),
            call = call
          )
        }
      }
      
      check_if_not_numeric <- function(x,
                                       arg = rlang::caller_arg(x),
                                       call = rlang::caller_env()) {
        if (!is.numeric(x)) {
          cli::cli_abort(
            message = c(
              "Provided vector, {.arg {arg}}, must be {.cls numeric}, not {.cls {class(x)}}",
              "We see that {.run is.numeric({.arg {arg}})} returns {.cls {class(x)}}"
            ),
            call = call
          )
        }
      
      }
      
      # Messages
      vector_to_square(data = 1:4)
      #>      [,1] [,2]
      #> [1,]    1    2
      #> [2,]    3    4
      vector_to_square(data = 1:5)
      #> Error in `vector_to_square()`:
      #> ! Provided vector is not of a squarable length
      #> `data` is of length 5
      #> This cannot be represented as a perfect square
      vector_to_square(data = LETTERS[1:4])
      #> Error in `vector_to_square()`:
      #> ! Provided vector, `data`, must be <numeric>, not <character>
      #> We see that `` is.numeric(`data`) `` returns <character>
      • Adding call = rlang::caller_env() makes it so that when the error is tripped, the error message refers to the calling function (here vector_to_square()) and not the error helper function.
      • Adding arg = rlang::caller_arg(x) makes it so that when the error is tripped, the error message states the calling function’s argument name (here data) and not “x”.
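
The excerpt above calls vector_to_square() without showing its definition. The sketch below is an assumed definition, not the original author's code, paired with a compact stand-in for check_if_squarable() so that the block runs on its own. It needs the {cli} and {rlang} packages.

```r
# Compact stand-in for the helper above, using the same caller_arg()/caller_env() pattern
check_if_squarable <- function(x,
                               arg = rlang::caller_arg(x),
                               call = rlang::caller_env()) {
  dims <- sqrt(length(x))
  if (floor(dims) != dims) {
    cli::cli_abort(
      "{.arg {arg}} has length {length(x)}, which is not a perfect square",
      call = call
    )
  }
}

# Assumed definition: validate first, then fill the matrix row by row
vector_to_square <- function(data) {
  check_if_squarable(data)
  matrix(data, nrow = sqrt(length(data)), byrow = TRUE)
}

vector_to_square(data = 1:4)

# Capture the error instead of halting, then inspect what the user would see
err <- tryCatch(vector_to_square(data = 1:5), error = function(e) e)
conditionMessage(err)
```

Because arg and call are resolved from the caller's frame, the helper can be reused in any function and will still report that function's name and argument.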