Testing

Misc

If the code is for a specific dataset/pipeline, then assertive testing makes more sense than traditional software testing.
If the code is general purpose, it should be in a package and undergo traditional software testing.
Also see
- Package Development >> Testing
Packages
- {ensure} - An RStudio addin for drafting {testthat} unit testing code using LLMs
  - Triggering the addin will open a corresponding test file and begin writing tests into it.
- {assertthat}
- {testthat}
- {chk} - Check User-Supplied Function Arguments
- {{pytest}} - It scales down, being super easy to use, but scales up, with mighty features and a rich ecosystem of plugins.
  - Doesn’t add the entry point directory to sys.path. However, you can force it to do so with configuration. (See Make your Python life easier by learning how imports find things about entry points)
- {{tox}} and {{nox}} - Useful to run tests on different versions of Python to be sure it works with all of them. Both are good but nox is recommended
Optimize Testing Structure
- Notes from Optimize Your Unit Test Structure for Faster Feedback
- Break down large scripts with multiple functions into smaller ones when creating tests.
- Example: R/plots.R contains multiple plot functions.
  - Split tests/testthat/test-plots.R into:
    - tests/testthat/test-plots-barchart.R
    - tests/testthat/test-plots-boxplot.R
    - tests/testthat/test-plots-scatter.R
  - Then we can run tests only for a selected function that we’ve changed code for with testthat::test_file("tests/testthat/test-plots-barchart.R").
  - We can work on one function in isolation until it meets all requirements, and check integration with the rest of the codebase at a later stage.
- Subsets of tests can be run using the filter argument
  - testthat::test_dir("tests/testthat", filter = "plots").
    - Runs all test files that contain "plots" in their name (filter is treated as a regular expression matched against the file names).
  - Using a helper function
    .test <- function(filter = NULL) { testthat::test_dir("tests/testthat", filter = filter) }
    - Put it in .Rprofile
    - .test("barchart") to run tests for a single function and get super fast feedback,
    - .test("plots") to run tests for a bigger code surface, for example to see how things integrate within a specific domain, or
    - .test() to run all tests.

Assertive Testing

Testing that happens within the function

Check for NAs in column

if (anyNA(dataset$body_mass_g)) {
  rlang::abort("NAs are present in 'body_mass_g' column")
}

Assert that I have not inadvertently changed the length of the output dataset either by accidentally dropping rows or accidentally introducing duplicates

library(testthat)
make_my_rectangle <- function(dataset_a, dataset_b, dataset_c) {

  # ... Do stuff that builds output_dataset

  expect_equal(nrow(output_dataset), nrow(dataset_a))
  expect_false(any(duplicated(output_dataset$id)))

  output_dataset
}
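A minimal runnable sketch of the same idea, assuming toy data and a simple left join in place of the elided steps. The data frames and the join are hypothetical, and base R's stopifnot() is swapped in for the testthat expectations so the example runs without extra packages (named conditions require R >= 4.0):

```r
# Hypothetical toy data: one row per id in the primary table
dataset_a <- data.frame(id = 1:3, species = c("Adelie", "Gentoo", "Chinstrap"))
# The lookup table accidentally contains id 2 twice
dataset_b <- data.frame(
  id = c(1L, 2L, 2L, 3L),
  island = c("Torgersen", "Biscoe", "Dream", "Dream")
)

make_my_rectangle <- function(dataset_a, dataset_b) {
  # Left join: every row of dataset_a should appear exactly once in the output
  output_dataset <- merge(dataset_a, dataset_b, by = "id", all.x = TRUE)

  # The name of each condition is used as the error message when it fails
  stopifnot(
    "row count changed during the join" = nrow(output_dataset) == nrow(dataset_a),
    "duplicate ids in output" = !anyDuplicated(output_dataset$id)
  )

  output_dataset
}

# The duplicated id in dataset_b inflates the join, so the assertion trips
result <- tryCatch(
  make_my_rectangle(dataset_a, dataset_b),
  error = function(e) conditionMessage(e)
)

# After de-duplicating the lookup table, the function returns normally
dataset_b_clean <- dataset_b[!duplicated(dataset_b$id), ]
clean_result <- make_my_rectangle(dataset_a, dataset_b_clean)
```

Because the checks live inside the function, every run of the pipeline re-validates the data, not just the runs triggered by a test suite.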

Error functions

Too much error-handling code within a function reduces readability. Moving each check into its own error function shortens the main function and makes the check reusable.
Tidyverse recommends using cli::cli_abort
- Makes it easy to generate bulleted lists.
- Uses glue style interpolation to insert data into the error.
- Supports a wide range of inline markup.
- Provides convenient tools to chain errors together.
- Can control the name of the function shown in the error.
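As a rough illustration of the bulleted lists and glue-style interpolation mentioned above, here is a small sketch. The function check_lengths_match() and its messages are hypothetical; the only library call assumed is cli::cli_abort() with a named character vector, where "x" and "i" names mark failure and info bullets:

```r
# Hypothetical check: errors with a bulleted message when lengths differ
check_lengths_match <- function(x, y) {
  if (length(x) != length(y)) {
    cli::cli_abort(c(
      # The first (unnamed) element is the main error message
      "{.arg x} and {.arg y} must have the same length.",
      # "x" bullets describe what went wrong; {...} is evaluated like glue
      "x" = "{.arg x} has length {length(x)}.",
      "x" = "{.arg y} has length {length(y)}.",
      # "i" bullets add extra information
      "i" = "Recycling is not supported here."
    ))
  }
  invisible(TRUE)
}

# Capture the condition instead of stopping the script
err <- tryCatch(check_lengths_match(1:3, 1:5), error = function(e) e)
```

Printing err at the console shows the header followed by the bullets, which is usually easier to scan than a single long stop() string.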

Example: From link

check_if_squarable <- function(x,
                               arg = rlang::caller_arg(x),
                               call = rlang::caller_env()) {
  x_len <- length(x)
  dims <- sqrt(x_len)

  squarable_length <- floor(dims) == dims

  if (!squarable_length) {
    cli::cli_abort(
      message = c(
        "Provided vector is not of a squarable length",
        "{.arg {arg}} is of length {.num {x_len}}",
        "This cannot be represented as a perfect square"
      ),
      call = call
    )
  }
}

check_if_not_numeric <- function(x,
                                 arg = rlang::caller_arg(x),
                                 call = rlang::caller_env()) {
  if (!is.numeric(x)) {
    cli::cli_abort(
      message = c(
        "Provided vector, {.arg {arg}}, must be {.cls numeric}, not {.cls {class(x)}}",
        "We see that {.run is.numeric({.arg {arg}})} returns {.cls {class(x)}}"
      ),
      call = call
    )
  }

}

# Messages (vector_to_square() is the user-facing function that runs both checks on `data`)
vector_to_square(data = 1:4)
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    3    4
vector_to_square(data = 1:5)
#> Error in `vector_to_square()`:
#> ! Provided vector is not of a squarable length
#> `data` is of length 5
#> This cannot be represented as a perfect square
vector_to_square(data = LETTERS[1:4])
#> Error in `vector_to_square()`:
#> ! Provided vector, `data`, must be <numeric>, not <character>
#> We see that `` is.numeric(`data`) `` returns <character>

Setting call = rlang::caller_env() makes it so when the error is tripped, the error message refers to the calling function (e.g. vector_to_square()) and not the error function.
Setting arg = rlang::caller_arg(x) makes it so when the error is tripped, the error message states the calling function's argument name (e.g. data) and not "x".
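A minimal sketch of how the two defaults behave, assuming the {rlang} and {cli} packages are installed. The helper check_positive() and the caller scale_values() are hypothetical names made up for this illustration:

```r
# Hypothetical error function: x is whatever the caller passed in
check_positive <- function(x,
                           arg = rlang::caller_arg(x),
                           call = rlang::caller_env()) {
  if (any(x <= 0)) {
    cli::cli_abort(
      "{.arg {arg}} must contain only positive values.",
      call = call
    )
  }
  invisible(x)
}

# Hypothetical user-facing function that delegates its input check
scale_values <- function(weights) {
  check_positive(weights)
  weights / sum(weights)
}

# Capture the error to inspect it without halting the script
err <- tryCatch(scale_values(c(1, -2)), error = function(e) e)

# The recorded call belongs to scale_values(), not check_positive(),
# and the message names the caller's argument, weights, rather than x
err$call
conditionMessage(err)
```

Without the two defaults, the user would see an error about `x` inside check_positive(), a function and argument they never called or named themselves.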