Testing
Misc
- If the code is for a specific dataset/pipeline, then assertive testing makes more sense than traditional software testing.
- If the code is general purpose, it should be in a package and undergo traditional software testing.
- Also see
- Packages
- {ensure} - An RStudio addin for drafting {testthat} unit testing code using LLMs
  - Triggering the addin will open a corresponding test file and begin writing tests into it.
- {assertthat}
- {testthat}
- {chk} - Check User-Supplied Function Arguments
- {{pytest}} - Scales down, being super easy to use, and scales up, with mighty features and a rich ecosystem of plugins.
  - Doesn’t add the entry point directory to `sys.path`. However, you can force it to do so with configuration. (See "Make your Python life easier by learning how imports find things" for details about entry points)
- {{tox}} and {{nox}} - Useful for running tests on different versions of Python to be sure the code works with all of them. Both are good, but nox is recommended.
- Optimize Testing Structure
- Notes from Optimize Your Unit Test Structure for Faster Feedback
- Break down large scripts with multiple functions into smaller ones when creating tests.
- Example: `R/plots.R` contains multiple plot functions.
  - Split `tests/testthat/test-plots.R` into:
    - `tests/testthat/test-plots-barchart.R`
    - `tests/testthat/test-plots-boxplot.R`
    - `tests/testthat/test-plots-scatter.R`
  - Then we can run tests only for the function whose code we've changed with `testthat::test_file("tests/testthat/test-plots-barchart.R")`.
  - We can work on one function in isolation until it meets all requirements, and check integration with the rest of the codebase at a later stage.
- Subsets of tests can be run using the filter argument: `testthat::test_dir("tests/testthat", filter = "plots")`
  - Runs all test files that contain "plots" in their name.
- Using a helper function

      .test <- function(filter = NULL) {
        testthat::test_dir("tests/testthat", filter = filter)
      }

  - Put it in `.Rprofile`
  - Use `.test("barchart")` to run tests for a single function and get super fast feedback, `.test("plots")` to run tests for a bigger code surface (for example, to see how things integrate within a specific domain), or `.test()` to run all tests.
Assertive Testing
Testing that happens within the function
Check for NAs in a column

    if (anyNA(dataset$body_mass_g)) {
      rlang::abort("NAs are present in 'body_mass_g' column")
    }

Assert that I have not inadvertently changed the length of the output dataset, either by accidentally dropping rows or by accidentally introducing duplicates

    library(testthat)

    make_my_rectangle <- function(dataset_a, dataset_b, dataset_c) {
      # ... Do stuff
      expect_equal(nrow(output_dataset), nrow(dataset_a))
      expect_false(any(duplicated(output_dataset$id)))
      output_dataset
    }

Error functions
Too much error code within a function reduces readability. Moving the checks into dedicated error functions reduces the lines of code and makes the checks reusable.
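As a minimal base-R sketch of this idea, the NA check from the assertive testing notes can be moved into a small function that several pipeline steps share. The names `check_no_na()`, `mean_body_mass()`, and the toy data frame are hypothetical and not from the source.

```r
# Hypothetical helper: stops with an informative message if a column contains NAs.
check_no_na <- function(dataset, column) {
  if (anyNA(dataset[[column]])) {
    stop("NAs are present in '", column, "' column", call. = FALSE)
  }
  invisible(dataset)
}

# Hypothetical pipeline step: the check is one line instead of an inline if-block.
mean_body_mass <- function(dataset) {
  check_no_na(dataset, "body_mass_g")
  mean(dataset$body_mass_g)
}

# Toy data (made up for illustration)
penguins_ok <- data.frame(body_mass_g = c(3750, 3800, 3250))
mean_body_mass(penguins_ok)
```

Any other function that needs the same guarantee can call `check_no_na()` with a different column name, so the error wording stays consistent across the pipeline.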
Tidyverse recommends using `cli::cli_abort()`
- Makes it easy to generate bulleted lists.
- Uses glue style interpolation to insert data into the error.
- Supports a wide range of inline markup.
- Provides convenient tools to chain errors together.
- Can control the name of the function shown in the error.
Example: From link
    # Error if the vector's length is not a perfect square
    check_if_squarable <- function(x,
                                   arg = rlang::caller_arg(x),
                                   call = rlang::caller_env()) {
      x_len <- length(x)
      dims <- sqrt(x_len)
      squarable_length <- floor(dims) == dims

      if (!squarable_length) {
        cli::cli_abort(
          message = c(
            "Provided vector is not of a squarable length",
            "{.arg {arg}} is of length {.num {x_len}}",
            "This cannot be represented as a perfect square"
          ),
          call = call
        )
      }
    }

    # Error if the vector is not numeric
    check_if_not_numeric <- function(x,
                                     arg = rlang::caller_arg(x),
                                     call = rlang::caller_env()) {
      if (!is.numeric(x)) {
        cli::cli_abort(
          message = c(
            "Provided vector, {.arg {arg}}, must be {.cls numeric}, not {.cls {class(x)}}",
            "We see that {.run is.numeric({.arg {arg}})} returns {.cls {class(x)}}"
          ),
          call = call
        )
      }
    }

    # Messages
    vector_to_square(data = 1:4)
    #>      [,1] [,2]
    #> [1,]    1    2
    #> [2,]    3    4

    vector_to_square(data = 1:5)
    #> Error in `vector_to_square()`:
    #> ! Provided vector is not of a squarable length
    #> `data` is of length 5
    #> This cannot be represented as a perfect square

    vector_to_square(data = LETTERS[1:4])
    #> Error in `vector_to_square()`:
    #> ! Provided vector, `data`, must be <numeric>, not <character>
    #> We see that `` is.numeric(`data`) `` returns <character>

- Adding `call = rlang::caller_env()` makes it so that when the error is tripped, the error message refers to the calling function and not the error function.
- Adding `arg = rlang::caller_arg(x)` makes it so that when the error is tripped, the error message states the calling function’s argument and not “x”.
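The example calls `vector_to_square()` but its definition isn't shown. A minimal sketch of what it might look like follows. It assumes `check_if_not_numeric()` and `check_if_squarable()` from the example are already defined, and it assumes the matrix is filled by row, which is inferred from the printed output for `1:4`.

```r
# Hypothetical definition of vector_to_square(); the source only shows its output.
# Relies on check_if_not_numeric() and check_if_squarable() from the example above.
vector_to_square <- function(data) {
  # Each check captures the argument name ("data") via rlang::caller_arg() and
  # this function's environment via rlang::caller_env(), so errors point here.
  check_if_not_numeric(data)
  check_if_squarable(data)

  dims <- sqrt(length(data))
  # byrow = TRUE is an assumption based on the 1:4 output in the example
  matrix(data, nrow = dims, ncol = dims, byrow = TRUE)
}
```

Because the checks are called with their default `arg` and `call` values, the wrapper stays free of error-handling code while the messages still name `vector_to_square()` and its `data` argument.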