Testing
Misc
- If the code is for a specific dataset/pipeline, then assertive testing makes more sense than traditional software testing.
- If the code is general purpose, it should be in a package and undergo traditional software testing.
- Also see
- Packages
- {ensure} - An RStudio addin for drafting {testthat} unit testing code using LLMs
- Triggering the addin will open a corresponding test file and begin writing tests into it.
- {assertthat}
- {testthat}
- {{pytest}} - It scales down, being super easy to use, but scales up, with mighty features and a rich ecosystem of plugins.
- Doesn’t add the entry point directory to sys.path. However, you can force it to do so with configuration. (See "Make your Python life easier by learning how imports find things" for details on entry points)
- {{tox}} and {{nox}} - Useful to run tests on different versions of Python to be sure it works with all of them. Both are good but nox is recommended
Assertive Testing
Testing that happens within the function
Check for NAs in column
if (anyNA(dataset$body_mass_g)) {
  rlang::abort("NAs are present in 'body_mass_g' column")
}
Assert that I have not inadvertently changed the length of the output dataset either by accidentally dropping rows or accidentally introducing duplicates
library(testthat)

make_my_rectangle <- function(dataset_a, dataset_b, dataset_c) {
  # ... Do stuff

  # Row count must match the input: no dropped rows
  expect_equal(nrow(output_dataset), nrow(dataset_a))
  # No duplicated ids introduced (e.g. by a join)
  expect_false(any(duplicated(output_dataset$id)))

  output_dataset
}
Error functions
Too much error-handling code within a function reduces readability. Moving the checks into dedicated error functions shortens the calling function and makes the checks reusable.
Tidyverse recommends using cli::cli_abort
- Makes it easy to generate bulleted lists.
- Uses glue style interpolation to insert data into the error.
- Supports a wide range of inline markup.
- Provides convenient tools to chain errors together.
- Can control the name of the function shown in the error.
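A minimal sketch of the first three points above (bulleted lists, glue-style interpolation, inline markup). The helper `check_required_cols()` and the column names are hypothetical and only serve as illustration; the `"x"` and `"i"` names on the message vector are what produce the bullets.

```r
library(cli)

# Hypothetical helper: abort when required columns are missing
check_required_cols <- function(df, required) {
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    cli::cli_abort(c(
      "Required columns are missing from {.arg df}.",      # header line
      "x" = "Missing: {.val {missing_cols}}",              # "x" bullet, interpolated
      "i" = "Columns present: {.val {names(df)}}"          # "i" bullet, interpolated
    ))
  }
  invisible(df)
}

# Capture the condition instead of stopping the script
err <- tryCatch(
  check_required_cols(data.frame(id = 1:3), c("id", "body_mass_g")),
  error = function(e) e
)

inherits(err, "rlang_error")   # cli_abort() signals an rlang error
cat(conditionMessage(err))     # formatted message including the bullets
```

Wrapping the call in `tryCatch()` is only there so the sketch can be run end to end; in a pipeline the error would normally be left to stop execution.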
Example: From link
check_if_squarable <- function(x,
                               arg = rlang::caller_arg(x),
                               call = rlang::caller_env()) {
  x_len <- length(x)
  dims <- sqrt(x_len)
  squarable_length <- floor(dims) == dims

  if (!squarable_length) {
    cli::cli_abort(
      message = c(
        "Provided vector is not of a squarable length",
        "{.arg {arg}} is of length {.num {x_len}}",
        "This cannot be represented as a perfect square"
      ),
      call = call
    )
  }
}

check_if_not_numeric <- function(x,
                                 arg = rlang::caller_arg(x),
                                 call = rlang::caller_env()) {
  if (!is.numeric(x)) {
    cli::cli_abort(
      message = c(
        "Provided vector, {.arg {arg}}, must be {.cls numeric}, not {.cls {class(x)}}",
        "We see that {.run is.numeric({.arg {arg}})} returns {.cls {class(x)}}"
      ),
      call = call
    )
  }
}

# Messages
vector_to_square(data = 1:4)
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    3    4

vector_to_square(data = 1:5)
#> Error in `vector_to_square()`:
#> ! Provided vector is not of a squarable length
#> `data` is of length 5
#> This cannot be represented as a perfect square

vector_to_square(data = LETTERS[1:4])
#> Error in `vector_to_square()`:
#> ! Provided vector, `data`, must be <numeric>, not <character>
#> We see that `` is.numeric(`data`) `` returns <character>
- Adding rlang::caller_env makes it so when the error is tripped, the error message refers to the calling function and not the error function.
- Adding rlang::caller_arg makes it so when the error is tripped, the error message states the calling function’s argument name and not “x”.
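To see how a check function picks up its caller's context, here is a small self-contained sketch. The names `check_positive()` and `log_weights()` are hypothetical; only `rlang::caller_arg()`, `rlang::caller_env()`, and `rlang::abort()` are real functions, used in the same way as in the example above.

```r
library(rlang)

# Hypothetical checker: `arg` and `call` default to the caller's context
check_positive <- function(x,
                           arg = rlang::caller_arg(x),
                           call = rlang::caller_env()) {
  if (any(x <= 0)) {
    rlang::abort(
      paste0("`", arg, "` must contain only positive values."),
      call = call
    )
  }
  invisible(x)
}

# Hypothetical user-facing function that delegates its input check
log_weights <- function(weights) {
  check_positive(weights)
  log(weights)
}

# Capture the condition so the script keeps running
err <- tryCatch(log_weights(c(2, -1)), error = function(e) e)

conditionMessage(err)  # names the argument `weights`, not `x`
err$call               # the call to log_weights(), not check_positive()
```

Because both defaults are resolved relative to wherever the checker is called from, the same checker can be reused across many functions without passing argument names by hand.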