Testing
Misc
- If the code is for a specific dataset/pipeline, then assertive testing makes more sense than traditional software testing.
- If the code is general purpose, it should be in a package and undergo traditional software testing.
- Packages
- {assertthat}
- {testthat}
- {{pytest}} - It scales down, being super easy to use, but scales up, with mighty features and a rich ecosystem of plugins.
- Doesn’t add the entry point directory to sys.path. However, you can force it to do so with configuration. (See "Make your Python life easier by learning how imports find things" for more about entry points.)
- {{tox}} and {{nox}} - Useful for running tests against different versions of Python to be sure your code works with all of them. Both are good, but nox is recommended.
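For the pytest note above about sys.path, one configuration that should do this is the pythonpath option, assuming pytest 7.0 or later and a project configured through pyproject.toml. The path "." is only an illustration: it is resolved relative to the directory holding the config file, which is assumed here to be the directory you want importable.

```toml
# pyproject.toml (sketch; assumes pytest >= 7.0)
[tool.pytest.ini_options]
# Directories listed here are added to sys.path for the test session.
# "." is an assumption: replace it with the directory your imports expect.
pythonpath = ["."]
```

Running the suite as python -m pytest is another commonly used route, since Python then puts the current directory on sys.path itself.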
Assertive Testing
Testing that happens within the function
Check for NAs in column
    if (anyNA(dataset$body_mass_g)) {
      rlang::abort("NAs are present in 'body_mass_g' column")
    }
Assert that I have not inadvertently changed the length of the output dataset either by accidentally dropping rows or accidentally introducing duplicates
    library(testthat)

    make_my_rectangle <- function(dataset_a, dataset_b, dataset_c) {
      # ... Do stuff
      expect_equal(nrow(output_dataset), nrow(dataset_a))
      expect_false(any(duplicated(output_dataset$id)))
      output_dataset
    }
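To see this kind of check trip, here is a minimal self-contained sketch. The data frames, the join_checked() name, and the error wording are all hypothetical, and it uses base R stop() in place of the testthat expectations so that it runs without extra packages.

```r
# Hypothetical data: dataset_b repeats id 2, so a join will fan out rows
dataset_a <- data.frame(id = 1:3, mass = c(10, 20, 30))
dataset_b <- data.frame(id = c(1, 2, 2, 3), site = c("a", "b", "c", "d"))

# Hypothetical pipeline step with assertions inside the function
join_checked <- function(a, b) {
  out <- merge(a, b, by = "id", all.x = TRUE)

  # Assert that the join neither dropped nor added rows
  if (nrow(out) != nrow(a)) {
    stop(
      sprintf("Row count changed from %d to %d", nrow(a), nrow(out)),
      call. = FALSE
    )
  }
  # Assert that ids are still unique
  if (anyDuplicated(out$id) > 0) {
    stop("Duplicate ids in output", call. = FALSE)
  }
  out
}

# Capture the error message instead of halting the script
res <- tryCatch(
  join_checked(dataset_a, dataset_b),
  error = function(e) conditionMessage(e)
)
res
#> [1] "Row count changed from 3 to 4"
```

Because the check lives inside the function, a bad upstream dataset stops the pipeline at the step that produced the problem instead of surfacing later in a model or report.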
Error functions
Too much error-handling code inside a function reduces readability. Moving the checks into dedicated error functions reduces the lines of code in the main function and makes the checks reusable.
Tidyverse recommends using cli::cli_abort
- Makes it easy to generate bulleted lists.
- Uses glue style interpolation to insert data into the error.
- Supports a wide range of inline markup.
- Provides convenient tools to chain errors together.
- Can control the name of the function shown in the error.
Example: From link
    check_if_squarable <- function(x,
                                   arg = rlang::caller_arg(x),
                                   call = rlang::caller_env()) {
      x_len <- length(x)
      dims <- sqrt(x_len)
      squarable_length <- floor(dims) == dims
      if (!squarable_length) {
        cli::cli_abort(
          message = c(
            "Provided vector is not of a squarable length",
            "{.arg {arg}} is of length {.num {x_len}}",
            "This cannot be represented as a perfect square"
          ),
          call = call
        )
      }
    }

    check_if_not_numeric <- function(x,
                                     arg = rlang::caller_arg(x),
                                     call = rlang::caller_env()) {
      if (!is.numeric(x)) {
        cli::cli_abort(
          message = c(
            "Provided vector, {.arg {arg}}, must be {.cls numeric}, not {.cls {class(x)}}",
            "We see that {.run is.numeric({.arg {arg}})} returns {.cls {class(x)}}"
          ),
          call = call
        )
      }
    }

    # Messages
    vector_to_square(data = 1:4)
    #>      [,1] [,2]
    #> [1,]    1    2
    #> [2,]    3    4

    vector_to_square(data = 1:5)
    #> Error in `vector_to_square()`:
    #> ! Provided vector is not of a squarable length
    #> `data` is of length 5
    #> This cannot be represented as a perfect square

    vector_to_square(data = LETTERS[1:4])
    #> Error in `vector_to_square()`:
    #> ! Provided vector, `data`, must be <numeric>, not <character>
    #> We see that `` is.numeric(`data`) `` returns <character>
- Adding call = rlang::caller_env() makes it so that, when the error is tripped, the error message refers to the calling function (here, vector_to_square()) and not the error function.
- Adding arg = rlang::caller_arg(x) makes it so that, when the error is tripped, the error message states the calling function’s argument (here, data) and not “x”.
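The example calls vector_to_square(), but its definition is not shown in these notes. The sketch below is a guess at how such a caller might be wired up, not the original author's code: the body of vector_to_square() is an assumption chosen to reproduce the 2 x 2 output above, and the check function is a condensed copy of the one in the example (using {.val} for the length) so that the block runs on its own. It assumes the rlang and cli packages are installed.

```r
# Condensed copy of the check from the example above
check_if_squarable <- function(x,
                               arg = rlang::caller_arg(x),
                               call = rlang::caller_env()) {
  x_len <- length(x)
  dims <- sqrt(x_len)
  if (floor(dims) != dims) {
    cli::cli_abort(
      c(
        "Provided vector is not of a squarable length",
        "{.arg {arg}} is of length {.val {x_len}}"
      ),
      call = call
    )
  }
  invisible(x)
}

# Assumed caller: runs the check, then fills a square matrix by row
vector_to_square <- function(data) {
  check_if_squarable(data)
  matrix(data, nrow = sqrt(length(data)), byrow = TRUE)
}

vector_to_square(data = 1:4)
#>      [,1] [,2]
#> [1,]    1    2
#> [2,]    3    4

# Capture the condition so the script keeps running
err <- tryCatch(vector_to_square(data = 1:5), error = function(e) e)
conditionMessage(err)
```

Since arg and call are resolved through their defaults, the caller passes nothing beyond the data itself, which keeps the main function body short.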