Base R
Misc
Magrittr + base
%>% {plot(.$hp, .$mpg)} mtcars %$% plot(hp, mpg) mtcars
- By wrapping the RHS in curly braces, we can override the rule where the LHS is passed to the first argument ## Options {#sec-baser-opts .unnumbered}
Remove scientific notation
options(scipen = 999)
Wide and long printing tibbles
# in .Rprofile makeActiveBinding(".wide", function() { print(.Last.value, width = Inf) }, .GlobalEnv)
- After printing a tibble, if you want to see it in wide, then just type .wide + ENTER.
- Can have similar bindings for `.long` and `.full`.
Heredocs - Powerful feature in various programming languages that allow you to define a block of text within the code, preserving line breaks, indentation, and other whitespace.
<- r"( text This is a multiline string in R)" cat(text)
User Defined Functions
Anonymous (aka lambda) functions:
\(x) {}
(> R 4.1)function(x) { which.max(x$mpg), ] x[ }# equivalent to the above \(x) {which.max(x$mpg), ] x[ }
Define and call an anonymous function at the same time
<- c(1:10) n <- (\(x) x+1)(n) moose moose#> [1] 2 3 4 5 6 7 8 9 10 11
Dots (…)
Misc
User Defined Functions
<- function(...) { moose <- list(...) dots <- names(dots) dots_names if (is.null(dots_names) || "" %in% dots_names { stop("All arguments must be named") }}
Nested Functions
<- function(...){ f02 <- list(...) vv print(vv) }<- function(...){ f01 f02(b = 2,...) } f01(a=1,c=3) #> $b #> [1] 2 #> #> $a #> [1] 1 #> #> $c #> [1] 3
Subset dots values
<- function(...) { add2 1 + ..2 .. }add2(3, 0.14) # 3.14
Subset dots dynamically:
...elt(n)
- Set a value to n and get back the value of that argument
Number of arguments in … :
...length()
Functions
do.call
- allows you to call other functions by constructing the function call as a listArgs
- what – Either a function or a non-empty character string naming the function to be called
- args – A list of arguments to the function call. The names attribute of args gives the argument names
- quote – A logical value indicating whether to quote the arguments
- envir – An environment within which to evaluate the call. This will be most useful if what is a character string and the arguments are symbols or quoted expressions
Example: Apply function to list of vectors
<- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9)) vectors <- do.call(rbind, vectors) combined_matrix combined_matrix## [,1] [,2] [,3] ## [1,] 1 2 3 ## [2,] 4 5 6 ## [3,] 7 8 9
Example: Apply multiple functions
<- list( data_frames data.frame(a = 1:3), data.frame(a = 4:6), data.frame(a = 7:9) )<- do.call( mean_results rbind, lapply(data_frames, function(df) mean(df$a)) ) mean_results## [,1] ## [1,] 2 ## [2,] 5 ## [3,] 8
- First the mean is calculated for column a of each df using
lapply
lapply
is supplying the data fordo.call
in the required format, which is a list or character vector.
- Second the results are combined into a matrix with
rbind
- First the mean is calculated for column a of each df using
sink
- used to divert R output to an external connection.Use Cases: exporting data to a file, logging R output, or debugging R code.
Args
- file: The name of the file to which R output will be diverted. If file is NULL, then R output will be diverted to the console.
- append: A logical value indicating whether R output should be appended to the file (TRUE) or overwritten (FALSE). The default value is FALSE.
- type: A character string. Either the output stream or the messages stream. The name will be partially match so can be abbreviated.
- split: logical: if TRUE, output will be sent to the new sink and the current output stream, like the Unix program tee.
Example: Logging output of code to file
sink("r_output.log") # Redirect output to this file # Your R code goes here sink() # Turn off redirection
- output file could also have an extension like “.txt”
Example: Debugging
sink("my_function.log") # Redirect output to this file my_function() sink() # Turn off redirection
Example: Appending output to a file
sink("output.txt", append = TRUE) # Append output to the existing file cat("Additional text\n") # Append custom text plain textsink() # Turn off redirection
pmin
andpmax
Find the element-wise maximum and minimum values across vectors in R
Example
<- c(3, 9, 2, 6) vec1 <- c(7, 1, 8, 4) vec2 pmax(vec1, vec2) #> [1] 7 9 8 6 pmin(vec1, vec2) #> [1] 3 1 2 4
Example: With NAs
<- c(7, 3, NA, 12) data1 <- c(9, NA, 5, 8) data2 pmax(data1, data2, na.rm = TRUE) #> [1] 9 3 5 12
switch
Example:
switch(parallel, windows = "snow" -> para_proc, other = "multicore" -> para_proc, no = "no" -> para_proc, stop(sprintf("%s is not one of the 3 possible parallel argument values. See documentation.", parallel)))
parallel is the function argument. If it doesn’t match one of the 3 values, then an error is thrown.
If the argument value is matched, then the quoted value is stored in para_proc
dynGet
Looks for objects in the environment of a function.
When an object from the outer function is an input for a function nested around 3 layers deep or more, it may not be found by that most inner function.
dynGet
allows that function to find the object in the outer frameArguments
minframe: Integer specifying the minimal frame number to look into (i.e. how far back to look for the object)
inherits: Should the enclosing frames of the environment be searched?
Example:
1function(args) { if (method == "kj") { <- purrr::map2(grid$dat, ncv_list $repeats, gridfunction(dat, reps) { ::nested_cv(dat, rsampleoutside = vfold_cv(v = 10, repeats = dynGet("reps")), inside = bootstraps(times = 25)) }) } } 2function(data) { if (chk::vld_used(...)) { <- list(...) dots <- init_boot_args list(data = dynGet("data"), stat_fun = cles_boot, # internal function group_variables = group_variables, paired = paired) <- get_boot_args append(init_boot_args, dots) }<- cles_booted do.call( get_boot_ci, get_boot_args ) }
- 1
- Example from Nested Cross-Validation Comparison
- 2
- Example from {ebtools::cles}
match.arg
Partially matches a function’s argument values to list of choices. If the value doesn’t match the choices, then an error is thrown
Example:
<- "input_le" keep_input <- keep_input_val match.arg(keep_input, choices = c("input_lags", "input_leads", "both"), several.ok = FALSE) keep_input_val#> [1] "input_leads"
- several.ok = FALSE says only 1 match is allowed otherwise an error is thrown.
- The error message is pretty informative btw.
match.fun
Example
<- function(a,b) { f + b a }<- function(a,b,c) { g + b) * c (a }<- function(d,e) { h - e d }<- function(FUN, ...) { yolo <- match.fun(FUN) FUN <- list(...) params <- formals(FUN) FUN_formals <- names(params) %in% names(FUN) idx do.call(FUN, params[idx]) }yolo(h, d = 2, e = 3) #> -1
Pipe
Benefits of base pipe
- Magrittr pipe is bloated with special features which may make it slower than the base pipe
- If not using tidyverse, it’s one less dependency (maybe one day it will be deprecated in tidyverse)
Base pipe with base and anonymous functions
# verbosely |> (function(.) plot(.$hp, .$mpg))() mtcars # using the anonymous function shortcut, emulating the dot syntax |> (\(.) plot(.$hp, .$mpg))() mtcars # or if you prefer x to . |> (\(x) plot(x$hp, x$mpg))() mtcars # or if you prefer to be explicit with argument names |> (\(data) plot(data$hp, data$mpg))() mtcars
Using “_” placeholder:
mtcars |> lm(mpg ~ disp, data = _)
mtcars |> lm(mpg ~ disp, data = _) |> _$coef
Base pipe .[ ] hack
|> wiki read_html() |> html_nodes("table") |> 2]])() |> (\(.) .[[html_table(fill = TRUE) |> clean_names() # instead of <- wiki %>% djia read_html() %>% html_nodes("table") %>% 2]] %>% .[[html_table(fill = TRUE) %>% clean_names()
Magrittr, base pipe differences
- magrittr: %>% allows you change the placement with a . placeholder.
- base: R 4.2.0 added a _ placeholder to the base pipe, with one additional restriction: the argument has to be named
- magrittr: With %>% you can use . on the left-hand side of operators like “\(", \[\[, \[ and use in multiple arguments (e.g. df %>% {split(.\)x, .$y))
- base: can hack this by using anonymous function
- see Base pipe with base and anonymous functions above
- see Base pipe .[ ] hack above
- base: can hack this by using anonymous function
- magrittr: %>% allows you to drop the parentheses when calling a function with no other arguments (e.g. dat %>% distinct)
- base: |> always requires the parentheses. (e.g. dat |> distinct())
- magrittr: %>% allows you to start a pipe with . to create a function rather than immediately executing the pipe
- magrittr: %>% allows you change the placement with a . placeholder.
Purrr with base pipe
|> data_list map(\(x) clean_names(x)) # instead of %>% data_list map( ~.x %>% clean_names) # with split |> star split(~variable) |> map_df(\(.) hedg_g(., reading ~ value), .id = "variable") # instead of %>% star split(.$variable) %>% map_df(. %>% hedg_g(., reading ~ value), .id = "variable")
Strings
Resources
- From base R - base R equivalents to {stringr} functions
sprintf
<- 123.456 # Create example data x sprintf("%f", x) # sprintf with default specification #> [1] "123.456000" sprintf("%.10f", x) # sprintf with ten decimal places #> [1] "123.4560000000" sprintf("%.2f", x) # sprintf with two rounded decimal places #> [1] "123.46" sprintf("%1.0f", x) # sprintf without decimal places #> [1] "123" sprintf("%10.0f", x) # sprintf with space before number #> [1] " 123" sprintf("%10.1f", x) # Space before number & decimal places #> [1] " 123.5" sprintf("%-15f", x) # Space on right side #> [1] "123.456000 " sprintf("%+f", x) # Print plus sign before number #> [1] "+123.456000" sprintf("%e", x) # Exponential notation #> [1] "1.234560e+02" sprintf("%E", x) # Exponential with upper case E #> [1] "1.234560E+02" sprintf("%g", x) # sprintf without decimal zeros #> [1] "123.456" sprintf("%g", 1e10 * x) # Scientific notation #> [1] "1.23456e+12" sprintf("%.13g", 1e10 * x) # Fixed decimal zeros #> [1] "1234560000000" paste0(sprintf("%f", x), # Print %-sign at the end of number "%") #> [1] "123.456000%" sprintf("Let's create %1.0f more complex example %1.0f you.", 1, 4) #> [1] "Let's create 1 more complex example 4 you."
str2lang
- Allows you to turn plain text into code.<- "circumference / age" growth_rate class(str2lang(growth_rate)) #> [1] "call"
Example: Basic
eval(str2lang("2 + 2")) #> [1] 4 eval(str2lang("x <- 3")) x#> [1] 3
Example: Run formula against a df
<- "circumference / age" growth_rate with(Orange, eval(str2lang(growth_rate))) #> [1] 0.25423729 0.11983471 0.13102410 0.11454183 0.09748172 0.10349854 #> [7] 0.09165613 0.27966102 0.14256198 0.16716867 0.15537849 0.13972380 #> [13] 0.14795918 0.12831858 0.25423729 0.10537190 0.11295181 0.10756972 #> [19] 0.09341998 0.10131195 0.08849558 0.27118644 0.12809917 0.16867470 #> [25] 0.16633466 0.14541024 0.15233236 0.13527181 0.25423729 0.10123967 #> [31] 0.12198795 0.12450199 0.11535337 0.12682216 0.11188369
Conditionals
&& and || are intended for use solely with scalars, they return a single logical value.
- Since they always return a scalar logical, you should use && and || in your if/while conditional expressions (when needed). If an & or | is used, you may end up with a non-scalar vector inside if (…) {} and R will throw an error.
& and | work with multivalued vectors, they return a vector whose length matches their input arguments.
Alternative way of negating a condition or set of conditions:
if (!(condition))
Makes it less readable IMO, but maybe for a complicated set of conditions if makes more sense in your head to do it this way
Example
if (!(nr == nrow(iris) || (nr == nrow(iris) - 2))) {print("moose")}
Using
else if
if (condition1) { expr1else if (condition2) { } expr2else { } expr3 }
stopifnot
<- function(steps_forward, newdata) { pred_fn stopifnot(steps_forward >= 1) stopifnot(nrow(newdata) == 1) = model_map[[steps_forward]] model_f # apply the model to the last "before the test period" row to get # the k-steps_forward prediction as.numeric(predict(model_f, newdata = newdata)) }
%||%
Collapse operator which acts like:
`%||%` <- function(x, y) { if (is_null(x)) y else x }
- Says if the first (left-hand) input
x
isNULL
, returny
. Ifx
is notNULL
, return the input
- Says if the first (left-hand) input
Use Cases
Determine whether a function argument is NULL
<- github_remote function(repo, username = NULL, ...) { <- parse_git_repo(repo) meta $username <- username %||% metagetOption("github.user") %||% stop("Unknown username") }
Within the
print
argument collapselibrary(rlang) <- function(x) { add_commas if (length(x) <= 1) { <- NULL collapse_arg else { } <- ", " collapse_arg }print(paste0(x, collapse = collapse_arg %||% "")) } add_commas(c("apples")) 1] "apples" [add_commas(c("apples", "bananas")) 1] "apples, bananas" [
Ordering Columns and Sorting Rows
Ascending:
with(df, order(col2)), ] df[# or order(df$col2), ] df[# or sort_by(df, df$col2)
- col2 is the column used to sort the df by
Descending:
df[with(df, order(-col2)), ]
By Multiple Columns
- Sequentially:
sort_by(df, df$col1, df$col2, df$col3)
- Says sort by col1 then col2 then col3
- Descending then Ascending:
df[with(df, order(-col2, id)), ]
- Sequentially:
Change position of columns
# Reorder column by index manually <- df[, c(5, 1, 4, 2, 3)] df2 <- df[, c(1, 5, 2:4)] df3 # Reorder column by name manually = c("emp_id","name","superior_emp_id","dept_id","dept_branch_id") new_order <- df[, new_order] df2
Set Operations
Unique values in A that are not in B
<- c("thing", "object") a <- c("thing", "gift") b unique(a[!(a %in% b)]) #> [1] "object" setdiff(a, b)
setdiff
is slower
Unique values of the two vectors combined
unique(c(a, b)) #> [1] "thing" "object" "gift" union(a, b)
union
is just a wrapper forunique
Subsetting
Lists and Vectors
Removing Rows
# Remove specific value from vector !x == 'A'] x[ # Remove multiple values by list !x %in% c('A', 'D', 'E')] x[ # Using setdiff setdiff(x, c('A','D','E')) # Remove elements by index -c(1,2,5)] x[ # Using which -which(x %in% c('D','E') )] x[ # Remove elements by name <- c(C1='A',C2='B',C3='C',C4='E',C5='G') x !names(x) %in% c('C1','C2')] x[
Dataframes
Remove specific Rows
<- df[-c(25, 3, 62), ] df
Remove column by name
<- df[, which(names(df) == "col_name")] df <- subset(df, select = -c(col_name)) df <- df[, !names(df) %in% c("col1", "col2"), drop = FALSE] df
Filter and Select
<- subset(df, subset = col1 > 56, select = c(col2, col3)) df
Joins
- Inner join:
inner <- merge(flights, weather, by = mergeCols)
- Left (outer) join:
left <- merge(flights, weather, by = mergeCols, all.x = TRUE)
- Right (outer) join:
right <- merge(flights, weather, by = mergeCols, all.y = TRUE)
- Full (outer) join:
full <- merge(flights, weather, by = mergeCols, all = TRUE)
- Cross Join (Cartesian product):
cross <- merge(flights, weather, by = NULL)
- Natural join:
natural <- merge(flights, weather)
Error Handling
stop
Example:
switch(parallel, windows = "snow" -> para_proc, other = "multicore" -> para_proc, no = "no" -> para_proc, stop(sprintf("%s is not one of the 3 possible parallel argument values. See documentation.", parallel)))
- parallel is the function argument. If it doesn’t match one of the 3 values, then an error is thrown.
try
If something errors, then do something else
Example
<- try(remDr$findElement(using = "xpath", current '//*[contains(concat( " ", @class, " " ), concat( " ", "product-price-value", " " ))]'), silent = T) #If error : current price is NA if(class(current) =='try-error'){ <- NA currentp[i] else { } # do stuff }
tryCatch
Run the main code, but if it “catches” an error, then the secondary code (the workaround) will run.
Example: Basic
for (r in 1:nrow(res)) { cat(r, "\n") <- get_wikitext(res$film[r], res$year[r]) tmp_wikitext # skip if get_wikitext fails if (is.na(tmp_wikitext)) next if (length(tmp_wikitext) == 0) next # give the text to openai <- tryCatch( tmp_chat get_results(client, tmp_wikitext), error = \(x) NA ) # if openai returned a dict of 2 if (length(tmp_chat) == 2) { $writer[r] <- tmp_chat$writer res$producer[r] <- tmp_chat$producer res } }
get_results
is called during each iteration, and if there’s an error a NA is returned.
Example
<- function(n1, n2) { pct_difference_error_handling # Try the main code tryCatch(pct_diff <- (n1-n2)/n1, # If you find an error, use this code instead error = return( cat( 'The difference between', as.integer(n1), 'and', as.integer(n2), 'is', as.integer(n1)-as.integer(n2)), 'which is', (100*(as.integer(n1)-as.integer(n2))/as.integer(n1), '% of', n1 )#cat ),# finally = print('Code ended') # optional #trycatch )# If no error happens, return this statement return ( cat('The difference between', n1, 'and', n2, 'is', n1-n2, ', which is', pct_diff*100, '% of', n1) ) }
- Assumes the error will be the user enters a string instead of a numeric. If errors, converts string to numeric and calcs.
- “finally” - This argument will always run, regardless if the try block raises an error or not. So it could be a completion message or a summary, for example.
Models
reformulate
- Create formula sytax programmatically# Creating a formula using reformulate() <- reformulate(c("hp", "cyl"), response = "mpg") formula # Fitting a linear regression model <- lm(formula, data = mtcars) model formula##> mpg ~ hp + cyl
- Can also use
as.formula
- Can also use
DF2formula
- Turns the column names from a data frame into a formula. The first column will become the outcome variable, and the rest will be used as predictorsDF2formula(Orange) #> Tree ~ age + circumference
formula
- Provides a way of extracting formulae which have been included in other objects|> prep() |> formula() rec_obj
- Where “rec_obj” is a tidymodels recipe object
Data from Model Object
model$model
return the data dataframedeparse(model$call$data)
gives you that name of your data object as a string.model$call$data
gives you the data as an unevaluated symbol;
eval(model$call$data)
gives you back the original data object, if it is available in the current environment.