Base R

Misc

Magrittr + base
```
mtcars %>% {plot(.$hp, .$mpg)}
mtcars %$% plot(hp, mpg)
```
- By wrapping the RHS in curly braces, we can override the rule where the LHS is passed to the first argument ## Options {#sec-baser-opts .unnumbered}
Remove scientific notation
```
options(scipen = 999)
```
Wide and long printing tibbles
```
# in .Rprofile
makeActiveBinding(".wide", function() { print(.Last.value, width = Inf) }, .GlobalEnv)
```
- After printing a tibble, if you want to see it in wide, then just type .wide + ENTER.
- Can have similar bindings for `.long` and `.full`.
Heredocs - Powerful feature in various programming languages that allow you to define a block of text within the code, preserving line breaks, indentation, and other whitespace.
```
text <- r"(
This is a
multiline string
in R)"

cat(text)
```

User Defined Functions

Anonymous (aka lambda) functions: \(x) {} (> R 4.1)

function(x) {
  x[which.max(x$mpg), ]
}
# equivalent to the above
\(x) {
  x[which.max(x$mpg), ]
}

Define and call an anonymous function at the same time

n <- c(1:10)
moose <- (\(x) x+1)(n)
moose
#> [1]  2  3  4  5  6  7  8  9 10 11

Dots (…)

Misc
- {ellipsis}: Functions for testing functions with dots so they fail loudly
- {rlang} dynamic dots: article
  - Splice arguments saved in a list with the splice operator, !!! .
  - Inject names with glue syntax on the left-hand side of := .

User Defined Functions

moose <- function(...) {
    dots <- list(...)
    dots_names <- names(dots)
    if (is.null(dots_names) || "" %in% dots_names {
        stop("All arguments must be named")
    }
}

Nested Functions

f02 <- function(...){
  vv <- list(...)
  print(vv)
}
f01 <- function(...){
  f02(b = 2,...)
}

f01(a=1,c=3)
#> $b
#> [1] 2
#> 
#> $a
#> [1] 1
#> 
#> $c
#> [1] 3

Subset dots values

add2 <- function(...) {
    ..1 + ..2
}
add2(3, 0.14)
# 3.14

Subset dots dynamically: ...elt(n)
- Set a value to n and get back the value of that argument
Number of arguments in … : ...length()

Functions

do.call - allows you to call other functions by constructing the function call as a list
- Args
  - what – Either a function or a non-empty character string naming the function to be called
  - args – A list of arguments to the function call. The names attribute of args gives the argument names
  - quote – A logical value indicating whether to quote the arguments
  - envir – An environment within which to evaluate the call. This will be most useful if what is a character string and the arguments are symbols or quoted expressions
- Example: Apply function to list of vectors
```
vectors <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))
combined_matrix <- do.call(rbind, vectors)

combined_matrix
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
```
- Example: Apply multiple functions
```
data_frames <- list(
  data.frame(a = 1:3), 
  data.frame(a = 4:6), 
  data.frame(a = 7:9)
  )
mean_results <- do.call(
  rbind, 
  lapply(data_frames, function(df) mean(df$a))
  )

mean_results
##      [,1]
## [1,]    2
## [2,]    5
## [3,]    8
```
  - First the mean is calculated for column a of each df using lapply
    - lapply is supplying the data for do.call in the required format, which is a list or character vector.
  - Second the results are combined into a matrix with rbind
sink - used to divert R output to an external connection.
- Use Cases: exporting data to a file, logging R output, or debugging R code.
- Args
  - file: The name of the file to which R output will be diverted. If file is NULL, then R output will be diverted to the console.
  - append: A logical value indicating whether R output should be appended to the file (TRUE) or overwritten (FALSE). The default value is FALSE.
  - type: A character string. Either the output stream or the messages stream. The name will be partially match so can be abbreviated.
  - split: logical: if TRUE, output will be sent to the new sink and the current output stream, like the Unix program tee.
- Example: Logging output of code to file
```
sink("r_output.log")      # Redirect output to this file
# Your R code goes here
sink()                    # Turn off redirection
```
  - output file could also have an extension like “.txt”
- Example: Debugging
```
sink("my_function.log")   # Redirect output to this file
my_function()
sink()                    # Turn off redirection
```
- Example: Appending output to a file
```
sink("output.txt", append = TRUE)  # Append output to the existing file
cat("Additional text\n")  # Append custom text
plain text
sink()  # Turn off redirection
```

pmin and pmax

Find the element-wise maximum and minimum values across vectors in R

Example

vec1 <- c(3, 9, 2, 6)
vec2 <- c(7, 1, 8, 4)
pmax(vec1, vec2)
#> [1] 7 9 8 6
pmin(vec1, vec2)
#> [1] 3 1 2 4

Example: With NAs

data1 <- c(7, 3, NA, 12)
data2 <- c(9, NA, 5, 8)
pmax(data1, data2, na.rm = TRUE)
#> [1] 9 3 5 12

switch

Example:

switch(parallel,
         windows = "snow" -> para_proc,
         other = "multicore" -> para_proc,
         no = "no" -> para_proc,
         stop(sprintf("%s is not one of the 3 possible parallel argument values. See documentation.", parallel)))

parallel is the function argument. If it doesn’t match one of the 3 values, then an error is thrown.
If the argument value is matched, then the quoted value is stored in para_proc

dynGet

Looks for objects in the environment of a function.
When an object from the outer function is an input for a function nested around 3 layers deep or more, it may not be found by that most inner function. dynGet allows that function to find the object in the outer frame
Arguments
- minframe: Integer specifying the minimal frame number to look into (i.e. how far back to look for the object)
- inherits: Should the enclosing frames of the environment be searched?

Example:

1function(args) {
  if (method == "kj") {
      ncv_list <- purrr::map2(grid$dat, 
                              grid$repeats, 
                              function(dat, reps) {
         rsample::nested_cv(dat,
                            outside = vfold_cv(v = 10, 
                                               repeats = dynGet("reps")),
                            inside = bootstraps(times = 25))
      })
  }
}

2function(data) {
    if (chk::vld_used(...)) {
        dots <- list(...)
        init_boot_args <-
          list(data = dynGet("data"),
               stat_fun = cles_boot, # internal function
               group_variables = group_variables,
               paired = paired)
        get_boot_args <-
          append(init_boot_args,
                 dots)
    }
    cles_booted <-
      do.call(
        get_boot_ci,
        get_boot_args
      )
}

1: Example from Nested Cross-Validation Comparison
2: Example from {ebtools::cles}

match.arg

Partially matches a function’s argument values to list of choices. If the value doesn’t match the choices, then an error is thrown

Example:

keep_input <- "input_le"
keep_input_val <- 
  match.arg(keep_input,
            choices = c("input_lags",
                        "input_leads",
                        "both"),
            several.ok = FALSE)
keep_input_val
#> [1] "input_leads"

several.ok = FALSE says only 1 match is allowed otherwise an error is thrown.
The error message is pretty informative btw.

match.fun

Example

f <- function(a,b) {
  a + b
}
g <- function(a,b,c) {
  (a + b) * c
}
h <- function(d,e) {
  d - e
}
yolo <- function(FUN, ...) {
  FUN <- match.fun(FUN)
  params <- list(...)
  FUN_formals <- formals(FUN)
  idx <- names(params) %in% names(FUN)
  do.call(FUN, params[idx])
}
yolo(h, d = 2, e = 3)
#> -1

Pipe

Benefits of base pipe
- Magrittr pipe is bloated with special features which may make it slower than the base pipe
- If not using tidyverse, it’s one less dependency (maybe one day it will be deprecated in tidyverse)

Base pipe with base and anonymous functions

# verbosely
mtcars |> (function(.) plot(.$hp, .$mpg))()
# using the anonymous function shortcut, emulating the dot syntax
mtcars |> (\(.) plot(.$hp, .$mpg))()
# or if you prefer x to .
mtcars |> (\(x) plot(x$hp, x$mpg))()
# or if you prefer to be explicit with argument names
mtcars |> (\(data) plot(data$hp, data$mpg))()

Using “_” placeholder:
- mtcars |> lm(mpg ~ disp, data = _)
- mtcars |> lm(mpg ~ disp, data = _) |> _$coef

Base pipe .[ ] hack

wiki |>
  read_html() |>
  html_nodes("table") |>
  (\(.) .[[2]])() |>
  html_table(fill = TRUE) |>
  clean_names()
# instead of
djia <- wiki %>%
  read_html() %>%
  html_nodes("table") %>%
  .[[2]] %>%
  html_table(fill = TRUE) %>%
  clean_names()

Magrittr, base pipe differences
- magrittr: %>% allows you change the placement with a . placeholder.
  - base: R 4.2.0 added a _ placeholder to the base pipe, with one additional restriction: the argument has to be named
- magrittr: With %>% you can use . on the left-hand side of operators like “$", \[\[, \[ and use in multiple arguments (e.g. df %>% {split(.$x, .$y))
  - base: can hack this by using anonymous function
    - see Base pipe with base and anonymous functions above
    - see Base pipe .[ ] hack above
- magrittr: %>% allows you to drop the parentheses when calling a function with no other arguments (e.g. dat %>% distinct)
  - base: |> always requires the parentheses. (e.g. dat |> distinct())
- magrittr: %>% allows you to start a pipe with . to create a function rather than immediately executing the pipe

Purrr with base pipe

data_list |>
  map(\(x) clean_names(x))
# instead of
data_list %>%
  map( ~.x %>% clean_names)
# with split
star |>
  split(~variable) |>
  map_df(\(.) hedg_g(., reading ~ value), .id = "variable")
# instead of
star %>%
  split(.$variable) %>%
  map_df(. %>% hedg_g(., reading ~ value), .id = "variable")

Strings

Resources
- From base R - base R equivalents to {stringr} functions

sprintf

x <- 123.456               # Create example data

sprintf("%f", x)           # sprintf with default specification
#> [1] "123.456000"

sprintf("%.10f", x)        # sprintf with ten decimal places
#> [1] "123.4560000000"

sprintf("%.2f", x)         # sprintf with two rounded decimal places
#> [1] "123.46"

sprintf("%1.0f", x)        # sprintf without decimal places
#> [1] "123"

sprintf("%10.0f", x)       # sprintf with space before number
#> [1] "       123"

sprintf("%10.1f", x)       # Space before number & decimal places
#> [1] "     123.5"

sprintf("%-15f", x)        # Space on right side
#> [1] "123.456000     "

sprintf("%+f", x)          # Print plus sign before number
#> [1] "+123.456000"

sprintf("%e", x)           # Exponential notation
#> [1] "1.234560e+02"

sprintf("%E", x)           # Exponential with upper case E
#> [1] "1.234560E+02"

sprintf("%g", x)           # sprintf without decimal zeros
#> [1] "123.456"

sprintf("%g", 1e10 * x)    # Scientific notation
#> [1] "1.23456e+12"

sprintf("%.13g", 1e10 * x) # Fixed decimal zeros
#> [1] "1234560000000"

paste0(sprintf("%f", x),   # Print %-sign at the end of number
       "%")
#> [1] "123.456000%"

sprintf("Let's create %1.0f more complex example %1.0f you.", 1, 4)
#> [1] "Let's create 1 more complex example 4 you."

str2lang - Allows you to turn plain text into code.

growth_rate <- "circumference / age"
class(str2lang(growth_rate))
#> [1] "call"

Example: Basic

eval(str2lang("2 + 2"))
#> [1] 4

eval(str2lang("x <- 3"))
x
#> [1] 3

Example: Run formula against a df

growth_rate <- "circumference / age"
with(Orange, eval(str2lang(growth_rate)))

#>   [1] 0.25423729 0.11983471 0.13102410 0.11454183 0.09748172 0.10349854
#>   [7] 0.09165613 0.27966102 0.14256198 0.16716867 0.15537849 0.13972380
#>  [13] 0.14795918 0.12831858 0.25423729 0.10537190 0.11295181 0.10756972
#>  [19] 0.09341998 0.10131195 0.08849558 0.27118644 0.12809917 0.16867470
#>  [25] 0.16633466 0.14541024 0.15233236 0.13527181 0.25423729 0.10123967
#>  [31] 0.12198795 0.12450199 0.11535337 0.12682216 0.11188369

Conditionals

&& and || are intended for use solely with scalars, they return a single logical value.
- Since they always return a scalar logical, you should use && and || in your if/while conditional expressions (when needed). If an & or | is used, you may end up with a non-scalar vector inside if (…) {} and R will throw an error.
& and | work with multivalued vectors, they return a vector whose length matches their input arguments.
Alternative way of negating a condition or set of conditions: if (!(condition))
- Makes it less readable IMO, but maybe for a complicated set of conditions if makes more sense in your head to do it this way
- Example
```
if (!(nr == nrow(iris) || (nr == nrow(iris) - 2))) {print("moose")}
```

Using else if

if (condition1) {
  expr1
} else if (condition2) {
  expr2
} else {
  expr3
}

stopifnot

pred_fn <- function(steps_forward, newdata) {
  stopifnot(steps_forward >= 1)
  stopifnot(nrow(newdata) == 1)
  model_f = model_map[[steps_forward]]
  # apply the model to the last "before the test period" row to get
  # the k-steps_forward prediction
  as.numeric(predict(model_f, newdata = newdata))
}

%||%

Collapse operator which acts like:
```
`%||%` <- function(x, y) {
   if (is_null(x)) y else x
}
```
- Says if the first (left-hand) input x is NULL, return y. If x is not NULL, return the input

Use Cases

Determine whether a function argument is NULL

github_remote <- 
  function(repo, username = NULL, ...) {
    meta <- parse_git_repo(repo)
    meta$username <- username %||%
      getOption("github.user") %||%
      stop("Unknown username")
  }

Within the print argument collapse

library(rlang)

add_commas <- function(x) {
  if (length(x) <= 1) {
    collapse_arg <- NULL
  } else {
    collapse_arg <- ", "
  }
  print(paste0(x, collapse = collapse_arg %||% ""))
}

add_commas(c("apples"))
[1] "apples"
add_commas(c("apples", "bananas"))
[1] "apples, bananas"

Ordering Columns and Sorting Rows

Ascending:

df[with(df, order(col2)), ]
# or
df[order(df$col2), ]
# or
sort_by(df, df$col2)

col2 is the column used to sort the df by

Descending: df[with(df, order(-col2)), ]
By Multiple Columns
- Sequentially: sort_by(df, df$col1, df$col2, df$col3)
  - Says sort by col1 then col2 then col3
- Descending then Ascending: df[with(df, order(-col2, id)), ]

Change position of columns

# Reorder column by index manually
df2 <- df[, c(5, 1, 4, 2, 3)]
df3 <- df[, c(1, 5, 2:4)]
# Reorder column by name manually
new_order = c("emp_id","name","superior_emp_id","dept_id","dept_branch_id")
df2 <- df[, new_order]

Set Operations

Unique values in A that are not in B

a <- c("thing", "object")
b <- c("thing", "gift")

unique(a[!(a %in% b)])
#> [1] "object"

setdiff(a, b)

setdiff is slower

Unique values of the two vectors combined
```
unique(c(a, b))
#> [1] "thing"  "object" "gift"

union(a, b)
```
- union is just a wrapper for unique

Subsetting

Lists and Vectors

Removing Rows

# Remove specific value from vector
x[!x == 'A']

# Remove multiple values by list
x[!x %in% c('A', 'D', 'E')]

# Using setdiff
setdiff(x, c('A','D','E'))

# Remove elements by index
x[-c(1,2,5)]

# Using which
x[-which(x %in% c('D','E') )]

# Remove elements by name
x <- c(C1='A',C2='B',C3='C',C4='E',C5='G')
x[!names(x) %in% c('C1','C2')]

Dataframes

Remove specific Rows
```
df <- df[-c(25, 3, 62), ]
```

Remove column by name

df <- df[, which(names(df) == "col_name")]
df <- subset(df, select = -c(col_name))
df <- df[, !names(df) %in% c("col1", "col2"), drop = FALSE]

Filter and Select

df <- subset(df, subset = col1 > 56, select = c(col2, col3))

Joins

Inner join: inner <- merge(flights, weather, by = mergeCols)
Left (outer) join: left <- merge(flights, weather, by = mergeCols, all.x = TRUE)
Right (outer) join: right <- merge(flights, weather, by = mergeCols, all.y = TRUE)
Full (outer) join: full <- merge(flights, weather, by = mergeCols, all = TRUE)
Cross Join (Cartesian product): cross <- merge(flights, weather, by = NULL)
Natural join: natural <- merge(flights, weather)

Error Handling

stop

Example:

switch(parallel,
       windows = "snow" -> para_proc,
       other = "multicore" -> para_proc,
       no = "no" -> para_proc,
       stop(sprintf("%s is not one of the 3 possible parallel argument values. See documentation.", parallel)))

parallel is the function argument. If it doesn’t match one of the 3 values, then an error is thrown.

try

If something errors, then do something else

Example

current <- try(remDr$findElement(using = "xpath",
                                '//*[contains(concat( " ", @class, " " ),
                                concat( " ", "product-price-value", " " ))]'),
                                silent = T)
#If error : current price is NA
if(class(current) =='try-error'){
    currentp[i] <- NA
} else {
    # do stuff
}

tryCatch

Run the main code, but if it “catches” an error, then the secondary code (the workaround) will run.

Example: Basic

for (r in 1:nrow(res)) {
  cat(r, "\n")

  tmp_wikitext <- get_wikitext(res$film[r], res$year[r])

  # skip if get_wikitext fails
  if (is.na(tmp_wikitext)) next
  if (length(tmp_wikitext) == 0) next

  # give the text to openai
  tmp_chat <- tryCatch(
    get_results(client, tmp_wikitext),
    error = \(x) NA
  )

  # if openai returned a dict of 2
  if (length(tmp_chat) == 2) {
    res$writer[r] <- tmp_chat$writer
    res$producer[r] <- tmp_chat$producer
  }
}

get_results is called during each iteration, and if there’s an error a NA is returned.

Example

pct_difference_error_handling <- function(n1, n2) {
# Try the main code
  tryCatch(pct_diff <- (n1-n2)/n1,
        # If you find an error, use this code instead
          error = return(
            cat( 'The difference between', as.integer(n1), 'and', as.integer(n2), 'is',
                  (as.integer(n1)-as.integer(n2)), 'which is',
                  100*(as.integer(n1)-as.integer(n2))/as.integer(n1),
                  '% of', n1 )#cat
            ),
          # finally = print('Code ended') # optional
          )#trycatch
  # If no error happens, return this statement
  return ( cat('The difference between', n1, 'and', n2, 'is', n1-n2,
              ', which is', pct_diff*100, '% of', n1) )
}

Assumes the error will be the user enters a string instead of a numeric. If errors, converts string to numeric and calcs.
“finally” - This argument will always run, regardless if the try block raises an error or not. So it could be a completion message or a summary, for example.

Models

reformulate - Create formula sytax programmatically

# Creating a formula using reformulate()
formula <- reformulate(c("hp", "cyl"), response = "mpg")

# Fitting a linear regression model
model <- lm(formula, data = mtcars)

formula
##> mpg ~ hp + cyl

Can also use as.formula

DF2formula - Turns the column names from a data frame into a formula. The first column will become the outcome variable, and the rest will be used as predictors
```
DF2formula(Orange)
#> Tree ~ age + circumference
```
formula - Provides a way of extracting formulae which have been included in other objects
```
rec_obj |> prep() |> formula()
```
- Where “rec_obj” is a tidymodels recipe object
Data from Model Object
- model$model return the data dataframe
- deparse(model$call$data) gives you that name of your data object as a string.
  - model$call$data gives you the data as an unevaluated symbol;
- eval(model$call$data) gives you back the original data object, if it is available in the current environment.