Privacy/Security

Misc

  • {xxhashlite} - Very fast hash functions using xxHash
  • {{metasyn}} - For generating synthetic tabular data with a focus on privacy
  • {simPop} (Vignette) - Tools and methods to simulate populations for surveys based on auxiliary data. The tools include model-based methods, calibration and combinatorial optimization algorithms
  • {sdcMicro} (CRAN) - For generating anonymized (micro)data, i.e. for creating public- and scientific-use files.
  • {diffpriv} - Implements the formal framework of differential privacy. Differentially private mechanisms can safely release statistics, fitted models, or arbitrary structures derived from privacy-sensitive data to untrusted third parties.
  • {encryptr} - Encrypt and decrypt data frame or tibble columns using strong RSA public/private keys
  • {encryptedRmd} - Encrypt HTML reports using ‘Libsodium’
  • {randomForestSRC} - Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
    • Anonymous random forests for data privacy
  • {deident} (JOSS) - A framework for the replicable removal of personally identifiable data (PID) in data sets.
  • {rmonocypher} - Simple encryption of R objects using a strong modern technique.
  • Also see Simulation, Data
  • Tools
    • rclone - A command-line program to manage files on cloud storage. It is a feature-rich alternative to cloud vendors’ web storage interfaces.
      • Over 70 cloud storage products support rclone including S3 object stores, business & consumer file storage services, as well as standard transfer protocols.
      • Can encrypt files, with the keys stored on your local device
    • {staticrypt} - Password protect a static HTML page, decrypted in-browser in JS with no dependency. No server logic needed. (Useful for HTML reports)
      • hrbrmstr: “Please use long, "complex" *passphrases* for this tool/docs. These documents are susceptible to brute-force attacks, so you gotta make it hard for the attacker.

        It uses solid encryption practices (AES-256 encryption; PBKDF2 for password hashing w/decent iterations); but — unlike public/private key-based message exchanges — once the password leaks, there’s no access revocation”

    • VeraCrypt - Encrypt files before cloud upload for extra security

Tags

  • Tag sensitive information in dataframes

    names(df)
    #> [1] "date" "first_name" "card_number" "payment"
    # assign PII tags
    attr(df, "pii") <- c("name", "ccn", "transaction")
    • PII stands for Personally Identifiable Information
  • Tag dataframes with the names of regulations that are applicable

    attr(df, "regs") <- c("CCPA", "GDPR", "GLBA")
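    • CCPA (California Consumer Privacy Act) is the privacy regulation for California
    • GDPR (General Data Protection Regulation) is the privacy regulation for the European Union
    • GLBA (Gramm-Leach-Bliley Act) is a United States federal law covering how financial institutions handle customers' financial information
      • Applies here because df contains credit card and financial information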
    • Saving objects as .rds files preserves tags
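
The tagging steps above, plus the .rds round trip, can be sketched in base R as follows. The data frame and its values are made up for illustration; only the column names and tag values come from the notes above.

```r
# Hypothetical example data; column names follow the notes above
df <- data.frame(
  date        = as.Date(c("2024-01-05", "2024-01-06")),
  first_name  = c("Ada", "Grace"),
  card_number = c("4111111111111111", "5500000000000004"),
  payment     = c(19.99, 42.50)
)

# Attach the tags as attributes on the data frame
attr(df, "pii")  <- c("name", "ccn", "transaction")
attr(df, "regs") <- c("CCPA", "GDPR", "GLBA")

# Round trip through an .rds file (written to a temporary path)
path <- tempfile(fileext = ".rds")
saveRDS(df, path)
df2 <- readRDS(path)

# The tags come back with the object
attr(df2, "pii")
attr(df2, "regs")
```

A plain-text format such as CSV has no place to store attributes, so tags attached this way would not survive a write.csv() and read.csv() round trip.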

Hashing

  • {digest}
    • Hash Function
    • Apply Hash Function to PII Fields
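
A minimal sketch of the two steps above, assuming {digest} is installed. The helper name hash_pii, the salt value, and the example data are hypothetical and not from the original notes. digest() hashes one object at a time, so the helper loops over the elements of a column.

```r
library(digest)

# Hypothetical helper: hash each element of a character vector separately.
# Note: paste0() turns NA into the string "NA", which then gets hashed.
hash_pii <- function(x, salt = "") {
  vapply(
    paste0(salt, x),
    digest,
    character(1),
    algo = "sha256",
    serialize = FALSE,  # hash the string itself, not its serialized R form
    USE.NAMES = FALSE
  )
}

# Hypothetical example data
df <- data.frame(
  first_name  = c("Ada", "Grace", "Ada"),
  card_number = c("4111111111111111", "5500000000000004", "4111111111111111"),
  payment     = c(19.99, 42.50, 7.25)
)

# Assumption: the salt is a secret kept outside the data set
salt <- "replace-with-a-secret-value"

# Apply the hash function to the PII fields only
pii_cols <- c("first_name", "card_number")
df[pii_cols] <- lapply(df[pii_cols], hash_pii, salt = salt)

str(df)
```

Identical inputs map to identical hashes, so the hashed columns can still be used for joins and grouping. Short, structured values such as card numbers are easy to brute-force when hashed without a secret salt, which is why the sketch includes one.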