Privacy/Security

Misc

  • Also see Simulation, Data
  • Packages
    • {xxhashlite} - Very fast hash functions using xxHash
    • {{metasyn}} - For generating synthetic tabular data with a focus on privacy
    • {simPop} (Vignette) - Tools and methods to simulate populations for surveys based on auxiliary data. The tools include model-based methods, calibration and combinatorial optimization algorithms
    • {sdcMicro} (CRAN) - For generating anonymized (micro)data, i.e. for creating public- and scientific-use files.
    • {diffpriv} - Implements the formal framework of differential privacy. Differentially private mechanisms can safely release statistics, fitted models, or arbitrary structures derived from privacy-sensitive data to untrusted third parties.
    • {encryptr} - Encrypt and decrypt data frame or tibble columns using strong RSA public/private keys
    • {encryptedRmd} - Encrypt HTML reports using ‘libsodium’
    • {randomForestSRC} - Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
      • Anonymous random forests for data privacy
  • Tools
    • staticrypt - Password protect a static HTML page, decrypted in-browser in JS with no dependency. No server logic needed. (Useful for HTML reports)
      • hrbrmstr: “Please use long, “complex” *passphrases* for this tool/docs. These documents are susceptible to brute-force attacks, so you gotta make it hard for the attacker.

        It uses solid encryption practices (AES-256 encryption; PBKDF2 for password hashing w/decent iterations); but — unlike public/private key-based message exchanges — once the password leaks, there’s no access revocation”

    • VeraCrypt - Encrypt files before cloud upload for extra security

Tags

  • Tag sensitive information in dataframes

    # inspect the column names
    names(df)
    #> [1] "date" "first_name" "card_number" "payment"
    # assign pii tags
    attr(df, "pii") <- c("name", "ccn", "transaction")
    • Personally Identifiable Information (PII)
  • Tag dataframes with the names of regulations that are applicable

    attr(df, "regs") <- c("CCPA", "GDPR", "GLBA")
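    • CCPA (California Consumer Privacy Act) is the privacy regulation for California
    • GDPR (General Data Protection Regulation) is the privacy regulation for the European Union
    • GLBA (Gramm-Leach-Bliley Act) is a United States federal law governing how financial institutions handle customers' private financial information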
      • Needed because df has credit card and financial information
    • Saving objects as .rds files preserves tags
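
The tagging steps above can be checked with a quick round trip through an .rds file. This is a minimal sketch using only base R. The data frame contents are made up for illustration, and the column names mirror the example above.

```r
# Hypothetical data frame with the same columns as the example above
df <- data.frame(
  date        = as.Date(c("2024-01-05", "2024-01-06")),
  first_name  = c("Ana", "Ben"),
  card_number = c("4111111111111111", "5500000000000004"),
  payment     = c(25.00, 40.50)
)

# Attach the PII and regulation tags as attributes
attr(df, "pii")  <- c("name", "ccn", "transaction")
attr(df, "regs") <- c("CCPA", "GDPR", "GLBA")

# Round trip through an .rds file in a temporary location
path <- tempfile(fileext = ".rds")
saveRDS(df, path)
df_restored <- readRDS(path)

# Both tags are still attached after reading the file back in
attr(df_restored, "pii")
attr(df_restored, "regs")
```

Plain-text formats such as CSV store only the table itself, so tags attached this way would need to be recorded separately if the data is exported that way.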

Hashing

  • {digest}
    • Hash Function
    • Apply Hash Function to PII Fields
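
The two steps above might look something like the following. This is a sketch and not necessarily the original code from these notes: the helper name `hash_pii`, the salt value, and the example data are all assumptions. It uses `digest::digest()` with `serialize = FALSE` so that the string itself is hashed, not its serialized R representation.

```r
library(digest)

# Hypothetical helper: SHA-256 hash of each element of a vector.
# `salt` is prepended to every value before hashing (illustrative only).
hash_pii <- function(x, salt = "") {
  vapply(
    paste0(salt, as.character(x)),
    digest,
    character(1),
    algo = "sha256",
    serialize = FALSE,   # hash the raw string, not the serialized object
    USE.NAMES = FALSE
  )
}

# Made-up data with two PII columns
df <- data.frame(
  first_name  = c("Ana", "Ben", "Ana"),
  card_number = c("4111111111111111", "5500000000000004", "4111111111111111"),
  payment     = c(25.00, 40.50, 12.75)
)

# Apply the hash function to the PII fields only
pii_cols <- c("first_name", "card_number")
df[pii_cols] <- lapply(df[pii_cols], hash_pii, salt = "replace-with-a-secret-salt")

head(df)
```

Identical inputs map to identical hashes, so the hashed columns can still be used for joins and grouping. Fields with few possible values (names, card numbers) can be recovered by brute force from an unsalted hash, which is why the sketch includes a secret salt.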