Privacy/Security
Misc
Packages
- {askpass} - Password Entry Utilities for R, Git, and SSH
- Cross-platform utilities for prompting the user for credentials or a passphrase, for example to authenticate with a server or read a protected key.
- Includes native programs for macOS and Windows, so ‘tcltk’ is not required.
- Password entry can be invoked in two different ways:
- Directly from R via the askpass() function
- Indirectly as password-entry back-end for ‘ssh-agent’ or ‘git-credential’ via the SSH_ASKPASS and GIT_ASKPASS environment variables.
- The user can be prompted for credentials or a passphrase if needed when R calls out to git or ssh.
- {deident} (JOSS) - A framework for the replicable removal of personally identifiable data (PID) in data sets.
- {diffpriv} - Implements the formal framework of differential privacy: differentially private mechanisms can safely release statistics computed, models fit, or arbitrary structures derived from privacy-sensitive data to untrusted third parties.
- {encryptr} - Encrypt and decrypt data frame or tibble columns using strong RSA public/private keys
- {encryptedRmd} - Encrypt HTML Reports Using ‘Libsodium’
- {metasyn} - For generating synthetic tabular data with a focus on privacy
- {randomForestSRC} - Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC)
- Anonymous random forests for data privacy
- {rmonocypher} - Simple encryption of R objects using a strong modern technique.
- {sdcMicro} (CRAN) - For the generation of anonymized (micro)data, i.e. for the creation of public- and scientific-use files.
- {simPop} (Vignette) - Tools and methods to simulate populations for surveys based on auxiliary data. The tools include model-based methods, calibration, and combinatorial optimization algorithms.
- {staticryptR} - Encrypt HTML Files Using staticrypt
- {tangles} - Anonymisation of Spatial Point Patterns and Grids
- {trustmebro} - Provides functions that make it easy to inspect various subject-generated ID codes (SGIC) for plausibility.
- Helps with inspecting other common identifiers, ensuring that your data stays clean and reliable. Beyond plausibility checks, trustmebro offers a few tools for smooth data import and convenient recoding.
- SGIC - Often consists of a set of stable participant characteristics that enable a respondent’s answers to be matched across multiple points in time, while preserving participant anonymity (i.e., no personal identifiers such as name, date of birth, or address are collected) (source)
- {vvbitwarden} - Provides functions to securely retrieve secrets from a ‘Bitwarden Secrets Manager’ vault using the ‘Bitwarden CLI’, enabling secret and configuration management within R packages and workflows
- {xxhashlite} - Very fast hash functions using xxHash
- Also see Simulation, Data
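As a minimal sketch of the direct route described for {askpass} above: askpass::askpass() prompts for a secret and returns it as a string, and (as far as I know) returns NULL when no password is supplied, for example in a non-interactive session. The wrapper get_secret() and its prompt_fn argument are hypothetical names, added so the prompt can be swapped out when no dialog is available.

```r
# Hypothetical helper: prompt for a secret and fail clearly if none is given.
# `prompt_fn` defaults to askpass::askpass (needs the {askpass} package);
# supplying another function avoids the interactive dialog, e.g. in scripts.
get_secret <- function(prompt = "Please enter your password: ",
                       prompt_fn = askpass::askpass) {
  secret <- prompt_fn(prompt)
  if (is.null(secret) || !nzchar(secret)) {
    stop("No password was entered.", call. = FALSE)
  }
  secret
}

# Interactive use (opens the native password dialog):
# pw <- get_secret("Database password: ")

# Non-interactive use with a stand-in prompt function:
pw <- get_secret(prompt_fn = function(prompt) "example-passphrase")
```

Because the default for prompt_fn is only evaluated when it is needed, the stand-in call runs even where {askpass} is not installed.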
- Tools
- rclone - A command-line program to manage files on cloud storage. It is a feature-rich alternative to cloud vendors’ web storage interfaces.
- Over 70 cloud storage products support rclone including S3 object stores, business & consumer file storage services, as well as standard transfer protocols.
- Can encrypt files before they are uploaded, with the keys kept on your local device
- staticrypt - Password-protect a static HTML page, decrypted in-browser in JS with no dependency. No server logic needed. (Useful for HTML reports) (Also see the R package, {staticryptR})
- hrbrmstr: “Please use long, ‘complex’ *passphrases* for this tool/docs. These documents are susceptible to brute-force attacks, so you gotta make it hard for the attacker. It uses solid encryption practices (AES-256 encryption; PBKDF2 for password hashing w/decent iterations), but, unlike public/private key-based message exchanges, once the password leaks, there’s no access revocation”
- VeraCrypt - Encrypt files before cloud upload for extra security
- Does not encrypt single files directly—instead, it creates an encrypted volume (like a virtual disk), and you can store any files inside that volume.
- Offers strong encryption (AES, Serpent, Twofish, etc.)
- For cloud storage, you’d create and manage an encrypted container locally, and then upload that container to the cloud.
- Cloud Workflow (ChatGPT)
- Create a VeraCrypt Encrypted Container
- Open VeraCrypt and click “Create Volume”.
- Select “Create an encrypted file container”.
- Choose “Standard VeraCrypt volume”.
- Set a file name and location for your container (e.g., secure_docs.vc).
- Choose your preferred encryption algorithm (e.g., AES).
- Specify the size of the volume (e.g., 500 MB).
- Create a strong password.
- Choose a file system (e.g., FAT or exFAT for compatibility) and format the volume.
- Mount the VeraCrypt Volume and Add Files
- Open VeraCrypt and select an unused drive letter (e.g., Z:).
- Mount your encrypted volume file (e.g., secure_docs.vc).
- Copy or move your sensitive files (e.g., PDFs) into the mounted drive.
- Dismount the volume when done; this re-locks the encryption.
- Upload the Encrypted Container to the Cloud
- Upload the .vc file to a cloud service like Dropbox, Google Drive, OneDrive, or a remote server via SFTP/rsync.
- Alternatively, store the .vc file as a binary object (BLOB) in a database.
- Note: The entire container is encrypted. No need to encrypt files individually.
- Access or Update the Files Later
- Download the .vc file from your cloud storage or database.
- Mount the volume with VeraCrypt using your password.
- View, update, or add new files as needed.
- Dismount the volume and re-upload if you made changes.
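The last step of the workflow says to re-upload the container only if it changed. One possible way to check that from R, assuming the container is an ordinary local file, is to compare a checksum taken before mounting with one taken after dismounting. This is only a sketch: container_changed() is a hypothetical helper, and the file written below is a stand-in rather than a real VeraCrypt volume. tools::md5sum() ships with base R.

```r
# Hypothetical helper: TRUE if the file's current MD5 checksum differs
# from a checksum recorded earlier.
container_changed <- function(path, old_md5) {
  new_md5 <- unname(tools::md5sum(path))
  !identical(new_md5, unname(old_md5))
}

# Stand-in for a real container so the example runs anywhere
container <- file.path(tempdir(), "secure_docs.vc")
writeBin(as.raw(1:10), container)

# Record the checksum before mounting
before <- tools::md5sum(container)
changed_before_edit <- container_changed(container, before)

# Simulate edits made while the volume was mounted
writeBin(as.raw(11:20), container)
changed_after_edit <- container_changed(container, before)
```

Here changed_before_edit is FALSE and changed_after_edit is TRUE, so only the second state would call for a re-upload. A checksum is used rather than the file's modification time because the latter may not be updated reliably for container files.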
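For the rclone encryption note above, a rough sketch of calling rclone from R. It assumes rclone is installed and that an encrypted (crypt) remote has already been set up with rclone config; the remote name "secret", the folder names, and the helper rclone_copy_args() are all hypothetical. The only rclone command relied on is `rclone copy <source> <remote>:<path>`.

```r
# Hypothetical helper: build the argument vector for
#   rclone copy <source> <remote>:<remote_path>
rclone_copy_args <- function(source, remote, remote_path = "") {
  c("copy", source, paste0(remote, ":", remote_path))
}

# "secret" is assumed to be a crypt remote defined in the local rclone config
args <- rclone_copy_args("reports", remote = "secret", remote_path = "backups")

# Run only where rclone and the remote actually exist:
# system2("rclone", shQuote(args))
```

Keeping the argument construction separate from the system2() call makes it possible to inspect the command before anything is uploaded.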
Tags
Tag sensitive information in dataframes
names(df)
#> [1] "date" "first_name" "card_number" "payment"

# assign pii tags
attr(df, "pii") <- c("name", "ccn", "transaction")
- Personally Identifiable Information (PII)
Tag dataframes with the names of regulations that are applicable
attr(df, "regs") <- c("CCPA", "GDPR", "GLBA")
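- CCPA (California Consumer Privacy Act) is the privacy regulation for California
- GDPR (General Data Protection Regulation) is the privacy regulation for the European Union
- GLBA (Gramm-Leach-Bliley Act) is a United States regulation covering how financial institutions handle customer information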
- Needed because df has credit card and financial information
- Saving objects as .rds files preserves tags
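A small round-trip sketch of the tagging approach above, using only base R. The data frame values are made up for illustration; the column names and tags follow the snippets in this section.

```r
# Toy data frame with the same columns as the example above (values invented)
df <- data.frame(
  date        = as.Date(c("2024-01-05", "2024-01-06")),
  first_name  = c("Ana", "Ben"),
  card_number = c("4111-0000-0000-0001", "4111-0000-0000-0002"),
  payment     = c(25.00, 40.50)
)

# Tag the data frame with PII categories and applicable regulations
attr(df, "pii")  <- c("name", "ccn", "transaction")
attr(df, "regs") <- c("CCPA", "GDPR", "GLBA")

# Round-trip through an .rds file
path <- tempfile(fileext = ".rds")
saveRDS(df, path)
df2 <- readRDS(path)

# The tags come back with the object
pii_tags <- attr(df2, "pii")
reg_tags <- attr(df2, "regs")
```

A plain-text export such as write.csv() stores only the cell values, so the tags would not survive that route.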