General
Misc
- Resources
- Tools
- direnv - Augments existing shells with a new feature that can load and unload environment variables depending on the current directory
- ctrl-r: shell command history search
  - McFly - intelligent command history search engine that takes into account your working directory and the context of recently executed commands. McFly’s suggestions are prioritized in real time with a small neural network
- Relative path to a folder that sits above the current working directory:
- 1 level up:
../desired-folder
- 2 levels up:
../../desired-folder
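The `..` arithmetic above can be checked programmatically. This is a minimal Python sketch, assuming a hypothetical starting directory `/home/user/project/src` (not from these notes), that joins each relative path onto it and collapses the `..` segments:

```python
import posixpath


def resolve_relative(start_dir: str, relative_path: str) -> str:
    """Join a relative path onto start_dir and collapse any '..' segments."""
    return posixpath.normpath(posixpath.join(start_dir, relative_path))


# Hypothetical working directory, used only for illustration
start = "/home/user/project/src"

one_up = resolve_relative(start, "../desired-folder")
two_up = resolve_relative(start, "../../desired-folder")

print(one_up)  # /home/user/project/desired-folder
print(two_up)  # /home/user/desired-folder
```

Each `..` removes one directory from the end of the starting path before `desired-folder` is appended.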
R
Misc
The “shebang” line starting with #! allows a script to be run directly from the command line without explicitly passing it through Rscript or r. It’s not required but is a helpful convenience on Unix-like systems.
#!/usr/bin/env -S Rscript --vanilla
- The shebang uses /usr/bin/env to locate the Rscript executable and then passes --vanilla as an argument to Rscript. The -S flag tells env to split "Rscript --vanilla" into separate arguments instead of treating it as one program name.
Alter line endings when writing an R script on Windows but executing it on Linux
Windows uses \r\n (carriage return + newline) as line endings.
Linux/Unix uses \n (newline only) as line endings.
Command that makes the script compatible with Linux systems
sed -i 's/\r//' my-script.R
- sed: Stream editor for filtering and transforming text.
- -i: Edits the file “in place.”
- 's/\r//': Removes (s///) the carriage return (\r) from each line.
- my-script.R: The file to process.
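Where sed is not available (for example on the Windows machine where the script was written), the same conversion can be done from Python. This is a minimal sketch, not the notes' own method: the function name `crlf_to_lf` is hypothetical, and it replaces each `\r\n` pair with `\n`, which has the same effect as the sed command on a file with Windows line endings.

```python
from pathlib import Path


def crlf_to_lf(path: str) -> int:
    """Rewrite the file in place with LF endings; return how many CRLF pairs were replaced."""
    file_path = Path(path)
    # Work on bytes so Python does not translate newlines on read or write
    data = file_path.read_bytes()
    replaced = data.count(b"\r\n")
    file_path.write_bytes(data.replace(b"\r\n", b"\n"))
    return replaced
```

Calling `crlf_to_lf("my-script.R")` edits the file in place, like `sed -i`.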
Resources
- Invoking R from the command line for using R and R CMD
Packages
Rscript needs to be on PATH
Run R (default version) in the shell:
RS
# or
rig run
- RS might require {rig} to be installed
- To run a specific R version that’s already installed:
R-4.2
Run an R script:
Rscript "path\to\my-script.R"
# or
rig run -f <script-file>
# or
chmod +x my-script.R
./my-script.R
Evaluate an R expression:
Rscript -e <expression>
# or
rig run -e <expression>
Run an R app:
rig run <path-to-app>
- Plumber APIs
- Shiny apps
- Quarto documents (also with embedded Shiny apps)
- Rmd documents (also with embedded Shiny apps)
- Static web sites
Make an R script pipeable (From link)
parallel "echo 'zipping bin {}'; cat chunked/*_bin_{}_*.csv | ./upload_as_rds.R '$S3_DEST'/chr_'$DESIRED_CHR'_bin_{}.rds"
#!/usr/bin/env Rscript
library(readr)
library(aws.s3)
# Read first command line argument
data_destination <- commandArgs(trailingOnly = TRUE)[1]
data_cols <- list(SNP_Name = 'c', ...)
s3saveRDS(
  read_csv(
    file("stdin"),
    col_names = names(data_cols),
    col_types = data_cols
  ),
  object = data_destination
)
- By passing file("stdin") to readr::read_csv, the script loads the data piped to it into a data frame, which then gets written as an .rds file directly to S3 using {aws.s3}.
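The same "read whatever is piped in" pattern can be sketched in Python for comparison. This is an illustrative analogue only, not the linked author's code: the function name `read_piped_csv` and the column names are hypothetical, and an in-memory `StringIO` stands in for `sys.stdin` so the example runs without a pipe.

```python
import csv
import io
from typing import TextIO


def read_piped_csv(stream: TextIO, col_names: list[str]) -> list[dict[str, str]]:
    """Parse header-less CSV rows from a text stream into dictionaries."""
    return list(csv.DictReader(stream, fieldnames=col_names))


# In a real pipeable script, pass sys.stdin here; StringIO stands in for piped data.
fake_stdin = io.StringIO("rs123,1,1050\nrs456,1,2075\n")

# Hypothetical column names, used only for illustration
rows = read_piped_csv(fake_stdin, ["snp_name", "chr", "position"])

print(rows[0]["snp_name"])  # rs123
print(len(rows))            # 2
```

As in the R script, the column names are supplied by the caller because the piped chunks carry no header row.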
Killing a process
system("taskkill /im java.exe /f", intern=FALSE, ignore.stdout=FALSE)
Starting a process in the background
# start MLflow server
# the executable and its arguments are passed separately
sys::exec_background("mlflow", args = "server")
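For comparison, starting and stopping a background process can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the helper names `start_background` and `stop` are hypothetical, and a sleeping Python child process stands in for a long-running server such as the one above.

```python
import subprocess
import sys


def start_background(args: list[str]) -> subprocess.Popen:
    """Launch a process without waiting for it to finish."""
    return subprocess.Popen(args, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def stop(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask the process to terminate; force-kill it if it does not exit in time."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return proc.returncode


# Stand-in for a long-running server: a child process that sleeps for 30 seconds
proc = start_background([sys.executable, "-c", "import time; time.sleep(30)"])
still_running = proc.poll() is None   # poll() returns None while the child is alive
exit_code = stop(proc)
```

Keeping the `Popen` handle is what makes it possible to stop exactly this process later, rather than killing every process with a matching image name.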
Check file sizes in a directory
file.info(Sys.glob("*.csv"))["size"]
#>                                size
#> Data8277.csv              857672667
#> DimenLookupAge8277.csv         2720
#> DimenLookupArea8277.csv       65400
#> DimenLookupEthnic8277.csv       272
#> DimenLookupSex8277.csv           74
#> DimenLookupYear8277.csv          67
- Sizes are in bytes. The first file is about 858 MB (857,672,667 bytes)
Read first ten lines of a file
cat(paste(readLines("Data8277.csv", n=10), collapse="\n"))
#> Year,Age,Ethnic,Sex,Area,count
#> 2018,000,1,1,01,795
#> 2018,000,1,1,02,5067
#> 2018,000,1,1,03,2229
#> 2018,000,1,1,04,1356
#> 2018,000,1,1,05,180
#> 2018,000,1,1,06,738
#> 2018,000,1,1,07,630
#> 2018,000,1,1,08,1188
#> 2018,000,1,1,09,2157
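The two checks above (file size, then a peek at the first lines) have a direct standard-library equivalent in Python. This is a minimal sketch; the helper names `size_in_bytes` and `head` are hypothetical. `head` stops reading after `n` lines, so it stays cheap even on a file as large as the one above.

```python
import os
from itertools import islice


def size_in_bytes(path: str) -> int:
    """Return the file size in bytes."""
    return os.path.getsize(path)


def head(path: str, n: int = 10) -> list[str]:
    """Return the first n lines of a text file without reading the rest."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in islice(fh, n)]
```

For example, `print("\n".join(head("Data8277.csv")))` would print the header plus the first nine data rows.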
Delete an opened file in the same R session
You **MUST** unlink the file before manipulating the object in any way
- I think this works because readr loads files lazily by default
Example:
library(dplyr) # provides %>%, select, filter, mutate
wisc_csv_filename <- "COVID-19_Historical_Data_by_County.csv"
download_location <- file.path(Sys.getenv("USERPROFILE"), "Downloads")
wisc_file_path <- file.path(download_location, wisc_csv_filename)
wisc_tests_new <- readr::read_csv(wisc_file_path)
# key part, must unlink before any kind of code interaction
# supposedly need recursive = TRUE for Windows, but I didn't need it
# Throws an error (hence safely) but still works
safe_unlink <- purrr::safely(unlink)
safe_unlink(wisc_tests_new)
# manipulate obj
wisc_tests_clean <- wisc_tests_new %>%
  janitor::clean_names() %>%
  select(date, geo, county = name, negative, positive) %>%
  filter(geo == "County") %>%
  mutate(date = lubridate::as_date(date)) %>%
  select(-geo)
# clean-up
fs::file_delete(wisc_file_path)
Find out which process is locking or using a file
- Open Resource Monitor, which can be found
- By searching for Resource Monitor or resmon.exe in the start menu, or
- As a button on the Performance tab in your Task Manager
- Go to the CPU tab
- Use the search field in the Associated Handles section
- Type the name of the file in the search field; the search runs automatically
Python
Notes from Python’s many command-line utilities
- Lists and describes all CLI utilities that are available through Python’s standard library
Packages
Linux utilities through Python in CLI
| Command | Purpose | More |
|---|---|---|
| python3.12 -m uuid | Like uuidgen CLI utility | Docs |
| python3.12 -m sqlite3 | Like sqlite3 CLI utility | Docs |
| python -m zipfile | Like zip & unzip CLI utilities | Docs |
| python -m gzip | Like gzip & gunzip CLI utilities | Docs |
| python -m tarfile | Like the tar CLI utility | Docs |
| python -m base64 | Like the base64 CLI utility | |
| python -m ftplib | Like the ftp utility | |
| python -m smtplib | Like the sendmail utility | |
| python -m poplib | Like using curl to read email | |
| python -m imaplib | Like using curl to read email | |
| python -m telnetlib | Like the telnet utility | |

The uuid and sqlite3 commands require Python 3.12 or above.
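Each command in the table is a thin command-line wrapper around a standard-library module, so the same work can be done from inside a script. The sketch below is illustrative only (the payload and variable names are made up) and exercises four of those modules through their regular Python APIs:

```python
import base64
import gzip
import sqlite3
import uuid

payload = b"hello, command line"

# base64: encode and decode, the job of `python -m base64`
encoded = base64.b64encode(payload)
decoded = base64.b64decode(encoded)

# gzip: compress and decompress in memory, the job of `python -m gzip`
compressed = gzip.compress(payload)
restored = gzip.decompress(compressed)

# uuid: generate a random (version 4) UUID
random_id = uuid.uuid4()

# sqlite3: run a query against a throwaway in-memory database
conn = sqlite3.connect(":memory:")
total = conn.execute("SELECT 1 + 1").fetchone()[0]
conn.close()

print(decoded == payload, restored == payload, total)  # True True 2
```

The module form is mainly useful on machines where the native utility (for example `uuidgen` or `sqlite3`) is not installed but Python is.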
Code Utilities
| Command | Purpose | More |
|---|---|---|
| python -m pip | Install third-party Python packages | Docs |
| python -m venv | Create a virtual environment | Docs |
| python -m pdb | Run the Python Debugger | Docs |
| python -m unittest | Run unittest tests in a directory | Docs |
| python -m pydoc | Show documentation for given string | Docs |
| python -m doctest | Run doctests for a given Python file | Docs |
| python -m ensurepip | Install pip if it’s not installed | Docs |
| python -m idlelib | Launch Python’s IDLE graphical REPL | Docs |
| python -m zipapp | Turn Python module into runnable ZIP | Docs |
| python -m compileall | Pre-compile Python files to bytecode | Docs |
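As an illustration of what `python -m doctest` looks for, the sketch below defines a hypothetical function `add` whose docstring contains two doctest examples, then runs them through the `doctest` module's parser and runner directly. Saving just the function in a file and running `python -m doctest <file>` would check the same examples.

```python
import doctest


def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    """
    return a + b


# Extract the examples from the docstring and execute them
parser = doctest.DocTestParser()
test = parser.get_doctest(add.__doc__, {"add": add}, "add", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)  # 0 2
```

A doctest passes when the text printed by the interpreter matches the line under the `>>>` prompt exactly.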
AWK
Misc
- Resources
Print first few rows of columns 1 and 2
awk -F, '{print $1,$2}' adult_t.csv|head
Extract every 4th line starting from the line 1 (i.e. 1, 5, 9, 13, …)
awk '(NR%4==1)' file.txt
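The `NR%4==1` condition keeps a line when its 1-based line number leaves remainder 1 after division by 4. A minimal Python sketch of the same selection, with a hypothetical helper name and made-up input:

```python
def every_nth_line(lines, step=4, first=1):
    """Keep lines whose 1-based number is first, first + step, first + 2 * step, ..."""
    return [
        line
        for number, line in enumerate(lines, start=1)  # number plays the role of awk's NR
        if number % step == first % step
    ]


lines = [f"line{n}" for n in range(1, 11)]
selected = every_nth_line(lines)

print(selected)  # ['line1', 'line5', 'line9']
```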
Filter lines where the number of hours per week (13th column) > 98
awk -F, '$13 > 98' adult_t.csv|head
Filter lines with “Doctorate” and print first 3 columns
awk -F, '/Doctorate/{print $1, $2, $3}' adult_t.csv
Random sample 8% of the total lines from a .csv (keeps header)
awk 'BEGIN {srand()} !/^$/ {if(rand()<=0.08||FNR==1) print > "rand.samp.csv"}' big_fn.csv
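The logic of the sampling one-liner is: skip empty lines, always keep the first line (the header), and keep every other line with probability 0.08. A minimal Python sketch of that logic follows; the function name `sample_lines` and the `seed` parameter are hypothetical additions (the seed only makes runs repeatable), and it returns the kept lines rather than writing rand.samp.csv.

```python
import random


def sample_lines(lines, fraction, seed=None):
    """Keep the first line plus a random fraction of the remaining non-empty lines."""
    rng = random.Random(seed)
    kept = []
    for index, line in enumerate(lines):
        if line.rstrip("\r\n") == "":
            continue                      # like awk's !/^$/ : skip empty lines
        if index == 0 or rng.random() < fraction:
            kept.append(line)             # index 0 is the header (awk's FNR==1)
    return kept
```

Because each line is an independent draw, the sample size is only approximately 8% of the file; it is not an exact count.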
Decompresses, chunks, sorts, and writes back to S3 (From link)
# Let S3 use as many threads as it wants
aws configure set default.s3.max_concurrent_requests 50
for chunk_file in $(aws s3 ls $DATA_LOC | awk '{print $4}' | grep 'chr'$DESIRED_CHR'.csv') ; do
  aws s3 cp s3://$batch_loc$chunk_file - |
  pigz -dc |
  parallel --block 100M --pipe \
  "awk -F '\t' '{print \$1\",...\"$30\">\"chunked/{#}_chr\"\$15\".csv\"}'"
  # Combine all the parallel process chunks to single files
  ls chunked/ |
  cut -d '_' -f 2 |
  sort -u |
  parallel 'cat chunked/*_{} | sort -k5 -n -S 80% -t, | aws s3 cp - '$s3_dest'/batch_'$batch_num'_{}'
  # Clean up intermediate data
  rm chunked/*
done
Vim
- Command-line based text editor
- Common Usage
- Edit text files while in CLI
- Making a code change while logged into a remote machine. vim is a standard program and therefore usually available on any machine you work on.
- When running git commit, by default git opens vim for writing a commit message. So at the very least you’ll want to know how to write, save, and close a file.
- Resources
- 2 modes: Navigation mode (called Normal mode in Vim’s own documentation); Edit mode (called Insert mode)
- When Vim is launched you’re in Navigation mode
- Press i to start edit mode, in which you can make changes to the file.
- Press Esc key to leave edit mode and go back to navigation mode.
- Commands
- x deletes a character
- dd deletes an entire row
- b (back) goes to the previous word
- w (word) goes to the next word
- :wq saves your changes and closes the file
- :q! ignores your changes and closes the file
- h is \(\leftarrow\)
- j is \(\downarrow\)
- k is \(\uparrow\)
- l (i.e. lower L) is \(\rightarrow\)