GNU Make

Misc

Benefits
- Detects changes in files (source $\rightarrow$ binary)
- Manages dependencies
- Manages default values for variables much more easily than Bash
- Allows building in parallel
- OS detection
- The binary is only ~16kB in size
- Available on any OS
Notes from
- article
- project
  - Good example of an advanced makefile for a practical data science project
- Video
  - Goes through the different components of executing a python project with make (e.g. testing, clean-up, defining variables, setting up the virtual environment, etc.)
Resources
- Docs - All on one page so you can just ctrl + f
- Nice little tutorial
- Another tutorial
  - Haven’t read it, but looks pretty thorough
- Docs for variable types
- Docs for functions
Project orchestration system that only builds steps that have changed
- {drake}/{targets} are based on this system
If the file is named “makefile” or “Makefile” (names that Make looks for by default), simply typing make at the command line while inside your project’s directory will execute the build process.
- make -B --recon shows all the commands used to build the project (i.e. kind of like a DAG)
- make -B rebuilds the entire project even if no targets have changed
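
A minimal sketch of these three invocations, assuming GNU Make is on the PATH. The file names in.txt and out.txt are made up for illustration, and the rule uses the one-line form (target: prerequisite ; command) so that no tab character is needed when copying it.

```shell
set -eu
cd "$(mktemp -d)"                 # throwaway project directory
echo "hello" > in.txt

# Hypothetical one-rule project: out.txt is an upper-cased copy of in.txt.
cat > Makefile <<'EOF'
out.txt: in.txt ; tr a-z A-Z < in.txt > out.txt
EOF

first=$(make)                     # out.txt is missing, so the command runs
second=$(make)                    # nothing changed, so make reports the target is up to date
dry=$(make -B --recon)            # -B forces a rebuild, --recon only prints the command
printf '%s\n' "$first" "$second" "$dry"
```

The first and third calls print the tr command; only the first one actually executes it.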

General

Syntax

```
targets: prerequisites
	command
	command
	command
```

The prerequisites are also file names, separated by spaces. These files need to exist before the commands for the target are run. These are also called dependencies.
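
To see how prerequisites drive rebuilds, here is a small sketch with a hypothetical two-step pipeline (raw.txt, sorted.txt, and report.txt are invented names). It assumes GNU Make is installed and uses one-line rules to avoid tab issues.

```shell
set -eu
cd "$(mktemp -d)"                 # throwaway project directory
printf 'b\na\n' > raw.txt

# raw.txt -> sorted.txt -> report.txt
cat > Makefile <<'EOF'
report.txt: sorted.txt ; head -n 1 sorted.txt > report.txt
sorted.txt: raw.txt ; sort raw.txt > sorted.txt
EOF

make -s                           # builds sorted.txt first, then report.txt

sleep 1                           # make sure the next timestamp is strictly newer
touch sorted.txt                  # only the intermediate file changed
partial=$(make --recon)           # just the report step is out of date

sleep 1
touch raw.txt                     # the raw input changed
full=$(make --recon)              # both steps are out of date
printf 'partial:\n%s\nfull:\n%s\n' "$partial" "$full"
```

Make compares file modification times, so changing a file only triggers the rules downstream of it.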

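.PHONY - Declares targets that do not correspond to files, which avoids conflicts between target names and file names

Considered best practice for every target that does not produce a file of the same name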

Example

```
# "source" is a bash builtin; Make runs recipes with /bin/sh unless told otherwise
SHELL := /bin/bash

.PHONY: install
install:
	python3.9 -m venv venv && source venv/bin/activate && pip install -r requirements-dev.txt

.PHONY: dbsetup
dbsetup:
	source venv/bin/activate && python -m youtube.db

.PHONY: lint
lint:
	flake8 emojisearcher tests
```
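
The conflict that .PHONY avoids can be reproduced with a short sketch. It assumes GNU Make is installed; plain.mk and phony.mk are hypothetical file names, and the clean target only echoes a message.

```shell
set -eu
cd "$(mktemp -d)"                 # throwaway project directory
touch clean                       # a stray file that shares its name with the target

cat > plain.mk <<'EOF'
clean: ; @echo "removing build artifacts"
EOF

cat > phony.mk <<'EOF'
.PHONY: clean
clean: ; @echo "removing build artifacts"
EOF

plain=$(make -f plain.mk clean)   # skipped: the file "clean" exists and has no prerequisites
phony=$(make -f phony.mk clean)   # runs: .PHONY says the target is not a file
printf '%s\n' "$plain" "$phony"
```

Without .PHONY, Make treats the existing file as an up-to-date target and never runs the command.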

Rules

Makefiles are made up of rules. Each rule is a code chunk.
Example
```
# Inside Makefile
data/raw/NIPS_1987-2015.csv:
	curl -o $@ https://archive.ics.uci.edu/ml/machine-learning-databases/00371/NIPS_1987-2015.csv
```
- Downloads a file
- “data/raw/NIPS_1987-2015.csv” is the file path and target for this rule.
- $@ is a Make automatic variable that fills in the target name for the file name arg in the curl command.
- There is no prerequisite required for this command. So, this syntax is just target : command.

Targets

The targets are file names, separated by spaces. Typically, there is only one per rule.
Dummy Targets - A target with no commands directly associated with it (it is sort of a meta-target).
- Useful if you want to only rebuild part of the project
- Example: if you have a couple of scripts that involve data acquisition and cleaning, another few that involve data analysis, and a few that involve the presentation of results (paper, plot), then you might define a dummy for each of them.
```
all: data model paper
data: raw.csv
model: model.rds
paper: plot.png paper.pdf
```
  - Running make paper from the project directory executes only the commands that build “plot.png” and “paper.pdf”
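
A runnable sketch of the same layout, assuming GNU Make is installed. The build steps are stand-ins: each one just creates an empty file with touch, and the dummy targets are declared .PHONY (an addition not shown in the snippet above).

```shell
set -eu
cd "$(mktemp -d)"                 # throwaway project directory

cat > Makefile <<'EOF'
.PHONY: all data model paper
all: data model paper
data: raw.csv
model: model.rds
paper: plot.png paper.pdf

# Stand-in recipe: one rule shared by four targets, $@ is the one being built
raw.csv model.rds plot.png paper.pdf: ; touch $@
EOF

make -s paper                     # builds only plot.png and paper.pdf
after_paper=$(ls)
make -s all                       # builds whatever is still missing
after_all=$(ls)
printf 'after paper:\n%s\nafter all:\n%s\n' "$after_paper" "$after_all"
```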

Variables

Expanded Variables
- Values are accessed using $(x) or ${x}
- “Recursively Expanded” Variables are defined using = operator
```
x = hello
y = $(x)
# Both $(x) and $(y) will now yield "hello"
x = world
# Both $(x) and $(y) will now yield "world"
```
  - Any functions referenced in the definition will be executed every time the variable is expanded
  - Can cause infinite loops when a variable refers to itself (e.g. x = $(x) foo); Make detects this and stops with an error
- “Simply Expanded” Variables are defined using the := or ::= operator
```
x := hello
y := $(x)
# Both $(x) and $(y) will now yield "hello"
x := world
# $(x) will now yield "world", and $(y) will yield "hello"
```
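
The difference between the two flavors can be checked directly. This is a small sketch assuming GNU Make is installed; the variable names lazy and eager and the show target are invented for the demo.

```shell
set -eu
cd "$(mktemp -d)"                 # throwaway project directory

cat > Makefile <<'EOF'
x = hello
# Recursively expanded: $(x) is looked up each time lazy is used
lazy = $(x)
# Simply expanded: $(x) is evaluated once, right here
eager := $(x)
x = world
show: ; @echo "lazy=$(lazy) eager=$(eager)"
EOF

out=$(make show)
echo "$out"
```

The recursive variable picks up the later value of x, while the simple one keeps the value it had at definition time.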
Automatic Variables
- $@ is a Make variable that “expands” into the target name (if a rule lists several targets, the one the commands are currently being run for)
- $^ is a Make variable that “expands” into all of the prerequisites
- $< is a Make variable that “expands” into the first prerequisite
- $? is a Make variable that “expands” into any prerequisites which have a time stamp more recent than the target
- % is a wildcard used in pattern rules; it matches any target in the makefile, or file in the project directory, that fits its pattern (also see abstraction section below)
```
foo%.o: foo%.c
	$(CC) $(CFLAGS) -c $< -o $@
```
  - Will match target lib/foobar.o, with:
    - Stem ($*): lib/bar
    - Target name ($@): lib/foobar.o
    - Prerequisites ($<, $^): lib/foobar.c
- $* is a Make variable that “expands” into the “stem”, i.e. the part of the target name matched by the % wildcard
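
The automatic variables are easiest to understand by printing them. The sketch below assumes GNU Make is installed and uses made-up file names (a.txt, b.txt, out.txt); the recipe writes the expanded values into the target and echoes them.

```shell
set -eu
cd "$(mktemp -d)"                 # throwaway project directory
touch a.txt b.txt

cat > Makefile <<'EOF'
out.txt: a.txt b.txt ; @echo "target=$@ first=$< all=$^ newer=$?" | tee $@
EOF

first_run=$(make)                 # out.txt is missing, so every prerequisite counts as newer
sleep 1                           # make sure the next timestamp is strictly newer
touch b.txt                       # only b.txt changes
second_run=$(make)
printf '%s\n' "$first_run" "$second_run"
```

On the second run only b.txt shows up in $?, while $^ still lists both prerequisites.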

Commands

The commands are a series of steps typically used to make the target(s). These need to start with a tab character, not spaces.
Show the commands that would be run to build a target
```
# CLI
>> make --recon <target>
```
Update a specific target
```
# CLI
>> make data/raw/NIPS_1987-2015.csv
```
- This runs the rule for that file if the file doesn’t exist yet. In this case, it’s the curl command from the example in the Rules section
- If you run this command again, you’ll receive this message: make: 'data/raw/NIPS_1987-2015.csv' is up to date.

Execute a script

Example
- Makefile
```
data/processed/NIPS_1987-2015.csv : src/data/transpose.py data/raw/NIPS_1987-2015.csv
	$(PYTHON_INTERPRETER) $^ $@
```
  - “$(PYTHON_INTERPRETER)” is a Make variable, defined elsewhere in the Makefile, that holds the python3 interpreter
  - The script in this example has 2 args: input file path and output file path
  - $^ fills in the prerequisites which takes care of <script> <arg1>
  - $@ fills in the target name for <arg2>
- CLI
```
>> make --recon data/processed/NIPS_1987-2015.csv
python3 src/data/transpose.py data/raw/NIPS_1987-2015.csv data/processed/NIPS_1987-2015.csv
```
  - make --recon shows the command line that Make generates from the rule in the Makefile
  - Basic format for executing a python script in the cli is python3 <script> <arg1> <arg2> ... <argn>
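
The same pattern can be tried without Python. In this sketch, copy.sh is a hypothetical stand-in for transpose.py that simply copies its first argument to its second, and INTERPRETER plays the role of PYTHON_INTERPRETER. It assumes GNU Make is installed.

```shell
set -eu
cd "$(mktemp -d)"                 # throwaway project directory
printf 'a,b\n1,2\n' > in.csv

# Hypothetical script: takes <input path> <output path>
printf 'cp "$1" "$2"\n' > copy.sh

cat > Makefile <<'EOF'
INTERPRETER := sh
out.csv: copy.sh in.csv ; $(INTERPRETER) $^ $@
EOF

expanded=$(make --recon out.csv)  # the command line after Make fills in $^ and $@
make -s out.csv                   # actually run it
echo "$expanded"
```

Listing the script as the first prerequisite means that editing the script also marks the output as out of date.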

Abstraction

Example
```
all: models/10_topics.png models/20_topics.png

models/%_topics.png : src/models/fit_lda.py data/processed/NIPS_1987-2015.csv src/models/prodlda.py
	$(PYTHON_INTERPRETER) $< $(word 2, $^) $@ --topics $*
```
- % matches both prerequisites of the dummy target “all”, giving the stems 10 and 20
  - So this rule runs twice: once with the value 10 then with the value 20.
- $< is an automatic variable that expands into “src/models/fit_lda.py”
- Built-in Make text function, $(word n,text) , returns the nth word of text. (see Misc >> Resources for function docs)
  - The legitimate values of n start from 1. If n is bigger than the number of words in text, the value is empty
  - In this example, it’s used to return the 2nd prerequisite to become the 1st argument of the fit_lda.py script
    - 1st arg is the input file path
- $@ is an automatic variable that expands into the target name which becomes the 2nd argument of the fit_lda.py script
  - 2nd arg is the output file path
- --topics is a command-line option for fit_lda.py which is defined in the script using decorators from {{click}}
  - $* is the stem of the wildcard, which is numeric in this case and provides the value for the --topics flag
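
A stripped-down sketch of this pattern rule, assuming GNU Make is installed. Instead of fitting a model, the recipe records what each variable expanded to; fit.sh and data.csv are empty placeholder files, and the outputs are text files, not plots.

```shell
set -eu
cd "$(mktemp -d)"                 # throwaway project directory
touch fit.sh data.csv             # placeholder script and input file

cat > Makefile <<'EOF'
all: models/10_topics.txt models/20_topics.txt

models/%_topics.txt: fit.sh data.csv ; @mkdir -p $(@D) && echo "script=$< input=$(word 2,$^) output=$@ topics=$*" > $@
EOF

make -s all                       # the pattern rule runs once per stem: 10 and 20
cat models/10_topics.txt models/20_topics.txt
```

$(@D) is the directory part of the target, used here to create the models directory before writing into it.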
Example: Clean text files in the data directory
```
data/processed/%.txt: data/raw/%.txt
	sed 1,20d $^ > $@
```
- For each requested target in data/processed, takes the text file with the same name from the raw directory, removes the first 20 lines (sed 1,20d), and writes (>) the result to the target ($@)
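
A pattern rule on its own only describes how to build one processed file; something still has to ask for the targets. One possible way, sketched here under the assumption that GNU Make and seq are available, is to derive the target list with the built-in wildcard and patsubst functions. The variable names RAW and PROCESSED and the sample files are invented for the demo.

```shell
set -eu
cd "$(mktemp -d)"                 # throwaway project directory
mkdir -p data/raw
seq 1 22 > data/raw/a.txt         # 22 lines: 20 to drop plus 2 to keep
seq 101 125 > data/raw/b.txt      # 25 lines: 20 to drop plus 5 to keep

cat > Makefile <<'EOF'
RAW := $(wildcard data/raw/*.txt)
PROCESSED := $(patsubst data/raw/%,data/processed/%,$(RAW))

all: $(PROCESSED)

data/processed/%.txt: data/raw/%.txt ; @mkdir -p $(@D) && sed 1,20d $^ > $@
EOF

make -s all                       # one sed call per raw file
cat data/processed/a.txt          # the two remaining lines of a.txt
```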

Scripts

Notes from
- How I stopped worrying and loved Makefiles
Python project using pip
```
# "source" is a bash builtin; Make runs recipes with /bin/sh unless told otherwise
SHELL := /bin/bash

.PHONY: requirements test

.venv:
	python3 -m venv .venv

requirements: .venv
	source .venv/bin/activate && \
		python3 -m pip install -r requirements.txt && \
		python3 -m pip install pytest

test: requirements
	source .venv/bin/activate && \
		pytest
```
- .PHONY: requirements test: Declares requirements and test as phony targets to ensure they are always executed regardless of file existence.
- .venv: Creates a Python virtual environment named .venv if it doesn’t already exist.
- requirements: Activates the virtual environment, then installs the Python packages listed in requirements.txt, along with pytest, into it.
- test: requirements: Makes sure the virtual environment exists and the requirements are installed. Then, it activates the virtual environment and runs the tests using pytest.

Python project using poetry

```
.PHONY: requirements test

requirements:
	poetry install

test: requirements
	poetry run pytest
```

Terraform project

```
.PHONY: init
SHELL=/bin/bash

# these variables should be initialized and exported outside of this
# script; Make will just set them based on what you have set in
# your environment. You can use e.g. `aws sts`
AWS_ACCESS_KEY_ID ?=
AWS_SECRET_ACCESS_KEY ?=
AWS_REGION ?= "us-west-2"

# dev by default
ENVIRONMENT ?= dev
STATE_FILE_BUCKET ?= s3-bucket-$(AWS_ACCESS_KEY_ID)-$(ENVIRONMENT)-terraform-state
STATE_FILE_KEY ?= state/some_service/$(ENVIRONMENT)/terraform.tfstate

# make some variables available in Terraform
export TF_VAR_something ?= something1
export TF_VAR_something_else ?= something-else

.terraform:
	terraform init \
		-reconfigure \
		-backend-config='key=$(STATE_FILE_KEY)' \
		-backend-config='bucket=$(STATE_FILE_BUCKET)'
	terraform get

# this will switch Terraform version to the one that your project needs
# https://github.com/tfutils/tfenv
init: .terraform
	tfenv install

# -out belongs to `terraform plan`; the saved plan is what `apply` consumes
plan: init
	terraform plan \
		-var-file=environments/$(ENVIRONMENT)/variables.tfvars \
		-out terraform.plan

apply: plan
	terraform apply \
		-auto-approve \
		terraform.plan

destroy:
	terraform destroy \
		-auto-approve \
		-var-file=environments/$(ENVIRONMENT)/variables.tfvars

dev-plan: export AWS_ACCESS_KEY_ID=dev-key
dev-plan: plan

dev-apply: export AWS_ACCESS_KEY_ID=dev-key
dev-apply: apply

dev-destroy: export AWS_ACCESS_KEY_ID=dev-key
dev-destroy: destroy

prod-plan: export AWS_ACCESS_KEY_ID=prod-key
prod-plan: plan

prod-apply: export AWS_ACCESS_KEY_ID=prod-key
prod-apply: apply

clean:
	@rm -rf .terraform/modules
	@rm -f terraform.*
```

- terraform get fetches the latest version of the modules
- prod and dev are production and development branches respectively
- This is a partial backend configuration. It allows you to use the same codebase for all the environments. All customizations have to be listed as variables in the variables.tfvars files. It can easily be extended to support 4 or 6 environments
Execute:
```
make dev-plan
make dev-apply
```
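
The Terraform Makefile leans on two Make features: ?= defaults that the environment can override, and target-specific exported variables (the dev-plan: export ... lines). The sketch below isolates both with a hypothetical show target in place of the terraform commands; it assumes GNU Make is installed.

```shell
set -eu
cd "$(mktemp -d)"                     # throwaway project directory
unset ENVIRONMENT AWS_ACCESS_KEY_ID   # start from a clean environment

cat > Makefile <<'EOF'
# ?= assigns only if the variable is not already set (e.g. in the environment)
ENVIRONMENT ?= dev

.PHONY: show dev-show
show: ; @echo "env=$(ENVIRONMENT) key=$${AWS_ACCESS_KEY_ID:-unset}"

# Target-specific variable, exported to the recipes of dev-show and its prerequisites
dev-show: export AWS_ACCESS_KEY_ID = dev-key
dev-show: show
EOF

default=$(make show)                     # falls back to the ?= default
override=$(ENVIRONMENT=prod make show)   # the environment wins over ?=
scoped=$(make dev-show)                  # the key is set only for this target chain
printf '%s\n' "$default" "$override" "$scoped"
```

This is why make dev-plan and make prod-plan can share the same plan rule while running with different credentials.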

Expect this file structure

```
$ tree example/
.
├── main.tf
├── variables.tf
├── provider.tf
├── backend.tf
├── outputs.tf
├── ...
├── environments/
│   ├── dev/
│   │   ├── variables.tfvars
│   ├── prod/
│   │   ├── variables.tfvars
│   ├── .../
```

backend.tf

```
terraform {
    backend "s3" {
        region = "us-west-2"
        encrypt = true
    }
}
```