GNU Make

Misc

  • Benefits
    • Detects changes in files (source → binary)
    • Manages dependencies
    • Manages default values for variables much more easily than Bash
    • Allows building in parallel
    • OS detection
    • The binary is only ~16kB in size
    • Available on any OS
  • Notes from
    • article
    • project
      • Good example of an advanced makefile for a practical data science project
    • Video
      • Goes through the different components of executing a python project with make (e.g. testing, clean-up, defining variables, setting up the virtual environment, etc.)
  • Resources
    • Docs - All on one page so you can just ctrl + f
    • Nice little tutorial
    • Another tutorial
      • Haven’t read it, but looks pretty thorough
    • Docs for variable types
    • Docs for functions
  • Project orchestration system that only builds steps that have changed
    • {drake}/{targets} are based on this system
  • Assuming that you’ve named the file “GNUmakefile”, “makefile”, or “Makefile” (the names Make looks for by default), simply typing make at the command line while inside your project’s directory will execute the build process.
    • make -B --recon prints every command needed to build the project without running any of them (i.e. a rough view of the build DAG)
    • make -B rebuilds the entire project even if no targets have changed
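
Two of the benefits listed above, default values and OS detection, can be sketched in a few lines. This is a minimal illustration, not taken from the linked sources: the variable names DETECTED_OS and PYTHON_INTERPRETER and the info target are assumptions, and recipe lines must begin with a tab character.

```makefile
# Default value: used only if the variable is not already set in the
# environment or on the command line (e.g. `make PYTHON_INTERPRETER=python3.11`)
PYTHON_INTERPRETER ?= python3

# OS detection: Windows sets the OS environment variable to Windows_NT;
# elsewhere, ask uname (assumes a Unix-like shell is available)
ifeq ($(OS),Windows_NT)
    DETECTED_OS := Windows
else
    DETECTED_OS := $(shell uname -s)
endif

.PHONY: info
info:
	@echo "OS: $(DETECTED_OS), interpreter: $(PYTHON_INTERPRETER)"
```

Running make info prints the detected values. Adding -j 4 to any make invocation lets Make run up to four independent recipes in parallel.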

General

  • Syntax

    targets: prerequisites
        command
        command
        command
  • The prerequisites are also file names, separated by spaces. These files need to exist before the commands for the target are run. These are also called dependencies.

  • .PHONY - Helpful to avoid conflicts between target names and file names

    • Considered best practice to use

    • Example

      .PHONY: install
      install:
              python3.9 -m venv venv && source venv/bin/activate && pip install -r requirements-dev.txt
      
      .PHONY: dbsetup
      dbsetup:
              source venv/bin/activate && python -m youtube.db
      
      .PHONY: lint
      lint:
              flake8 emojisearcher tests
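
The name conflict that .PHONY avoids can be sketched as follows. The clean target and the build/ directory are hypothetical names used only for illustration; recipe lines must begin with a tab character.

```makefile
# Without the .PHONY line, a file or directory named "clean" in the project
# would make Make consider the target up to date and skip the recipe.
.PHONY: clean
clean:
	rm -rf build/
```

With the declaration in place, make clean runs the recipe every time, whether or not something named clean exists on disk.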

Rules

  • Makefiles are made up of rules. Each rule is a code chunk.

  • Example

    # Inside Makefile
    data/raw/NIPS_1987-2015.csv:
        curl -o $@ https://archive.ics.uci.edu/ml/machine-learning-databases/00371/NIPS_1987-2015.csv
    • Downloads a file
    • “data/raw/NIPS_1987-2015.csv” is the file path and target for this rule.
    • $@  is a Make automatic variable that fills in the target name for the file name arg in the curl command.
    • There is no prerequisite required for this command. So, this syntax is just target : command.
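
A slightly more defensive variant of the download rule above, as a sketch: curl -o fails if the output directory does not exist, so the recipe can create it first. $(@D) is the automatic variable for the directory part of the target; adding the mkdir step is an assumption about what the project needs, not part of the original rule. Recipe lines must begin with a tab character.

```makefile
data/raw/NIPS_1987-2015.csv:
	# $(@D) expands to data/raw, the directory part of the target
	mkdir -p $(@D)
	curl -o $@ https://archive.ics.uci.edu/ml/machine-learning-databases/00371/NIPS_1987-2015.csv
```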

Targets

  • The targets are file names, separated by spaces. Typically, there is only one per rule.
  • Dummy Targets - A target with no commands directly associated with it (it is sort of a meta-target).
    • Useful if you want to only rebuild part of the project

    • Example: if you have a couple of scripts that involve data acquisition and cleaning, another few that involve data analysis, and a few that involve the presentation of results (paper, plot), then you might define a dummy for each of them.

      all: data model paper
      data: raw.csv
      model: model.rds
      paper: plot.png paper.pdf
      • Executing make paper at the command line, from the project directory, will run the commands that build “plot.png” and “paper.pdf”
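
The dummy targets above only group files; each file still needs its own rule. A minimal sketch of how the pieces could fit together, assuming hypothetical R scripts (download.R, fit.R, plot.R) and a paper.Rmd that renders to paper.pdf. Recipe lines must begin with a tab character.

```makefile
# The dummy targets are not files, so declare them phony
.PHONY: all data model paper

all: data model paper
data: raw.csv
model: model.rds
paper: plot.png paper.pdf

# Hypothetical scripts; each rule lists the script itself as a prerequisite
# so that editing the script triggers a rebuild of its output
raw.csv: download.R
	Rscript download.R

model.rds: fit.R raw.csv
	Rscript fit.R

plot.png: plot.R model.rds
	Rscript plot.R

paper.pdf: paper.Rmd plot.png
	Rscript -e 'rmarkdown::render("paper.Rmd")'
```

Here make model rebuilds only raw.csv and model.rds (and only if they are out of date), leaving the plot and paper untouched.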

Variables

  • Expanded Variables
    • Values are accessed using $(x) or ${x}

    • “Recursively Expanded” Variables are defined using = operator

      x = hello
      y = $(x)
      # Both $(x) and $(y) will now yield "hello"
      x = world
      # Both $(x) and $(y) will now yield "world"
      • Any functions referenced in the definition will be executed every time the variable is expanded
      • Can cause infinite loops (e.g. x = $(x) foo refers to itself; GNU Make detects this and reports an error)
    • “Simply Expanded” Variables are defined using the := or ::= operator

      x := hello
      y := $(x)
      # Both $(x) and $(y) will now yield "hello"
      x := world
      # $(x) will now yield "world", and $(y) will yield "hello"
  • Automatic Variables
    • $@ is a Make variable that “expands” into the target name (if a rule lists several targets, the one that caused the commands to run)

    • $^ is a Make variable that “expands” into all of the prerequisites

    • $< is a Make variable that “expands” into the first prerequisite

    • $? is a Make variable that “expands” into any prerequisites which have a time stamp more recent than the target

    • % is a wildcard used in pattern rules; it matches any targets in the makefile, or files in the project directory, that fit its pattern (also see the Abstraction section below)

      foo%.o: %.c
          $(CC) $(CFLAGS) -c $< -o $@
      • Will match target lib/foobar.o, with:
        • Stem ($*): lib/bar
        • Target name ($@): lib/foobar.o
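        • Prerequisites ($<, $^): lib/bar.c (the directory part of the stem is prepended and the rest of the stem replaces %)
    • $* is a Make variable that “expands” into the “stem” (i.e. the text matched by %) of the pattern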
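
The difference between the two variable flavors, and the automatic variables listed above, can be checked with a small makefile. This is a sketch: the names lazy and eager and the files a.txt, b.txt, and out.txt are hypothetical, and a.txt and b.txt must exist for the rule to run. Recipe lines must begin with a tab character.

```makefile
x = hello
# Recursively expanded: $(x) is looked up every time $(lazy) is used
lazy = $(x)
# Simply expanded: $(x) is looked up once, at this point
eager := $(x)
x = world

# Printed while the makefile is read: lazy=world eager=hello
$(info lazy=$(lazy) eager=$(eager))

out.txt: a.txt b.txt
	@echo "target:               $@"
	@echo "first prerequisite:   $<"
	@echo "all prerequisites:    $^"
	@echo "newer than target:    $?"
	cat $^ > $@
```

On the first run, $? lists both prerequisites because out.txt does not exist yet. After touching only b.txt, a second run shows just b.txt for $?.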

Commands

  • The commands are a series of steps typically used to make the target(s). These need to start with a tab character, not spaces.

  • See command used to generate a target

    # CLI
    >> make --recon <target>
  • Update a specific target

    # CLI
    >> make data/raw/NIPS_1987-2015.csv
    • This runs the rule whose target is this file. In this case, it’s the curl command from the example in the Rules section
    • If you run this command again, you’ll receive this message: make: 'data/raw/NIPS_1987-2015.csv' is up to date.

Execute a script

  • Example
    • Makefile

      data/processed/NIPS_1987-2015.csv : src/data/transpose.py data/raw/NIPS_1987-2015.csv
          $(PYTHON_INTERPRETER) $^ $@
      • “$(PYTHON_INTERPRETER)” is a Make variable, set in the makefile, that holds the python3 interpreter command
      • The script in this example takes 2 args: input file path and output file path
      • $^ fills in the prerequisites which takes care of <script> <arg1>
      • $@ fills in the target name for <arg2>
    • CLI

      >> make --recon data/processed/NIPS_1987-2015.csv
      python3 src/data/transpose.py data/raw/NIPS_1987-2015.csv data/processed/NIPS_1987-2015.csv
      • make --recon shows how Make expands the command line from the makefile, without running it
      • Basic format for executing a python script in the cli is python3 <script> <arg1> <arg2> ... <argn>
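
The rule above relies on PYTHON_INTERPRETER being defined somewhere in the makefile. A minimal sketch of one way to define it; the ?= default is an assumption, and the original project may set it differently. Recipe lines must begin with a tab character.

```makefile
# Assumed definition: defaults to python3 unless the variable is already set
# in the environment or on the command line
PYTHON_INTERPRETER ?= python3

data/processed/NIPS_1987-2015.csv: src/data/transpose.py data/raw/NIPS_1987-2015.csv
	# $^ = script path followed by the raw file, $@ = the processed file
	$(PYTHON_INTERPRETER) $^ $@
```

A different interpreter can then be chosen per invocation, for example make PYTHON_INTERPRETER=python3.11 data/processed/NIPS_1987-2015.csv.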

Abstraction

  • Example

    all: models/10_topics.png models/20_topics.png
    
    models/%_topics.png : src/models/fit_lda.py data/processed/NIPS_1987-2015.csv src/models/prodlda.py
        $(PYTHON_INTERPRETER) $< $(word 2, $^) $@ --topics $*
    • % matches both prerequisites of the dummy target “all” and takes the stems 10 and 20
      • So this rule runs twice: once with the value 10 then with the value 20.
    • $< is an automatic variable that expands into “src/models/fit_lda.py”
    • Built-in Make text function, $(word n,text) , returns the nth word of text. (see Misc >> Resources for function docs)
      • The legitimate values of n start from 1. If n is bigger than the number of words in text, the value is empty
      • In this example, it’s used to return the 2nd prerequisite to become the 1st argument of the fit_lda.py script
        • 1st arg is the input file path
    • $@ is an automatic variable that expands into the target name which becomes the 2nd argument of the fit_lda.py script
      • 2nd arg is the output file path
    • --topics is a command-line option for fit_lda.py which is defined in the script using decorators from {{click}}
      • $* is the stem of the wildcard, which is numeric in this case and provides the value for the --topics flag
  • Example: Clean text files in the data directory

    data/processed/%.txt: data/raw/%.txt
        sed 1,20d $^ > $@
    • For each text file from the raw directory that is requested as a target, removes the first 20 lines (sed 1,20d) and writes (>) the processed file to the target with the same file name ($@)
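
A pattern rule on its own only describes how to build one file; something still has to request every processed file. One way to do that, as a sketch, is to build the list of targets with the built-in wildcard and patsubst functions. The variable names RAW and PROCESSED and the clean-text target are hypothetical. Recipe lines must begin with a tab character.

```makefile
# Every raw text file that currently exists
RAW := $(wildcard data/raw/*.txt)
# The matching output path for each raw file
PROCESSED := $(patsubst data/raw/%.txt,data/processed/%.txt,$(RAW))

.PHONY: clean-text
clean-text: $(PROCESSED)

data/processed/%.txt: data/raw/%.txt
	# Create the output directory if needed, then drop the first 20 lines
	mkdir -p $(@D)
	sed 1,20d $< > $@
```

Running make clean-text then processes each raw file whose output is missing or older than its source.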

Scripts

  • Notes from

  • Python project using pip

    .PHONY: requirements test
    
    .venv:
        python3 -m venv .venv
    
    requirements:
        source .venv/bin/activate && \
            python3 -m pip install -r requirements.txt && \
            python3 -m pip install pytest
    
    test: .venv requirements dev-requirements
        source .venv/bin/activate && \
            pytest
    • .PHONY: requirements test - Declares requirements and test as phony targets, so they always run even if files with those names exist.
    • .venv: - Creates a Python virtual environment named .venv if it doesn’t already exist.
    • requirements: - Installs the Python packages listed in requirements.txt, plus pytest, into the virtual environment. The recipe activates .venv first, so nothing is installed globally.
    • test: .venv requirements dev-requirements - Lists the virtual environment and the requirements as prerequisites, then activates the virtual environment and runs the tests using pytest. Note that dev-requirements has no rule in this snippet, so it must be defined elsewhere (or dropped) for make test to work.
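
A variant of the pip makefile above that avoids activating the environment, as a sketch under a few assumptions: a POSIX-style virtual environment layout (.venv/bin/python), no separate dev-requirements target, and requirements depending on .venv so the environment is created first. Calling the interpreter inside .venv directly also sidesteps source, which is a bash builtin and may be missing from the /bin/sh that Make uses by default. Recipe lines must begin with a tab character.

```makefile
.PHONY: requirements test

# Created once; skipped on later runs because the .venv directory exists
.venv:
	python3 -m venv .venv

# Uses the interpreter inside .venv, so packages land in the environment
requirements: .venv
	.venv/bin/python -m pip install -r requirements.txt
	.venv/bin/python -m pip install pytest

test: requirements
	.venv/bin/python -m pytest
```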
  • Python project using poetry

    .PHONY: requirements test
    
    requirements:
        poetry install
    
    test: requirements
        poetry run pytest
  • Terraform project

    .PHONY: init
    SHELL=/bin/bash
    
    # these variables should be initialized and exported outside of this
    # Makefile; Make will just pick them up from what you have set in
    # your environment. You can use e.g. `aws sts`
    AWS_ACCESS_KEY_ID ?=
    AWS_SECRET_ACCESS_KEY ?=
    # no quotes here: Make would keep them as part of the value
    AWS_REGION ?= us-west-2
    
    # dev by default
    ENVIRONMENT ?= dev
    STATE_FILE_BUCKET ?= s3-bucket-$(AWS_ACCESS_KEY_ID)-$(ENVIRONMENT)-terraform-state
    STATE_FILE_KEY ?= state/some_service/$(ENVIRONMENT)/terraform.tfstate
    
    # make some variable available in Terraform
    export TF_VAR_something ?= something1
    export TF_VAR_something_else ?= something-else
    
    .terraform:
        terraform init \
            -reconfigure \
            -backend-config='key=$(STATE_FILE_KEY)' \
            -backend-config='bucket=$(STATE_FILE_BUCKET)'
    
        terraform get
    
    # this will switch Terraform version to the one that your project needs
    # https://github.com/tfutils/tfenv
    init: .terraform
        tfenv install
    
    # -var-file and -out are options of `terraform plan`, not `terraform init`;
    # the saved plan file is what the apply target consumes
    plan: init
        terraform plan \
            -var-file=environments/$(ENVIRONMENT)/variables.tfvars \
            -out terraform.plan
    
    apply: plan
        terraform apply \
            -auto-approve \
            terraform.plan
    
    destroy:
        terraform destroy \
            -auto-approve \
            -var-file=environments/$(ENVIRONMENT)/variables.tfvars
    
    dev-plan: export AWS_ACCESS_KEY_ID=dev-key
    dev-plan: plan
    
    dev-apply: export AWS_ACCESS_KEY_ID=dev-key
    dev-apply: apply
    
    dev-destroy: export AWS_ACCESS_KEY_ID=dev-key
    dev-destroy: destroy
    
    prod-plan: export AWS_ACCESS_KEY_ID=prod-key
    prod-plan: plan
    
    prod-apply: export AWS_ACCESS_KEY_ID=prod-key
    prod-apply: apply
    
    clean:
        @rm -rf .terraform/modules
        @rm -f terraform.*
    • terraform get downloads the modules used by the configuration (add -update to fetch their latest versions)

    • prod and dev are production and development branches respectively

    • This is a partial backend configuration. It allows you to use the same codebase for all the environments. All customizations have to be listed as variables in the variables.tfvars files. It can be easily extended to support 4 or 6 environments

    • Execute:

      make dev-plan
      make dev-apply
    • Expect this file structure

      $ tree example/
      .
      ├── main.tf
      ├── variables.tf
      ├── provider.tf
      ├── backend.tf
      ├── outputs.tf
      ├── ...
      ├── environments/
      │   ├── dev/
      │   │   ├── variables.tfvars
      │   ├── prod/
      │   │   ├── variables.tfvars
      │   ├── .../
    • backend.tf

      terraform {
          backend "s3" {
              region = "us-west-2"
              encrypt = true
          }
      }
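
The dev-/prod- targets in the Terraform makefile rely on target-specific variables: a value set for a target is also in effect for that target's prerequisites, and the export keyword passes it into the environment of their recipes. A stripped-down sketch of the mechanism, with a hypothetical show target standing in for plan/apply and placeholder key values. Recipe lines must begin with a tab character.

```makefile
ENVIRONMENT ?= dev

.PHONY: show dev-show prod-show

# $$ passes a literal $ to the shell, so the value is read from the
# environment of the recipe rather than expanded by Make
show:
	@echo "ENVIRONMENT=$(ENVIRONMENT) AWS_ACCESS_KEY_ID=$$AWS_ACCESS_KEY_ID"

# The exported value applies to dev-show and to its prerequisite, show
dev-show: export AWS_ACCESS_KEY_ID=dev-key
dev-show: show

prod-show: export AWS_ACCESS_KEY_ID=prod-key
prod-show: show
```

Running make dev-show should report dev-key, while make prod-show reports prod-key, without the two values ever being set globally.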