minimal make
A minimal tutorial on make
I would argue that the most important tool for reproducible research is not Sweave or knitr but GNU make.
Consider, for example, all of the files associated with a manuscript. In the simplest case, I would have an R script for each figure plus a LaTeX file for the main text. And then a BibTeX file for the references.
Compiling the final PDF is a bit of work:
- Run each R script through R to produce the relevant figure.
- Run latex and then bibtex and then latex a couple more times.
And the R scripts need to be run before latex is, and only if they’ve changed.
A simple example
GNU make makes this easy. In your directory for the manuscript, you create a text file called Makefile that looks something like the following (here using pdflatex).
mypaper.pdf: mypaper.bib mypaper.tex Figs/fig1.pdf Figs/fig2.pdf
	pdflatex mypaper
	bibtex mypaper
	pdflatex mypaper
	pdflatex mypaper

Figs/fig1.pdf: R/fig1.R
	cd R;R CMD BATCH fig1.R

Figs/fig2.pdf: R/fig2.R
	cd R;R CMD BATCH fig2.R
Each batch of lines indicates a file to be created (the target), the files it depends on (the prerequisites), and then a set of commands needed to construct the target from the dependent files. Note that the lines with the commands must start with a tab character (not spaces).
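As a rough sketch of the general shape of a rule, here it is with placeholder names. Nothing called target, prereq1, prereq2, or command exists in the example project; they only mark the position of each part.

```makefile
# General form of a rule (every name below is a placeholder).
# The first line names the target and, after the colon, its prerequisites.
# Each recipe line underneath must begin with a tab character, not spaces.
target: prereq1 prereq2
	command that builds target from prereq1 and prereq2
```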
Another great feature: make rebuilds a target only when it is missing or older than one of its prerequisites. In the example above, you'd only build fig1.pdf when fig1.R changed. And note that the dependencies propagate. If you change fig1.R, then fig1.pdf will be re-built, and so mypaper.pdf will be re-built too.
One oddity: if you need to change directories to run a command, do the cd on the same line as the related command. Each line of a recipe is run in its own shell, so a cd on one line has no effect on the next. The following would not work:
### this doesn't work ###
Figs/fig1.pdf: R/fig1.R
	cd R
	R CMD BATCH fig1.R
You can, however, use \ for a continuation line, like so:
### this works ###
Figs/fig1.pdf: R/fig1.R
	cd R;\
	R CMD BATCH fig1.R
Note that you still need to use the semicolon (;).
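A common variation, offered here as a sketch rather than as part of the original example, joins the two commands with && instead of a semicolon. With &&, the shell runs the second command only if the cd succeeded, so a missing R directory stops the recipe right away instead of running R in the wrong place.

```makefile
# Same rule as above, but R runs only if "cd R" succeeds.
Figs/fig1.pdf: R/fig1.R
	cd R && \
	R CMD BATCH fig1.R
```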
Using GNU make
You probably already have GNU make installed on your computer. Type make --version in a terminal/shell to see. (On Windows, you will likely need to download and install make separately.)
To use make:
- Go into the directory for your project.
- Create the Makefile file.
- Every time you want to build the project, type make.
- In the example above, if you want to build fig1.pdf without building mypaper.pdf, just type make Figs/fig1.pdf (the target name includes the directory, exactly as written in the Makefile).
Frills
You can go a long way with just simple make files as above, specifying the target files, their dependencies, and the commands to create them. But there are a lot of frills you can add, to save some typing.
Here are some of the options that I use. (See the make documentation for further details.)
Variables
If you’ll be repeating the same piece of code multiple times, you might want to define a variable.
For example, you might want to run R with the flag --vanilla. You could then define a variable R_OPTS:
R_OPTS=--vanilla
You refer to this variable as $(R_OPTS) (or ${R_OPTS}; either parentheses or curly braces are allowed), so in the R commands you would use something like
cd R;R CMD BATCH $(R_OPTS) fig1.R
An advantage of this is that you just need to type out the options you want once; if you change your mind about the R options you want to use, you just have to change them in the one place.
For example, I actually like to use the following:
R_OPTS=--no-save --no-restore --no-init-file --no-site-file
This is like --vanilla but without --no-environ (I leave that one out because I use the .Renviron file to define R_LIBS, which tells R that I have packages installed in an alternative directory).
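Put together with the two figure rules from the simple example, the variable might be used as follows. This is only a sketch that reuses the file names from above.

```makefile
# Options passed to every R call; edit this one line to change them everywhere.
R_OPTS=--no-save --no-restore --no-init-file --no-site-file

Figs/fig1.pdf: R/fig1.R
	cd R;R CMD BATCH $(R_OPTS) fig1.R

Figs/fig2.pdf: R/fig2.R
	cd R;R CMD BATCH $(R_OPTS) fig2.R
```

A variable given on the command line overrides the assignment in the Makefile, so typing make R_OPTS=--vanilla would run a one-off build with different options.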
Automatic variables
There are a bunch of automatic variables that you can use to save yourself a lot of typing. Here are the ones that I use most:
- $@ : the file name of the target
- $< : the name of the first prerequisite (i.e., dependency)
- $^ : the names of all prerequisites (i.e., dependencies)
- $(@D) : the directory part of the target
- $(@F) : the file part of the target
- $(<D) : the directory part of the first prerequisite (i.e., dependency)
- $(<F) : the file part of the first prerequisite (i.e., dependency)
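To see what these expand to, one option is a throwaway rule that prints them rather than building anything. In the sketch below, R/helpers.R is a hypothetical second prerequisite, added only so that the first prerequisite and the full list of prerequisites differ. The leading @ keeps make from echoing each command before running it.

```makefile
# Demo rule: it prints the automatic variables instead of building the figure.
# R/helpers.R is hypothetical; both prerequisite files are assumed to exist.
Figs/fig1.pdf: R/fig1.R R/helpers.R
	@echo "target:              $@"
	@echo "first prerequisite:  $<"
	@echo "all prerequisites:   $^"
	@echo "target directory:    $(@D)"
	@echo "target file:         $(@F)"
	@echo "first prereq dir:    $(<D)"
	@echo "first prereq file:   $(<F)"
```

Assuming the target is missing or out of date, make Figs/fig1.pdf would print, in order: Figs/fig1.pdf, R/fig1.R, R/fig1.R R/helpers.R, Figs, fig1.pdf, R, and fig1.R.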
In our simple example, we could take the lines
Figs/fig1.pdf: R/fig1.R
	cd R;R CMD BATCH fig1.R
and instead write
Figs/fig1.pdf: R/fig1.R
	cd $(<D);R CMD BATCH $(<F)
The automatic variable $(<D) will take the value of the directory of the first prerequisite, R in this case. $(<F) will take the value of the file part of the first prerequisite, fig1.R in this case.
Okay, that’s not really a simplification. There doesn’t seem to be much advantage to this, unless perhaps the directory were an obnoxiously long string and we wanted to avoid having to type it twice. The main advantage comes in the next section.
Pattern rules
If a number of files are to be built in the same way, you may want to use a pattern rule. The key idea is that you can use the symbol % as a wildcard, to be expanded to any string of text.
For example, our two figures are being built in basically the same way. We could simplify the example by including one set of lines covering both fig1.pdf and fig2.pdf:
Figs/%.pdf: R/%.R
	cd $(<D);R CMD BATCH $(<F)
This saves typing and makes the file easier to maintain and extend. If you want to add a third figure, you just add it as another dependency (i.e., prerequisite) for mypaper.pdf.
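As a sketch of that last point, here is how the Makefile might look with a third figure. Figs/fig3.pdf is hypothetical and assumes that a matching script R/fig3.R exists; the only line that changes is the list of prerequisites for mypaper.pdf.

```makefile
# Figs/fig3.pdf is a hypothetical third figure, built from an assumed R/fig3.R.
mypaper.pdf: mypaper.bib mypaper.tex Figs/fig1.pdf Figs/fig2.pdf Figs/fig3.pdf
	pdflatex mypaper
	bibtex mypaper
	pdflatex mypaper
	pdflatex mypaper

# One pattern rule builds every figure: Figs/NAME.pdf from R/NAME.R.
Figs/%.pdf: R/%.R
	cd $(<D);R CMD BATCH $(<F)
```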
Our example, with the frills
Adding all of this together, here’s what our example Makefile will look like.
R_OPTS=--vanilla

mypaper.pdf: mypaper.bib mypaper.tex Figs/fig1.pdf Figs/fig2.pdf
	pdflatex mypaper
	bibtex mypaper
	pdflatex mypaper
	pdflatex mypaper

Figs/%.pdf: R/%.R
	cd $(<D);R CMD BATCH $(R_OPTS) $(<F)
The advantage of the added frills: less typing, and it’s easier to extend to include additional figures. The disadvantage: it’s harder for others who are less familiar with GNU make to understand what it’s doing.
More complicated examples
There are complicated Makefiles all over the place. Poke around GitHub and study them.
Here are some of my own examples:
- Makefile for my AIL probabilities paper
- Makefile for a talk on QTL mapping for function-valued traits
- Makefile for my R/qtlcharts package
Mike Bostock has also published some example Makefiles.
Also look at the Makefile for Yihui Xie’s knitr package for R.
Also of interest is targets, a make-like pipeline for R.
Resources
- O’Reilly’s Managing Projects with GNU Make book (part of the Open Books project)
- targets, the successor to drake, is an R package providing an R-focused version of make.
- makepipe, another R package providing an R-focused version of make, more minimalistic than targets.
The source for this minimal tutorial is on GitHub.
Also see my tutorials on git/github, knitr, R packages, making a web site with GitHub Pages, data organization, and reproducible research.