Knitr overview
Writing reports
I’ll start with a bit of motivating blather. Skip to the next section when you get bored.
Statisticians write a lot of reports. You do a bunch of analyses, create a bunch of figures and tables, and you want to describe what you’ve done to a collaborator.
When I was first starting out, I’d create a bunch of figures and tables and email them to my collaborator with a description of the findings in the body of the email. That was cumbersome for me and for the collaborator. (“Which figure are we talking about, again?”)
I moved towards writing formal reports in LaTeX and sending my collaborator a PDF. But that was a lot of work, and if I later wanted to re-run things (e.g., if additional data were added), it was a real hassle.
Sweave was a big help. Your LaTeX document could contain chunks of R code, and when processed through Sweave, the R code would be replaced by the results of the analysis or by the figures generated. Then, if new data were added or analyses modified, it was easy to re-compile the document and get an updated PDF.
But getting LaTeX-based documents to look nice can be a lot of work, particularly if you are as compulsive as I am. The biggest problem, in my experience, is dealing with page breaks and the positioning of figures. Sweave had a number of annoyances, too. The biggest one, for me, was figuring out how to reference that critical Sweave.sty file; I ended up just putting a local copy in every project directory. Yeah, I can be an idiot.
Knitr is Sweave reborn. Yihui Xie took the Sweave idea and started over, removing the annoyances and making something even better. With knitr, you can mix basically any kind of text with basically any kind of code. This is a really big deal, as lots of people who should be writing these sorts of literate programming documents (e.g., many statistics graduate students) are completely turned off by LaTeX and just skip the whole business.
Now, I deliver my informal reports to collaborators as html documents that can be viewed in a browser. A big advantage to this is that I don’t have to worry about page breaks. For example, I can have very tall figures, with say 30 panels. That makes it easy to show the results in detail, and you don’t have to worry about how to get figures to fit nicely into a page.
But I’m not writing html for this. I use Markdown or AsciiDoc. These are two systems for writing simple, readable text, with the sort of marks that you’d use in an email message (for example, **bold** for bold or _italics_ for italics), that can be easily converted to html. And both Markdown and AsciiDoc allow the figures to be embedded within the html document, so you only need to email the one file to your collaborator.
Technically, I’m not using Markdown but rather R Markdown, a variant of Markdown developed by the folks at RStudio.
Enough blather; now how does this work?
Code chunks
The basic idea in knitr (and Sweave before that, and literate programming more generally) is that your regular text document will be interrupted by chunks of code delimited in a special way.
Here’s an example with R Markdown:
We see that this is an intercross with `r nind(sug)` individuals.
There are `r nphe(sug)` phenotypes, and genotype data at
`r totmar(sug)` markers across the `r nchr(sug)` autosomes. The genotype
data is quite complete.
Use `plot()` to get a summary plot of the data.
```{r summary-plot, fig.height=8}
plot(sug)
```
The backticks (`) indicate code. The bits like `r nind(sug)` indicate R code. When processed by knitr, they’ll be evaluated and replaced by the result. So the first paragraph would end up as something like this:
We see that this is an intercross with 163 individuals. There are 6 phenotypes, and genotype data at 93 markers across the 19 autosomes. The genotype data is quite complete.
In the second paragraph, `plot()` would just appear as plot() (that is, rendered like code, in a monospace font).
The most useful bit is the last “paragraph”. When the document is run through knitr, plot(sug) will be evaluated, producing a figure, which will then be inserted at that point in the final document.
In R Markdown, code chunks start with a line like
```{r chunk-name, options}
The chunk name (here, chunk-name) is optional; if included, it needs to be distinct from the names of all other chunks. Then there are a bunch of chunk options; in the earlier example, fig.height=8 sets the height of the figure (in inches).
A code chunk ends with a line that is just three backticks:
```
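Chunk options can also be given a default for the whole document rather than repeated in every chunk header. Here is a minimal sketch using knitr's `opts_chunk` object; putting this in a first "setup" chunk is a common habit, not something the examples on this page rely on.

```r
library(knitr)

# Set a default figure height (in inches) for every chunk that follows.
# An individual chunk can still override it in its own header,
# e.g. with fig.height=4.
opts_chunk$set(fig.height = 8)

# Look up the current default
opts_chunk$get("fig.height")
```

Options set this way apply to the chunks that come after the call, so it is usually placed near the top of the document.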
In knitr, different types of text (e.g., R Markdown, AsciiDoc, LaTeX) have different ways of delimiting the code chunks (as well as the in-line bits of code). This is because knitr is basically doing a search-and-replace for these chunks, and depending on the type of text, different patterns will be easier to find.
In AsciiDoc, the above would be written as follows:
We see that this is an intercross with +r nind(sug)+ individuals.
There are +r nphe(sug)+ phenotypes, and genotype data at
+r totmar(sug)+ markers across the +r nchr(sug)+ autosomes. The genotype
data is quite complete.
Use +plot()+ to get a summary plot of the data.
//begin.rcode summary-plot, fig.height=8
plot(sug)
//end.rcode
In LaTeX, it would be:
We see that this is an intercross with \Sexpr{nind(sug)} individuals.
There are \Sexpr{nphe(sug)} phenotypes, and genotype data at
\Sexpr{totmar(sug)} markers across the \Sexpr{nchr(sug)} autosomes. The genotype
data is quite complete.
Use {\tt plot()} to get a summary plot of the data.
<<summary-plot, fig.height=8>>=
plot(sug)
@
Yihui would probably yell at me for the {\tt } bit, but that seemed easiest for me.
A knitr document will often have many code chunks. They are evaluated in order, in a single R session, and the variables created in one code chunk remain available in later chunks.
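To see that carry-over from one chunk to the next, here is a small sketch that knits a tiny R Markdown document held in a character vector, using the `text` argument of `knit()`. The document content and the chunk names (`setup`, `use-x`) are made up for illustration; a real report would live in its own file.

```r
library(knitr)

# A tiny R Markdown document, one element per line.
# The chunk names "setup" and "use-x" are hypothetical.
doc <- c(
  "```{r setup}",
  "x <- 1:10",
  "```",
  "",
  "The mean of x is `r mean(x)`.",
  "",
  "```{r use-x}",
  "summary(x)",
  "```"
)

# knit() returns the processed Markdown as a single character string
# when the input is supplied through `text`.
out <- knit(text = doc, quiet = TRUE)
cat(out)
```

The variable `x` is created in the first chunk, yet both the in-line `r mean(x)` and the second chunk can use it, because everything runs in the same R session. In the output, the in-line bit is replaced by its value, so the sentence reads "The mean of x is 5.5."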
The examples above are taken from longer examples that you can find here, including an R Markdown example (and its html product), an AsciiDoc example (and its html product), and a LaTeX example (and its pdf product).
All of these examples require installation of my R/qtl package (sorry!). In R, type install.packages("qtl").
Compiling the document
Once you’ve created a knitr document (e.g., R Markdown with chunks of R code), how do you use knitr to process it, to create the final document?
If you’re creating an R Markdown document in RStudio, it’s dead easy: there’s a button. And it’s a particularly cute little button, with a ball of yarn and a knitting needle.
More generally, you’d call R and use the render() function in the rmarkdown package.
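As a rough sketch of that `render()` route: the code below writes a throwaway R Markdown file to a temporary directory and compiles it. The file name `example_report.Rmd` and its contents are invented for this illustration, and it assumes the rmarkdown package and pandoc are installed (RStudio bundles pandoc).

```r
library(rmarkdown)

# Write a small, made-up R Markdown file to a temporary directory.
rmd_file <- file.path(tempdir(), "example_report.Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "Two plus two is `r 2 + 2`.",
  "",
  "```{r scatter, fig.height=4}",
  "plot(1:10)",
  "```"
), rmd_file)

# render() knits the file and then converts the result to the
# output format named in the header (here, html).
html_file <- render(rmd_file, quiet = TRUE)
html_file  # path to the generated .html file
```

For your own report, the call is just `render()` with the path to your .Rmd file.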
I prefer to create a GNU make file, like this one, for the examples I’d mentioned above. (See also my minimal make tutorial.)
What next?
That’s knitr in a knutshell: chunks of R code inserted within a text document. When processed by knitr, the R code chunks are executed and results and/or figures inserted.
Now go to my pages about Markdown and Knitr with R Markdown.
Even if you’re mostly interested in AsciiDoc or LaTeX, start with the Markdown and R Markdown page, as I’ll give the full details about knitr there and will only explain the extra stuff in the other two pages. Plus, I think you’ll find Knitr with R Markdown useful, at least for short, informal reports.
If you’re an experienced Sweave user, you might look at my Knitr from Sweave page, or Yihui’s page, transition from Sweave to knitr.