What we’ve discussed so far is sufficient for a project to be fully reproducible.

But scripts record what you did, not necessarily why you did it. It’s better to include the motivation as well, such as the different figures that you looked at along the way. This is particularly important for the data cleaning process; for example, why did you omit those particular samples?

My preference is to replace most of those scripts with reproducible reports (using R Markdown and knitr, or IPython notebooks). Such a report is a mixture of code and text: the code that does the work and your text that describes what you’re doing and why you’re doing it.

With either R Markdown/knitr or IPython notebooks, chunks of code will be run and figures created, and a nicely formatted report will be produced. Rather than just running a script, you’ll compile the report. Compiling the report will do the work that the script would have done, but will also produce a report that describes the what and the why, with figures and tables that support your decisions.
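To make the idea concrete, here is a toy sketch in plain Python of a cleaning step that does the work and writes down the why at the same time. It is not how knitr or IPython notebooks work internally, and everything in it is made up for illustration: the sample IDs and values, the `max_value` threshold, and the `compile_report` function are all hypothetical.

```python
from statistics import mean

# Hypothetical data: sample ID -> measured value (None marks a failed assay).
samples = {"s01": 4.8, "s02": 5.1, "s03": None, "s04": 5.0, "s05": 47.2}


def compile_report(samples, max_value=20.0):
    """Run the cleaning step and return (kept samples, Markdown report text).

    max_value is an assumed plausibility threshold, chosen only for this example.
    """
    omitted = {}  # sample ID -> reason it was omitted
    for sample_id, value in samples.items():
        if value is None:
            omitted[sample_id] = "assay failed (no measurement)"
        elif value > max_value:
            omitted[sample_id] = (
                f"value {value} exceeds plausible maximum {max_value}"
            )

    # The "what": the cleaned data that later analysis steps would use.
    kept = {k: v for k, v in samples.items() if k not in omitted}

    # The "why": text that travels with the results.
    lines = [
        "# Data cleaning report",
        "",
        f"Started with {len(samples)} samples; kept {len(kept)}.",
        "",
        "## Omitted samples",
        "",
    ]
    for sample_id, reason in sorted(omitted.items()):
        lines.append(f"- {sample_id}: {reason}")
    lines.append("")
    lines.append(f"Mean of kept samples: {mean(kept.values()):.2f}")
    return kept, "\n".join(lines)


kept, report = compile_report(samples)
print(report)
```

In a real R Markdown document or IPython notebook you would write the explanation as ordinary text between the code chunks, and the figures would be embedded automatically; the point of the sketch is only that the cleaned data and the reasons for each omission come out of the same run.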

To learn more about knitr and R Markdown, see my knitr in a knutshell tutorial.

To learn more about IPython, see the documentation and tutorials at its website.

The construction of such reports is definitely more work than writing a simple script, but the product may save you a lot of time and effort down the road. For example, imagine that a reviewer (or coauthor) asks, “Why did you omit those samples?” You won’t be left scratching your head; you’ll have a document that explains why.


Now go to the page about turning repeated code into functions.