Course summary

A minimal standard for data analysis and other scientific computations is that they be reproducible: that the code and data are assembled so that another group can re-create all of the results (e.g., the figures in a paper). The importance of such reproducibility is now widely recognized, but it is still not as widely practiced as it should be, in large part because many computational scientists (and particularly statisticians) have not fully adopted the required tools for reproducible research.

In this course, we will discuss general principles for reproducible research but will focus primarily on the use of relevant tools (particularly make, git, and knitr), with the goal that students leave the course ready and willing to ensure that all aspects of their computational research (software, data analyses, papers, presentations, posters) are reproducible.


Course number: BMI 826-003

Instructor: Karl Broman, 2126 Genetics-Biotechnology

Prerequisite: Some knowledge of R.

Lectures: Fridays, 11:00–11:50am, 2321 Engineering Hall (except for 15 Apr, which will be in 1289 Comp Sci). No class on 1 Apr.

Office hours: Wed 2:30–3:30pm (or by appointment)

Schedule (with links to lecture notes)

Resources and further reading

Recommended books
    C Gandrud, Reproducible research with R and RStudio
    Y Xie, Dynamic documents with R and knitr

Project: There will be one small project, developed over the course of the semester:

  • Implement something in R (e.g., simulation + fancy plot).
  • Develop it in a git repository on GitHub or Bitbucket.
  • Make it an R package.
  • Use knitr to make a vignette.
  • Use testthat to include a unit test.
  • Make sure it passes R CMD check.
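
As a rough illustration of the kind of thing the project steps above could produce, here is a minimal sketch of a small simulation function together with a testthat unit test. The function name sim_coverage, its arguments, and the choice of simulation (coverage of a 95% t-interval) are all hypothetical and are not taken from the course materials. In a package, the function would typically live under R/ and the test under tests/testthat/.

```r
library(testthat)

# Hypothetical example: estimate by simulation how often a 95% t-interval
# covers the true mean. Name and arguments are illustrative assumptions.
sim_coverage <- function(n_sim = 1000, n = 20, mu = 0, sigma = 1, seed = NULL) {
    # Fixing the seed lets someone else re-create exactly the same result.
    if (!is.null(seed)) set.seed(seed)

    covered <- vapply(seq_len(n_sim), function(i) {
        x <- rnorm(n, mean = mu, sd = sigma)
        ci <- t.test(x)$conf.int          # 95% confidence interval for the mean
        ci[1] <= mu && mu <= ci[2]
    }, logical(1))

    mean(covered)                         # proportion of intervals covering mu
}

# A unit test of the sort that would go in tests/testthat/test-sim_coverage.R
test_that("sim_coverage is reproducible and returns a proportion", {
    a <- sim_coverage(n_sim = 200, seed = 1)
    b <- sim_coverage(n_sim = 200, seed = 1)
    expect_equal(a, b)                    # same seed, same answer
    expect_true(a >= 0 && a <= 1)         # result is a proportion
})
```

The test checks reproducibility directly: two runs with the same seed should agree. A vignette built with knitr could then call sim_coverage with a fixed seed so that its figures are re-created identically on each build.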

Sources on GitHub: