Differences between R/qtl and R/qtl2

R/qtl2 is a reimplementation of the QTL analysis software R/qtl, to better handle high-dimensional data and complex cross designs.

There have been a number of big changes. We summarize some of them here in order to assist long-time R/qtl users in their transition to R/qtl2.

New data file formats

The input file format has been completely changed. R/qtl2 allows just a single format, with data split across multiple CSV files, and with a control file (in YAML or JSON format).

For more information about the new file format, see the related vignette, as well as sample input data files and the qtl2data repository.
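As a minimal sketch of reading data in the new format, the following assumes that qtl2geno is installed and that it bundles a zipped sample data set (a control file plus CSV files) at extdata/iron.zip. The file location is an assumption about the installed package and is not stated above.

```r
library(qtl2geno)

# Assumption: qtl2geno ships sample data at extdata/iron.zip
iron_file <- system.file("extdata", "iron.zip", package="qtl2geno")

# read_cross2() reads the control file and the CSV files it points to
iron <- read_cross2(iron_file)

# brief overview of the cross
summary(iron)
```

The same call should work with the path to a YAML or JSON control file sitting alongside its CSV files.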

R/qtl cross objects can be converted to the new format with the qtl2geno::convert2cross2() function, and so one can continue to use the R/qtl function read.cross() to read in data and then convert them to the new format:

library(qtl)       # provides read.cross()
library(qtl2geno)  # provides convert2cross2()
mycross <- read.cross("csv", file="mydata.csv")
mycross <- convert2cross2(mycross)

New data structures

The data structures used by R/qtl2 are completely different from those used by R/qtl. The new data structures are no simpler than before, but they tend to be a bit “flatter” (that is, less deeply nested). Almost everything is a “list”, and we use fewer “attributes”, instead including such things as components at the top level.
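To get a feel for the flatter layout, one can inspect a cross object directly. This is only a sketch: it assumes the sample file extdata/iron.zip is bundled with qtl2geno, and the component names it prints may differ between package versions.

```r
library(qtl2geno)

# Assumption: qtl2geno ships sample data at extdata/iron.zip
iron <- read_cross2(system.file("extdata", "iron.zip", package="qtl2geno"))

# the cross is a plain list; its pieces sit at the top level
names(iron)

# genotypes are a list of matrices, one per chromosome
# (individuals in rows, markers in columns)
dim(iron$geno[[1]])
```

By comparison, the genotype matrix for a chromosome in an R/qtl cross sits one level deeper, at mycross$geno[["1"]]$data.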

Split into multiple packages

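R/qtl2 is not a single package as R/qtl is. Rather, it’s split into multiple packages, the main ones being qtl2geno (QTL genotype probability calculations), qtl2scan (genome scans), and qtl2plot (data visualization).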

We have in mind that, for high-dimensional data, the QTL genotype probability calculations with qtl2geno will be performed once and saved, and that the genome scans with qtl2scan will be performed in “batch” (e.g., on a cluster) and also saved, and that interactive analyses will mostly be in the data visualizations with qtl2plot.

It can be confusing to remember which function is in which package. For this reason, we created an additional, largely empty package qtl2. If you load qtl2 with library(qtl2), the three main packages, qtl2geno, qtl2scan, and qtl2plot, will all be loaded.

New function names

The names of all of the main functions have changed, mostly with periods replaced by underscores. For example, calc.genoprob() is now calc_genoprob(), and scanone() is now scan1().

Treatment of intermediate calculations

We’re no longer storing intermediate calculations as part of the cross object. For example, calc_genoprob(), to calculate QTL genotype probabilities given the observed marker data, returns a list with the probabilities. scan1(), to perform a genome scan, takes these probabilities plus a phenotype matrix.
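A minimal sketch of this workflow follows. It assumes the sample file extdata/iron.zip is bundled with qtl2geno; the error_prob value is purely illustrative.

```r
library(qtl2geno)
library(qtl2scan)

# Assumption: qtl2geno ships sample data at extdata/iron.zip
iron <- read_cross2(system.file("extdata", "iron.zip", package="qtl2geno"))

# genotype probabilities come back as a separate object;
# the cross itself is left unchanged
probs <- calc_genoprob(iron, error_prob=0.002)

# the genome scan takes the probabilities plus a phenotype matrix
out <- scan1(probs, iron$pheno)
```

Because probs is an ordinary R object, it can be written to disk with saveRDS() and reused later without repeating the calculation.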

Use of individual identifiers for aligning data

Individual identifiers are now used to ensure the alignment of individuals, for example between the QTL genotype probabilities and the phenotype data.

For example, in scan1(), to perform a genome scan, it’s necessary that the phenotype data carry the corresponding individual IDs as row names.

As a result, to restrict an analysis to, say, females when calling scan1(), you only need to subset one of the inputs; the rest will be subset automatically to match.

out_all <- scan1(probs, pheno, kinship)
out_f <- scan1(probs, pheno[sex=="female",], kinship)
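The idea of aligning by identifiers can be illustrated in base R. The sketch below is not R/qtl2's internal code: align_by_id() is a hypothetical helper, and the data are made up. It shows how two matrices with individual IDs as row names can be reduced to their shared individuals in a common order.

```r
# Hypothetical helper (not part of R/qtl2): keep only the individuals
# present in both matrices, in the order they appear in `a`
align_by_id <- function(a, b) {
    ids <- intersect(rownames(a), rownames(b))
    list(a = a[ids, , drop=FALSE],
         b = b[ids, , drop=FALSE])
}

# toy data: a genotype-like matrix for 4 individuals,
# and phenotypes for 3 of them, listed in a different order
geno  <- matrix(1:8, nrow=4,
                dimnames=list(c("id1", "id2", "id3", "id4"), c("m1", "m2")))
pheno <- matrix(c(10.2, 9.8, 11.5), ncol=1,
                dimnames=list(c("id4", "id2", "id1"), "weight"))

aligned <- align_by_id(geno, pheno)
rownames(aligned$a)   # "id1" "id2" "id4"
rownames(aligned$b)   # "id1" "id2" "id4"
```

Matching on names rather than on row position is what makes it safe to subset only one input.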

Order of arguments when subsetting cross objects

In R/qtl, when subsetting a cross object, you can use square brackets, like this:

mycross[chr, ind]

But the order of those two arguments was not very well chosen. It’s better to think of individuals as rows and chromosomes as columns, and so put the individuals first.

And so in R/qtl2, we’ve switched the order of arguments: in bracket subsetting of cross objects, individuals now come first.

mycross[ind, chr]
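As a concrete sketch of the new ordering, the following assumes the sample file extdata/iron.zip is bundled with qtl2geno and that its chromosomes are named "1", "2", and so on. The particular individuals and chromosomes are arbitrary.

```r
library(qtl2geno)

# Assumption: qtl2geno ships sample data at extdata/iron.zip
iron <- read_cross2(system.file("extdata", "iron.zip", package="qtl2geno"))

# individuals first, chromosomes second
sub <- iron[1:20, c("2", "16")]

# all individuals, one chromosome: leave the first index empty
chr2 <- iron[, "2"]

n_ind(sub)   # number of individuals kept
n_chr(sub)   # number of chromosomes kept
```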

Scans of the X chromosome require special covariates

In R/qtl, when you perform a single-QTL scan of the X chromosome, it identifies appropriate covariates to include, to avoid spurious linkage due to sex and cross-direction differences.

In R/qtl2, you need to provide such covariates yourself via the Xcovar argument to scan1(). There is a function in qtl2geno, get_x_covar(), for deriving these, but you’re a bit more on your own.
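A minimal sketch of supplying the X-chromosome covariates explicitly follows. It assumes the sample file extdata/iron.zip is bundled with qtl2geno; the error_prob value is illustrative.

```r
library(qtl2geno)
library(qtl2scan)

# Assumption: qtl2geno ships sample data at extdata/iron.zip
iron <- read_cross2(system.file("extdata", "iron.zip", package="qtl2geno"))
probs <- calc_genoprob(iron, error_prob=0.002)

# derive the X-chromosome covariates from the cross
Xcovar <- get_x_covar(iron)

# pass them explicitly to the genome scan
out <- scan1(probs, iron$pheno, Xcovar=Xcovar)
```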