R/qtl2 is a reimplementation of the QTL analysis software R/qtl, to better handle high-dimensional data and complex cross designs.
There have been a number of big changes. We summarize some of them here in order to assist long-time R/qtl users in their transition to R/qtl2.
The input file format has been completely changed. R/qtl2 allows just a single format, with data split across multiple CSV files, and with a control file (in YAML or JSON format).
For more information about the new file format, see the related vignette, as well as sample input data files and the qtl2data repository.
R/qtl cross objects can be converted to the new format with the qtl2::convert2cross2()
function, and so one can continue to use the R/qtl function read.cross()
to read in data and then convert them to the new format:
library(qtl)
mycross <- read.cross("csv", file="mydata.csv")
library(qtl2)
mycross <- convert2cross2(mycross)
The data structures used by R/qtl2 are completely different than those used by R/qtl. The new data structures are no simpler than before, but they tend to be a bit “flatter” (that is, less deeply nested). Most everything is a “list”, and we’re using fewer “attributes” and instead including such things as components at the top level.
The names of all of the main functions have changed, mostly with periods replaced by underscores. For example:
read.cross()
→ read_cross2()
calc.genoprob()
→ calc_genoprob()
sim.geno()
→ sim_geno()
argmax.geno()
→ viterbi()
scanone()
→ scan1()
nmar()
→ n_mar()
phenames()
→ pheno_names()
markernames()
→ marker_names()
We’re no longer storing intermediate calculations as part of the cross object. For example, calc_genoprob()
, to calculate QTL genotype probabilities given the observed marker data, returns a list with the probabilities. scan1()
, to perform a genome scan, takes these probabilities plus a phenotype matrix.
Individual identifiers are now used to ensure the alignment of individuals, for example between the QTL genotype probabilties and the phenotype data.
For example, in scan1()
, to perform a genome scan, it’s necessary that the phenotype data carry the corresponding individual IDs as row names.
As a result, when subsetting out, say, females, when calling scan1
, you only need to subset one of the inputs, and the rest will be automatically subset for you.
out_all <- scan1(probs, pheno, kinship)
out_f <- scan1(probs, pheno[sex=="female",], kinship)
In R/qtl, when subsetting a cross object, you can use square brackets, like this:
library(qtl)
mycross[chr, ind]
But the order of those two arguments was not very well chosen. It’s better to think of individuals as rows and chromosomes as column, and so put the individuals first.
And so in R/qtl2, we’ve switched the order of arguments: in bracket subsetting of cross objects, individuals now come first.
library(qtl2)
mycross[ind, chr]
In R/qtl, when you perform a single-QTL scan of the X chromosome, it identifies appropriate covariates to include, to avoid spurious linkage due to sex and cross-direction differences.
In R/qtl2, you need to provide such covariates yourself via the Xcovar
argument to scan1()
, There is a function get_x_covar()
, for deriving these, but you’re a bit more on your own.