Performance compared to pylmm

R/lmmlite is a light implementation of the fit of linear mixed models for QTL analysis, following pylmm. It includes both plain R and C++ implementations (the latter using RcppEigen).

Here I seek to compare the speed of my R and C++ implementations to pylmm. For timings within R, I use the microbenchmark package.

The source for this vignette is at GitHub.

I’ll focus on the recla dataset that I include with the package. This is data on a set of diversity outcross mice, from Recla et al. (2014) and Logan et al. (2013). The data are originally from the QTL Archive/Mouse Phenotype Database and I’ve also placed them online in the format used by R/qtl2. I used R/qtl2 to calculate genotype probabilities from the MUGA SNP array data. The data include 261 mice, and form a list with three components: the estimated kinship matrix, a matrix with 26 phenotypes, and a covariate matrix that contains an intercept (all 1’s) and sex (0 for female and 1 for male).

library(lmmlite)
library(microbenchmark)
library(broman)
data(recla)

lmmlite

As we’ll see, about half of the time is spent on an initial eigen decomposition of the kinship matrix. Missing values in the phenotype data are a bit of an annoyance, as we’ll need to re-compute the eigen decomposition for each batch of phenotypes that has the same set of missing values, as we need to omit individuals with missing phenotypes from the analysis.

I first determine the batches of phenotypes with common missing data patterns.

batch_cols_by_missing <-
    function(mat)
{
    # pattern of missing data (as character string with 0's and 1's
    pat <- apply(is.na(mat), 2, function(a) paste(as.numeric(a), collapse=""))

    u <- unique(pat)

    result <- split(seq(along=pat), match(pat, u))
    names(result) <- NULL

    missing_pat <- lapply(strsplit(u, ""), function(a) which(a=="1"))
    attr(result, "missing_pattern") <- missing_pat

    result
}
batches <- batch_cols_by_missing(recla$pheno)

Now let’s compare the full analysis for the R and C++ implementations in lmmlite. I’ll run the code 10 times.

time_full <- microbenchmark(R={
    for(i in seq(along=batches)) {
        omit <- attr(batches, "missing_pattern")[[i]]
        keep <- is.na(match(1:nrow(recla$pheno), omit))
        eigenrot <- eigen_rotation(recla$kinship[keep,keep],
                                   recla$pheno[keep,batches[[i]]],
                                   recla$covar[keep,], use_cpp=FALSE)
        for(j in 1:ncol(eigenrot$y))
            res <- fitLMM(eigenrot$Kva, eigenrot$y[,j], eigenrot$X, reml=TRUE, use_cpp=FALSE)
    } },
               cpp={
    for(i in seq(along=batches)) {
        omit <- attr(batches, "missing_pattern")[[i]]
        keep <- is.na(match(1:nrow(recla$pheno), omit))
        eigenrot <- eigen_rotation(recla$kinship[keep,keep],
                                   recla$pheno[keep,batches[[i]]],
                                   recla$covar[keep,], use_cpp=TRUE)
        for(j in 1:ncol(eigenrot$y))
            res <- fitLMM(eigenrot$Kva, eigenrot$y[,j], eigenrot$X, reml=TRUE, use_cpp=TRUE)
    } }, times=10)
print(time_full, digits=4)

## Unit: milliseconds
##  expr   min    lq  mean median    uq   max neval cld
##     R 421.4 422.0 450.7  449.3 476.5 490.4    10   b
##   cpp 168.3 170.1 170.7  170.4 171.3 174.1    10  a

The average computation time for the R code is 451 milliseconds, while the C++ code takes 171 milliseconds, a speed-up of 2.6x.

Here’s just the code to do the initial eigen decomposition and the “rotation” of the phenotypes and covariates.

time_eigen <- microbenchmark(R={
    for(i in seq(along=batches)) {
        omit <- attr(batches, "missing_pattern")[[i]]
        keep <- is.na(match(1:nrow(recla$pheno), omit))
        eigenrot <- eigen_rotation(recla$kinship[keep,keep],
                                   recla$pheno[keep,batches[[i]]],
                                   recla$covar[keep,], use_cpp=FALSE)
    } },
               cpp={
    for(i in seq(along=batches)) {
        omit <- attr(batches, "missing_pattern")[[i]]
        keep <- is.na(match(1:nrow(recla$pheno), omit))
        eigenrot <- eigen_rotation(recla$kinship[keep,keep],
                                   recla$pheno[keep,batches[[i]]],
                                   recla$covar[keep,], use_cpp=TRUE)
    } }, times=10)
print(time_eigen, digits=4)

## Unit: milliseconds
##  expr   min    lq  mean median    uq   max neval cld
##     R 209.2 268.3 275.3  269.8 274.8 327.9    10   b
##   cpp 159.1 160.0 167.2  161.5 162.7 221.1    10  a

The mean computation time for the R code is 275 milliseconds, while the C++ code takes 167 milliseconds, a speed-up of 1.6x.

I’ll re-calculate the eigen decomposition for each batch, and save the results

eigenrot_cpp <- eigenrot_R <- vector("list", length(batches))
for(i in seq(along=batches)) {
    omit <- attr(batches, "missing_pattern")[[i]]
    keep <- is.na(match(1:nrow(recla$pheno), omit))
    eigenrot_R[[i]] <- eigen_rotation(recla$kinship[keep,keep],
                                    recla$pheno[keep,batches[[i]]],
                                    recla$covar[keep,], use_cpp=FALSE)
    eigenrot_cpp[[i]] <- eigen_rotation(recla$kinship[keep,keep],
                                    recla$pheno[keep,batches[[i]]],
                                    recla$covar[keep,], use_cpp=TRUE)
}

And now I’ll time the fit of the LMM.

time_llm <- microbenchmark(R={
    for(i in seq(along=batches)) {
        for(j in 1:ncol(eigenrot_R[[i]]$y))
            res <- fitLMM(eigenrot_R[[i]]$Kva, eigenrot_R[[i]]$y[,j], eigenrot_R[[i]]$X, reml=TRUE, use_cpp=FALSE)

    } },
               cpp={
    for(i in seq(along=batches)) {
        for(j in 1:ncol(eigenrot_cpp[[i]]$y))
            res <- fitLMM(eigenrot_cpp[[i]]$Kva, eigenrot_cpp[[i]]$y[,j], eigenrot_cpp[[i]]$X, reml=TRUE, use_cpp=TRUE)
    } }, times=10)
print(time_llm, digits=4)

## Unit: milliseconds
##  expr     min      lq    mean median      uq     max neval cld
##     R 161.464 162.051 164.437 163.37 163.698 178.903    10   b
##   cpp   9.654   9.683   9.711   9.69   9.741   9.826    10  a

The mean computation time for the R code is 164 milliseconds, while the C++ code takes 10 milliseconds, a speed-up of 16.9x.

pylmm

Now I’ll try running pylmm in the same way. First, I need to save the data to a set of CSV files.

for(i in c("kinship", "pheno", "covar"))
    write.table(recla[[i]], paste0(i, ".csv"), sep=",", quote=FALSE,
                row.names=FALSE, col.names=FALSE)

I’m not sure the best way to time the code, so I’ll just grab the start and stop time for the interesting bit.

# this chunk is python code
import lmm
import numpy
import time

n_times = 10

# load the datasets
kinship = numpy.genfromtxt('kinship.csv', delimiter=',')
pheno =   numpy.genfromtxt('pheno.csv', delimiter=',')
covar =   numpy.genfromtxt('covar.csv', delimiter=',')

# loop over the columns in pheno
#    fit the LMM by REML or ML and print the results to STDIN
start_time = time.time()
for j in range(n_times):
    for i in range(pheno.shape[1]):
        result = lmm.LMM(pheno[:,[i]], kinship, X0=covar).fit(REML=True)
stop_time = time.time()
ave_time = (stop_time - start_time)/n_times
open('pylmm_time.txt', 'w').write(str(ave_time) + "\n")

This isn’t really a fair comparison, but I’m seeing a time of 1.47 sec for pylmm, 0.45 sec for the R version of lmmlite, and 0.17 sec for the C++ version of lmmlite, so speed-ups of 3.3x and 8.6x for the R and C++ code in lmmlite, vs pylmm.

As a more fair comparison, I’ll re-run the lmmlite analysis, considering each phenotype individually.

time_full2 <- microbenchmark(R={
    for(i in 1:ncol(recla$pheno)) {
        keep <- !is.na(recla$pheno[,i])
        eigenrot <- eigen_rotation(recla$kinship[keep,keep],
                                   recla$pheno[keep,i],
                                   recla$covar[keep,], use_cpp=FALSE)
        res <- fitLMM(eigenrot$Kva, eigenrot$y, eigenrot$X, reml=TRUE, use_cpp=FALSE)
    }},
               cpp={
    for(i in 1:ncol(recla$pheno)) {
        keep <- !is.na(recla$pheno[,i])
        eigenrot <- eigen_rotation(recla$kinship[keep,keep],
                                   recla$pheno[keep,i],
                                   recla$covar[keep,], use_cpp=TRUE)
        res <- fitLMM(eigenrot$Kva, eigenrot$y, eigenrot$X, reml=TRUE, use_cpp=TRUE)
    }}, times=10)
print(time_full2, digits=4)

## Unit: seconds
##  expr   min    lq  mean median    uq   max neval cld
##     R 1.649 1.714 1.736  1.737 1.747 1.837    10   b
##   cpp 1.058 1.061 1.081  1.065 1.073 1.217    10  a

Now, in comparison to 1.47 sec for pylmm, I see 1.74 sec for the R version of lmmlite, and 1.08 sec for the C++ version of lmmlite, so speed-ups of 0.8x and 1.4x for the R and C++ code in lmmlite, vs pylmm. (Actually, the R version of lmmlite is slower than pylmm, so that’s not really a “speed-up”. The C++ version is faster, though.)

Session info

devtools::session_info()

## Session info --------------------------------------------------------------

##  setting  value
##  version  R version 3.2.2 (2015-08-14)
##  system   x86_64, darwin13.4.0
##  ui       X11
##  language en_US.UTF-8
##  collate  en_US.UTF-8
##  tz       America/Chicago
##  date     2015-11-24

## Packages ------------------------------------------------------------------

##  package        * version    date       source
##  assertthat       0.1        2013-12-06 CRAN (R 3.2.0)
##  broman         * 0.61-3     2015-11-22 local
##  codetools        0.2-14     2015-07-15 CRAN (R 3.2.0)
##  colorspace       1.2-6      2015-03-11 CRAN (R 3.2.0)
##  devtools         1.9.1      2015-09-11 CRAN (R 3.2.0)
##  digest           0.6.8      2014-12-31 CRAN (R 3.2.0)
##  evaluate         0.8        2015-09-18 CRAN (R 3.2.0)
##  formatR          1.2.1      2015-09-18 CRAN (R 3.2.0)
##  ggplot2          1.0.1.9003 2015-11-11 Github (hadley/ggplot2@23a748f)
##  gtable           0.1.2      2012-12-05 CRAN (R 3.2.0)
##  htmltools        0.2.6      2014-09-08 CRAN (R 3.2.0)
##  knitr            1.11.16    2015-11-16 Github (yihui/knitr@6e8ce0c)
##  lattice          0.20-33    2015-07-14 CRAN (R 3.2.0)
##  lmmlite        * 0.1-9      2015-11-24 local
##  magrittr         1.5        2014-11-22 CRAN (R 3.2.0)
##  memoise          0.2.1      2014-04-22 CRAN (R 3.2.0)
##  microbenchmark * 1.4-2      2014-09-28 CRAN (R 3.2.0)
##  multcomp         1.4-1      2015-07-23 CRAN (R 3.2.0)
##  munsell          0.4.2      2013-07-11 CRAN (R 3.2.0)
##  mvtnorm          1.0-3      2015-07-22 CRAN (R 3.2.0)
##  plyr             1.8.3      2015-06-12 CRAN (R 3.2.0)
##  Rcpp             0.12.2     2015-11-15 CRAN (R 3.2.2)
##  rmarkdown        0.8.1      2015-10-10 CRAN (R 3.2.2)
##  sandwich         2.3-4      2015-09-24 CRAN (R 3.2.2)
##  scales           0.3.0      2015-08-25 CRAN (R 3.2.0)
##  stringi          1.0-1      2015-10-22 CRAN (R 3.2.0)
##  stringr          1.0.0      2015-04-30 CRAN (R 3.2.0)
##  survival         2.38-3     2015-07-02 CRAN (R 3.2.0)
##  TH.data          1.0-6      2015-01-05 CRAN (R 3.2.0)
##  yaml             2.1.13     2014-06-12 CRAN (R 3.2.0)
##  zoo              1.7-12     2015-03-16 CRAN (R 3.2.0)