R/qtl2scan includes code with several options for linear regression. We’re mostly focusing on using the Eigen library via RcppEigen, particularly linear regression via QR decomposition with column pivoting. We want speed, but we often have to deal with rank-deficient matrices.

A previous version of qtl2scan (0.3-8) included code to do linear regression via LAPACK (more below), but we’ve since removed it, since Eigen seems sufficient, and inclusion of the LAPACK code requires availability of a Fortran compiler. The results reported here were obtained using that older version of qtl2scan.

In addition to routines that return coefficient estimates and estimated standard errors (SEs), we also have routines that return just the residual sum of squares (RSS), as for the calculation of LOD scores, that’s all that matters. We might perform a lot of regressions, getting just the RSS as a measure of quality-of-fit, and then go back to the interesting cases to look at the full set of coefficients and SEs.

Here, we present a few benchmarks comparing a couple of different routines from the Eigen library (QR decomposition with column pivoting and Cholesky decomposition; the latter doesn’t handle the rank-deficient case). We compare these to the LAPACK routines used in R/qtl: dgels and dgelsy. We also consider R’s lm.fit, which is the workhorse behind lm.

These linear regression functions are not exported, so we need to use qtl2:::.

Simple case

First let’s look at a simple case, of a single covariate.

set.seed(27343534)
n <- 1000
x <- rnorm(n, 50, 10)
X <- cbind(1, x)
y <- 30 + 0.5*x + rnorm(n, 0, 2.5)
Y <- as.matrix(y)

We use the microbenchmark package and run each of the routines 1000 times.

library(qtl2scan)
library(microbenchmark)
microbenchmark(lmfit =        lm.fit(X, y),
               dgels=         qtl2scan:::calc_rss_lapack(X, Y),
               dgelsy=        qtl2scan:::calc_rss_lapack(X, Y, skip_dgels=TRUE),
               eigen_qr_rss=  qtl2scan:::calc_rss_eigenqr(X, y),
               eigen_qr=      qtl2scan:::fit_linreg_eigenqr(X, y),
               eigen_chol_rss=qtl2scan:::calc_rss_eigenchol(X, y),
               eigen_chol=    qtl2scan:::fit_linreg_eigenchol(X, y),
               times=1000)
## Unit: microseconds
##            expr     min       lq      mean   median       uq      max neval   cld
##           lmfit 146.487 155.1875 173.18889 156.8080 159.4725 1466.197  1000     e
##           dgels  56.780  60.3695  66.06252  62.0075  64.1240 1253.733  1000   c
##          dgelsy  82.091  85.2330  94.77884  87.1060  89.1975 6307.761  1000    d
##    eigen_qr_rss  47.847  50.8585  52.69912  52.1250  53.8070   90.419  1000 ab
##        eigen_qr  55.177  58.7410  62.91448  59.9665  62.0185 1102.191  1000  bc
##  eigen_chol_rss  36.497  39.9840  41.72484  41.2695  43.1555  114.879  1000 a
##      eigen_chol  43.440  47.0105  55.10859  48.4150  50.6455 1546.386  1000  bc

It’s easy to beat lm.fit, but it’s great to see considerable gains with Eigen over LAPACK. The use of the Cholesky decomposition is somewhat faster than QR decomposition, but it can’t be used in the rank-deficient case.

Rank-deficient case

Let’s now consider a rank-deficient case. We use an example from Bates and Eddelbuettel (2013) (the RcppEigen paper).

dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
                 f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
mm <- model.matrix(~ f1*f2, dd)
y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
Y <- as.matrix(y)

And here are the benchmark results.

microbenchmark(lmfit=       lm.fit(mm, y),
               dgels=       qtl2scan:::calc_rss_lapack(mm, Y),
               dgelsy=      qtl2scan:::calc_rss_lapack(mm, Y, skip_dgels=TRUE),
               eigen_qr_rss=qtl2scan:::calc_rss_eigenqr(mm, y),
               eigen_qr=    qtl2scan:::fit_linreg_eigenqr(mm, y),
               times=1000)
## Unit: microseconds
##          expr     min       lq      mean   median       uq      max neval  cld
##         lmfit 107.582 118.7895 128.68169 121.3000 127.7230 2700.053  1000    d
##         dgels  52.998  57.9310  61.92482  61.2320  64.1460  290.752  1000   c
##        dgelsy  43.105  47.7965  51.69704  50.9500  53.6810  188.029  1000  b
##  eigen_qr_rss  33.332  38.9520  43.53250  43.6290  46.3765   75.915  1000 a
##      eigen_qr  39.691  45.2230  52.68842  50.2595  53.5445 2451.936  1000  b

We skip the methods based on Cholesky decomposition, but we again see performance gains with Eigen over LAPACK.

dgels is slower than dgelsy here, because with that routine, I’m calling dgels to check the matrix rank and then switch to dgelsy if the matrix is rank-deficient.

For further discussion of dgels vs. dgelsy, see http://www.netlib.org/lapack/lug/node71.html, which says:

DGELS is the fastest. DGELSY [uses] QR with pivoting, and so [handles] rank-deficient problems more reliably than DGELS but can be more expensive.

Bigger matrix

Let’s consider a bigger matrix, with 2,500 rows and 300 columns. I don’t want it to be too big, or compiling this vignette will take too long. Try this yourself with bigger matrices, if you want.

n <- 2500
p <- 300
X <- matrix(rnorm(n*p), nrow=n)
y <- X %*% runif(p) + rnorm(n)
Y <- as.matrix(y)

We use the full set again, as we can assume this to be a full-rank case. But we’ll just do 10 replicates.

microbenchmark(lmfit =        lm.fit(X, y),
               dgels=         qtl2scan:::calc_rss_lapack(X, Y),
               dgelsy=        qtl2scan:::calc_rss_lapack(X, Y, skip_dgels=TRUE),
               eigen_qr_rss=  qtl2scan:::calc_rss_eigenqr(X, y),
               eigen_qr=      qtl2scan:::fit_linreg_eigenqr(X, y),
               eigen_chol_rss=qtl2scan:::calc_rss_eigenchol(X, y),
               eigen_chol=    qtl2scan:::fit_linreg_eigenchol(X, y),
               times=10)
## Unit: milliseconds
##            expr       min        lq      mean    median        uq       max neval  cld
##           lmfit 237.34363 237.68151 263.54147 240.37376 242.03065 362.72523    10    d
##           dgels 251.07461 252.04762 253.73086 253.65026 255.26914 256.61895    10    d
##          dgelsy 265.04467 265.44633 266.45315 266.43768 266.74483 269.30555    10    d
##    eigen_qr_rss 138.26806 139.14996 141.15809 140.45880 144.04247 144.64668    10  b
##        eigen_qr 168.74685 168.88070 170.04622 169.49833 170.17323 174.20644    10   c
##  eigen_chol_rss  30.22184  30.39548  30.96240  30.77930  31.17019  32.69367    10 a
##      eigen_chol  34.87328  34.89145  35.81083  35.45013  36.35371  38.11364    10 a

Multiple phenotypes

We’ll go back to the case with a single covariate, but we’ll consider 500 phenotypes. We have a few routines to handle this, focusing solely on calculating the residual sum of squares for each phenotype.

n <- 1000
x <- rnorm(n, 50, 10)
X <- cbind(1, x)
y <- 30 + 0.5*x + rnorm(n, 0, 2.5)
Y <- qtl2scan:::permute_nvector(500, y)

Now we’ll get the benchmarks.

microbenchmark(lmfit =    colSums(lm.fit(X, Y)$resid^2),
               dgels=     qtl2scan:::calc_rss_lapack(X, Y),
               dgelsy=    qtl2scan:::calc_rss_lapack(X, Y, skip_dgels=TRUE),
               eigen_qr=  qtl2scan:::calc_mvrss_eigenqr(X, Y),
               eigen_chol=qtl2scan:::calc_mvrss_eigenchol(X, Y),
               times=50)
## Unit: milliseconds
##        expr       min        lq      mean     median         uq        max neval cld
##       lmfit 13.032342 14.662763 91.435604 133.382386 135.258572 136.616153    50   b
##       dgels  6.272388  6.495671 11.895399   6.613942   8.060865 127.826941    50  a
##      dgelsy  9.461087  9.596972 10.055839   9.699649  10.084062  12.205911    50  a
##    eigen_qr  5.466869  5.583197  5.868297   5.667449   5.933052   9.914935    50  a
##  eigen_chol  4.942614  5.082557  5.339065   5.159700   5.353120   7.946770    50  a

Multiple phenotypes, rank-deficient matrix

Let’s do the same thing with a rank-deficient matrix.

dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
                 f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
mm <- model.matrix(~ f1*f2, dd)
y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
Y <- qtl2scan:::permute_nvector(500, y)

And here are the benchmarks.

microbenchmark(lmfit =    colSums(lm.fit(mm, Y)$resid^2),
               dgels=     qtl2scan:::calc_rss_lapack(mm, Y),
               dgelsy=    qtl2scan:::calc_rss_lapack(mm, Y, skip_dgels=TRUE),
               eigen_qr=  qtl2scan:::calc_mvrss_eigenqr(mm, Y),
               times=100)
## Unit: microseconds
##      expr      min       lq      mean   median       uq      max neval cld
##     lmfit 1049.911 1122.129 1412.9006 1249.513 1394.615 6488.325   100   c
##     dgels 1112.870 1146.343 1254.5212 1208.819 1297.528 2315.482   100  b
##    dgelsy  694.283  716.444  798.2769  755.031  800.184 1812.793   100 a
##  eigen_qr 1000.551 1056.899 1167.3293 1135.697 1218.653 2044.882   100  b

Session information

The following shows the R and package versions I was using.

devtools::session_info()
## Session info ---------------------------------------------------------------------------------------
##  setting  value
##  version  R version 3.3.3 (2017-03-06)
##  system   x86_64, darwin13.4.0
##  ui       X11
##  language en_US.UTF-8
##  collate  en_US.UTF-8
##  tz       America/Chicago
##  date     2017-04-19
## Packages -------------------------------------------------------------------------------------------
##  package        * version date       source
##  backports        1.0.5   2017-01-18 CRAN (R 3.3.2)
##  codetools        0.2-15  2016-10-05 CRAN (R 3.3.0)
##  colorspace       1.3-2   2016-12-14 CRAN (R 3.3.2)
##  devtools         1.12.0  2016-06-24 CRAN (R 3.3.0)
##  digest           0.6.12  2017-01-27 CRAN (R 3.3.2)
##  evaluate         0.10    2016-10-11 CRAN (R 3.3.0)
##  ggplot2          2.2.1   2016-12-30 CRAN (R 3.3.2)
##  gtable           0.2.0   2016-02-26 CRAN (R 3.3.0)
##  htmltools        0.3.5   2016-03-21 CRAN (R 3.3.0)
##  knitr            1.15.1  2016-11-22 CRAN (R 3.3.2)
##  lattice          0.20-35 2017-03-25 CRAN (R 3.3.2)
##  lazyeval         0.2.0   2016-06-12 CRAN (R 3.3.0)
##  magrittr         1.5     2014-11-22 CRAN (R 3.3.0)
##  MASS             7.3-45  2016-04-21 CRAN (R 3.3.0)
##  Matrix           1.2-8   2017-01-20 CRAN (R 3.3.2)
##  memoise          1.0.0   2016-01-29 CRAN (R 3.3.0)
##  microbenchmark * 1.4-2.1 2015-11-25 CRAN (R 3.3.0)
##  multcomp         1.4-6   2016-07-14 CRAN (R 3.3.0)
##  munsell          0.4.3   2016-02-13 CRAN (R 3.3.0)
##  mvtnorm          1.0-6   2017-03-02 CRAN (R 3.3.2)
##  plyr             1.8.4   2016-06-08 CRAN (R 3.3.0)
##  qtl2scan       * 0.3-8   2017-04-19 Github (kbroman/qtl2scan@f1b9bd3)
##  Rcpp             0.12.10 2017-03-19 CRAN (R 3.3.3)
##  rmarkdown        1.4     2017-03-24 CRAN (R 3.3.2)
##  rprojroot        1.2     2017-01-16 CRAN (R 3.3.2)
##  sandwich         2.3-4   2015-09-24 CRAN (R 3.3.0)
##  scales           0.4.1   2016-11-09 CRAN (R 3.3.2)
##  stringi          1.1.5   2017-04-07 CRAN (R 3.3.2)
##  stringr          1.2.0   2017-02-18 CRAN (R 3.3.2)
##  survival         2.41-3  2017-04-04 CRAN (R 3.3.2)
##  TH.data          1.0-8   2017-01-23 CRAN (R 3.3.2)
##  tibble           1.3.0   2017-04-01 CRAN (R 3.3.2)
##  withr            1.0.2   2016-06-20 CRAN (R 3.3.0)
##  yaml             2.1.14  2016-11-12 CRAN (R 3.3.2)
##  zoo              1.8-0   2017-04-12 CRAN (R 3.3.3)