R/lmmlite is an R package for fitting linear mixed models in the context of genome-wide association studies (GWAS) and quantitative trait locus (QTL) mapping. At present, **it is not intended for “production” use.** I wrote this to try to get a better understanding of how to fit these models, and I’ve not yet put any effort into checking arguments and omitting cases with missing values.

The code closely follows Nick Furlotte’s pylmm.

The sort of model I want to fit is \(y = X\beta + \epsilon\) where \(\epsilon\) is multivariate normal with mean 0 and variance \(\sigma^2_g K + \sigma^2_e I\), where \(K\) is a known kinship matrix and \(I\) is the identity. In pylmm, the focus is on \(\sigma^2 = \sigma^2_g + \sigma^2_e\) and the heritability \(h^2 = \sigma^2_g / \sigma^2\).

We’ll focus here on the `recla`

dataset that I include with the package. This is data on a set of diversity outcross mice, from Recla et al. (2014) and Logan et al. (2013). The data are originally from the QTL Archive/Mouse Phenotype Database and I’ve also placed them online in the format used by R/qtl2. I used qtl2geno to calculate genotype probabilities from the MUGA SNP array data. The data include 261 mice, and form a list with three components: the estimated kinship matrix, a matrix with 26 phenotypes, and a covariate matrix that contains an intercept (all 1’s) and sex (0 for female and 1 for male).

```
library(lmmlite)
data(recla)
```

The package includes two implementations of the fit of linear mixed models (LMMs): plain R, and C++ (using the Rcpp and RcppEigen packages, the latter being an interface to the Eigen C++ linear algebra library).

R/lmmlite includes just four functions: `eigen_rotation()`

, `getMLsoln()`

, `calcLL()`

, and `fitLMM()`

. In each case, use `use_cpp=FALSE`

to use the plain R implementation and `use_cpp=TRUE`

(the default) to use the C++ implementation.

The first function, `eigen_rotation()`

, calculates the eigen decomposition of the kinship matrix and “rotates” the phenotypes and covariates by pre-multiplying them with the tranpose of the matrix of eigenvectors. This rotation greatly simplifies later calculations by turning a general least squares problem into a weighted least squares problem.

Care must be taken with missing values in the phenotypes and covariates. In practice, one must work with batches of phenotypes that have a common missing data pattern, and first remove individuals with missing values. Here, we’ll focus on the first phenotype.

`e <- eigen_rotation(recla$kinship, recla$pheno[,1], recla$covar)`

The output is a list containing four components: `Kva`

is the vector of eigenvalues of the kinship matrix, `Kva_t`

is the transposed matrix of eigenvectors, `y`

is the rotated phenotype matrix, and `X`

is the rotated covariate matrix.

The next function `getMLsoln()`

will estimates the coefficients (\(\beta\)) and residual variance (\(\sigma^2 = \sigma^2_g + \sigma^2_e\)) for a given value of the heritability (\(h^2 = \sigma^2_g / \sigma^2\)). It takes, as input, a single value for the heritability, and then `Kva`

, `y`

, and `X`

, as returned from `eigen_rotation()`

. A further argument `reml`

, indicates whether REML will later be used, in which case the log determinant of the matrix \(X' W X\), where \(W\) is a diagonal matrix of weights, is calculated.

`ml_soln <- getMLsoln(0.5, e$Kva, e$y, e$X)`

The result is a list containing `beta`

and `sigmasq`

, with the residual sum of squares (RSS) and log det(\(X' W X\)) included as attributes.

The function `calcLL()`

calculates the log likelihood for a given value of the heritability. (It can also take a vector of heritabilities.) It takes the same arguments as `getMLsoln()`

, and in practice one would probably not call `getMLsoln()`

directly.

```
hsq <- seq(0, 1, by=0.01)
ll <- calcLL(hsq, e$Kva, e$y, e$X)
```

We can plot the results as follows.

```
plot(hsq, ll, type="l", lwd=2, las=1,
xlab="heritability", ylab="log likelihood")
```

If you call `calcLL()`

with a single heritability value, the results include the estimates of `beta`

and `sigmasq`

as attributes.

```
one_ll <- calcLL(0.5, e$Kva, e$y, e$X)
attr(one_ll, "beta")
```

`## [1] 1413.98691 -28.17931`

`attr(one_ll, "sigmasq")`

`## [1] 276158.5`

The final function, `fitLMM()`

, optimizes the log likelihood to estimate the heritability. The result is a list containing `beta`

, `sigmasq`

, `hsq`

, `sigmasq_g`

, `sigmasq_e`

, and `loglik`

.

`(out <- fitLMM(e$Kva, e$y, e$X))`

```
## $beta
## [1] 1417.98330 -23.24218
##
## $sigmasq
## [1] 295340.5
##
## $hsq
## [1] 0.7642841
##
## $sigmasq_g
## [1] 225724.1
##
## $sigmasq_e
## [1] 69616.46
##
## $loglik
## [1] -2332.84
```

The `fitLMM()`

function includes an argument `check_boundary`

; if `TRUE`

(the default), the 0 and 1 boundaries are checked explicitly, in seeking to estimate the heritability.