Resources for mouse genome build 39
The mouse genome build GRCm39 was released on 2020-06-24. Here we describe the resources available for using build 39 with R/qtl2 and explain how to move your projects from build 38 to build 39.
We’ve focused mostly on projects involving Collaborative Cross (CC) or Diversity Outbred (DO) mice and on the MUGA SNP arrays, such as the MegaMUGA and GigaMUGA arrays.
- The mouse genetic map of Cox et al. (2009) has been revised for the new genome build. The main changes concern inversions at the centromeres of chromosomes 10 and 14. See the GitHub repository CoxMapV3.
- The MUGA array annotations have been revised to use positions from the GRCm39 genome build. See the GitHub repository MUGAarrays.
- The founder genotype data files, organized for use with data in R/qtl2 format, have been revised for build GRCm39.
- We’ve created an R package, mmconvert (available on CRAN), to convert map positions between build GRCm39 and the revised Cox genetic map. It is intended to fill the role of the “mouse map converter” web service from Gary Churchill’s group, which is no longer available.
The mmconvert package also includes a function, `cross2_to_grcm39()`, for converting a `cross2` object (created with `read_cross2()`, for a cross genotyped with the MegaMUGA and/or GigaMUGA arrays) to build GRCm39. Here is an example of its use with DO data from Karen Svenson and colleagues:
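```r
library(qtl2)

# read the DO data (R/qtl2 zip file) directly from GitHub
file <- paste0("https://raw.githubusercontent.com/rqtl/",
               "qtl2data/main/DO_Svenson291/svenson.zip")
do <- read_cross2(file, quiet=FALSE)

# convert the cross to build GRCm39
library(mmconvert)
do_grcm39 <- cross2_to_grcm39(do)
```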
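For converting individual positions rather than a whole cross, the package's main function is `mmconvert()`. The sketch below is a minimal example, assuming (check `?mmconvert`) that the input can be a data frame whose first two columns are chromosome and position, with an optional third column of marker names, and that `input_type` gives the units of the input positions. The two markers and their positions are hypothetical, chosen only for illustration.

```r
library(mmconvert)

# hypothetical markers with GRCm39 positions in Mbp
# (assumed input format: chr, position, optional marker name)
positions <- data.frame(chr=c("1", "8"),
                        pos=c(30.0, 56.1),
                        marker=c("markerA", "markerB"))

# assumed: input_type="Mbp" says the input positions are physical positions in Mbp;
# the result should give each position on both the physical and genetic maps
converted <- mmconvert(positions, input_type="Mbp")
print(converted)
```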
- A new SQLite database with CC/DO founder variants from Sanger, along with Ensembl genes, is available on figshare (created by Matt Vincent at the Jackson Lab).
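Download this ~10.2 GB database as `fv.2021.snps.db3`:

```r
options(timeout=7200)  # R's default timeout of 60 seconds is too short for a 10 GB file
download.file("https://figshare.com/ndownloader/files/40157572", "fv.2021.snps.db3", mode="wb")
```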
You can then use `create_variant_query_func()` as before, though you need to specify the `id_field` argument:

```r
library(qtl2)
qvf <- create_variant_query_func("fv.2021.snps.db3", id_field="variants_id")
```
The genes table uses different names for several fields, so call `create_gene_query_func()` as follows:

```r
qgf <- create_gene_query_func("fv.2021.snps.db3", chr_field="chromosome", name_field="symbol",
                              start_field="start_position", stop_field="end_position")
```