GWASapi User Guide

The GWASapi package provides access to the NHGRI-EBI catalog of GWAS summary statistics. For details on the API, see the API's own documentation, as well as Pjotr Prins’s documentation on GitHub.

Installation

You can install GWASapi from GitHub.

You first need to install the devtools package.

install.packages("devtools")

Then use devtools::install_github() to install GWASapi.

library(devtools)
install_github("rqtl/GWASapi")

Load the package with library().

library(GWASapi)

Lists of things

The purpose of the GWASapi package is to provide access to summary statistics for human GWAS. A good first step is to list the studies and traits that are available.

To get a list of studies, use list_studies(). By default it returns just 20 studies. Use the size argument to change that limit, and the start argument to step through the full set.

list_studies(size=5)
## [1] "GCST000028" "GCST000392" "GCST000510" "GCST000553" "GCST000568"

To retrieve all studies, set size larger than the total number of studies:

all_studies <- list_studies(size=2000)
length(all_studies)
## [1] 1051
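Setting size above the total works when you know roughly how many records there are. If you don't, you can step through the set with start instead. The sketch below is a hypothetical helper, fetch_all(), that is not part of GWASapi. It assumes that start is a zero-based offset and that a page shorter than size means the end has been reached. It should work with any function that takes size and start, such as list_studies() or list_traits().

```r
# fetch_all() is a hypothetical helper, not part of GWASapi.
# It calls `fetch` repeatedly, advancing `start` by `page_size` each time.
# ASSUMPTION: `start` is a zero-based offset into the full result set.
fetch_all <- function(fetch, ..., page_size = 500, max_pages = 100) {
  pages <- list()
  start <- 0
  for (i in seq_len(max_pages)) {
    page <- fetch(..., size = page_size, start = start)
    n <- NROW(page)              # length() for vectors, nrow() for data frames
    if (n == 0) break            # nothing more to fetch
    pages[[length(pages) + 1]] <- page
    if (n < page_size) break     # a short page is taken to be the last one
    start <- start + page_size
  }
  if (length(pages) == 0) return(NULL)
  # Combine the pages: row-bind data frames, concatenate vectors
  if (is.data.frame(pages[[1]])) do.call(rbind, pages) else unlist(pages)
}

# Usage (requires network access):
# all_studies <- fetch_all(list_studies)
```

The max_pages guard stops the loop if the server never returns a short page.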

To get a list of traits, use list_traits(). As with studies, the default is to return just 20 values, so set size large enough to get all traits.

all_traits <- list_traits(size=2000)
length(all_traits)
## [1] 505

The traits are returned as identifiers like EFO_0000249. To get a description of a trait, you can use the ontology lookup service; for example, the page for EFO_0001360 is https://www.ebi.ac.uk/efo/EFO_0001360
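To look up several traits at once, you can build the URLs in R. This is a hypothetical helper, efo_url(), that is not part of GWASapi. It assumes every trait identifier resolves under the same base URL as the example above, which may not hold for identifiers from other ontologies.

```r
# efo_url() is a hypothetical helper, not part of GWASapi.
# ASSUMPTION: every trait page lives under the base URL from the example.
efo_url <- function(trait_id) {
  paste0("https://www.ebi.ac.uk/efo/", trait_id)
}

efo_url(c("EFO_0000249", "EFO_0001360"))
# To open one in a browser: utils::browseURL(efo_url("EFO_0001360"))
```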

Chromosomes are identified by the integers 1 to 24; list_chr() returns the full set.

list_chr()
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
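If your own data label chromosomes as strings such as "chr19" or "X", you'll need to convert them to integers before passing them to the chr argument. The sketch below is a hypothetical helper, chr_to_int(), that is not part of GWASapi. It assumes, as is common in GWAS resources but not stated in this guide, that 23 stands for X and 24 for Y; check that against the API documentation before relying on it.

```r
# chr_to_int() is a hypothetical helper, not part of GWASapi.
# ASSUMPTION: 23 = X and 24 = Y (not confirmed by this guide).
chr_to_int <- function(chr) {
  # Drop an optional "chr" prefix, in any case, and upper-case the rest
  x <- toupper(sub("^chr", "", as.character(chr), ignore.case = TRUE))
  x[x == "X"] <- "23"
  x[x == "Y"] <- "24"
  out <- suppressWarnings(as.integer(x))
  # Reject anything that is not a valid chromosome code 1 to 24
  bad <- is.na(out) | out < 1 | out > 24
  if (any(bad)) stop("unrecognised chromosome label(s): ",
                     paste(chr[bad], collapse = ", "))
  out
}

chr_to_int(c("chr19", "19", "X", "chrY"))
## [1] 19 19 23 24
```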

Get associations

To get associations for a specific variant by its rs number, use get_variant(). If you know which chromosome the variant is on, you’ll get faster results by providing it. Again, the default is to return just 20 values, so use the size and start arguments if you want a comprehensive list.

result <- get_variant("rs2228603", 19, size=5)
result[,c("p_value", "study_accession", "trait")]
##      p_value study_accession       trait
## 0 0.12280000      GCST000028 EFO_0001360
## 1 0.02980604      GCST000392 EFO_0001359
## 2 0.26000000      GCST000510 EFO_0004309
## 3 0.63000000      GCST000553 EFO_0004839
## 4 0.39340000      GCST000568 EFO_0004465

Use the p_lower and p_upper arguments to focus on associations with p-values in a specified range. For example, to get all of the associations with p-value < 10^-10, you would do:

result <- get_variant("rs2228603", 19, p_upper=1e-10)
result[,c("p_value", "study_accession", "trait")]
##          p_value study_accession       trait
## 0   1.739000e-57      GCST002216 EFO_0004530
## 1   1.049000e-62      GCST002221 EFO_0004574
## 2   4.433000e-44      GCST002222 EFO_0004611
## 3   3.652000e-11      GCST007516 EFO_0001360
## 4   3.652000e-11      GCST007518 EFO_0001360
## 5  2.225074e-308      GCST010144 EFO_0004631
## 6   4.782919e-13      GCST010772 EFO_0004918
## 7   2.926260e-17    GCST90000614 EFO_0004631
## 8   6.308030e-18    GCST90000615 EFO_0004631
## 9   3.616430e-17    GCST90000616 EFO_0004631
## 10  1.431780e-15    GCST90000617 EFO_0004631
## 11  2.721982e-14    GCST90000618 EFO_0004631
## 12 1.800000e-205    GCST90002412 EFO_0004611
## 13  3.750000e-64    GCST90013663 EFO_0004735
## 14  1.535000e-33    GCST90013664 EFO_0004736
## 15  8.800000e-65    GCST90016673 EFO_0010821
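p_lower and p_upper filter on the server. Once a result is in R, you can narrow and sort it further with ordinary data-frame operations. The sketch below rebuilds a few rows of the output above by hand, so it runs without network access. It keeps the associations with p < 10^-50, orders them from strongest to weakest, and adds a -log10(p) column. The 1e-50 cut-off is arbitrary and only for illustration.

```r
# A few rows copied from the get_variant(..., p_upper=1e-10) output above,
# so this runs without network access
res <- data.frame(
  p_value = c(1.739e-57, 1.049e-62, 3.652e-11, 1.8e-205, 8.8e-65),
  study_accession = c("GCST002216", "GCST002221", "GCST007516",
                      "GCST90002412", "GCST90016673"),
  trait = c("EFO_0004530", "EFO_0004574", "EFO_0001360",
            "EFO_0004611", "EFO_0010821"),
  stringsAsFactors = FALSE
)

# Narrow further in R: keep p < 1e-50
strong <- res[res$p_value < 1e-50, ]
# Order from strongest (smallest p) to weakest association
strong <- strong[order(strong$p_value), ]
# -log10(p) is the usual scale for comparing very small p-values
strong$neg_log10_p <- -log10(strong$p_value)
strong
```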

To get associations for a specific region, use get_asso(). For example, to get the region from 19.2 Mbp to 19.3 Mbp on chr 19:

result <- get_asso(chr=19, bp_lower=19200000, bp_upper=19300000)
result[,c("chromosome", "base_pair_location", "p_value", "study_accession", "trait")]
##    chromosome base_pair_location    p_value study_accession       trait
## 0          19           19219115 0.12280000      GCST000028 EFO_0001360
## 1          19           19230637 0.12400000      GCST000028 EFO_0001360
## 2          19           19233770 0.12400000      GCST000028 EFO_0001360
## 3          19           19238138 0.27990000      GCST000028 EFO_0001360
## 4          19           19250926 0.48100000      GCST000028 EFO_0001360
## 5          19           19262880 0.76410000      GCST000028 EFO_0001360
## 6          19           19263252 0.48500000      GCST000028 EFO_0001360
## 7          19           19296909 0.13780000      GCST000028 EFO_0001360
## 8          19           19208675 0.34851112      GCST000392 EFO_0001359
## 9          19           19219115 0.02980604      GCST000392 EFO_0001359
## 10         19           19224274 0.72647403      GCST000392 EFO_0001359
## 11         19           19225799 0.09019226      GCST000392 EFO_0001359
## 12         19           19230637 0.59469796      GCST000392 EFO_0001359
## 13         19           19233770 0.93955453      GCST000392 EFO_0001359
## 14         19           19247863 0.76167551      GCST000392 EFO_0001359
## 15         19           19250926 0.24164792      GCST000392 EFO_0001359
## 16         19           19274602 0.71375546      GCST000392 EFO_0001359
## 17         19           19277637 0.43687139      GCST000392 EFO_0001359
## 18         19           19280236 0.76699603      GCST000392 EFO_0001359
## 19         19           19281592 0.70582591      GCST000392 EFO_0001359
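A region query returns hits from every study that covers the region. To pick out the strongest association per study, you can split the result by study_accession and keep the row with the smallest p-value. The sketch below rebuilds a subset of the rows above by hand so it runs without network access.

```r
# A subset of rows copied from the get_asso() output above
res <- data.frame(
  base_pair_location = c(19219115, 19230637, 19296909,
                         19208675, 19219115, 19225799),
  p_value = c(0.1228, 0.124, 0.1378,
              0.34851112, 0.02980604, 0.09019226),
  study_accession = rep(c("GCST000028", "GCST000392"), each = 3),
  stringsAsFactors = FALSE
)

# For each study, keep the row with the smallest p-value
top_hits <- do.call(rbind, lapply(split(res, res$study_accession),
                                  function(d) d[which.min(d$p_value), ]))
top_hits
```

In this subset, the top hit for both studies is at position 19219115.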

You can restrict those results to a particular study with the study argument.

result <- get_asso(chr=19, bp_lower=19200000, bp_upper=19300000, study="GCST000392")
result[,c("chromosome", "base_pair_location", "p_value", "study_accession", "trait")]
##    chromosome base_pair_location    p_value study_accession       trait
## 0          19           19208675 0.34851112      GCST000392 EFO_0001359
## 1          19           19219115 0.02980604      GCST000392 EFO_0001359
## 2          19           19224274 0.72647403      GCST000392 EFO_0001359
## 3          19           19225799 0.09019226      GCST000392 EFO_0001359
## 4          19           19230637 0.59469796      GCST000392 EFO_0001359
## 5          19           19233770 0.93955453      GCST000392 EFO_0001359
## 6          19           19247863 0.76167551      GCST000392 EFO_0001359
## 7          19           19250926 0.24164792      GCST000392 EFO_0001359
## 8          19           19274602 0.71375546      GCST000392 EFO_0001359
## 9          19           19277637 0.43687139      GCST000392 EFO_0001359
## 10         19           19280236 0.76699603      GCST000392 EFO_0001359
## 11         19           19281592 0.70582591      GCST000392 EFO_0001359
## 12         19           19297355 0.78331140      GCST000392 EFO_0001359
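The unrestricted query above stopped at the default limit of 20 rows, so it showed only 12 of the 13 GCST000392 associations in this region. Comparing the positions from the two outputs, copied by hand below, makes the gap visible. For region queries you may therefore want to pass a larger size, assuming get_asso() accepts size in the same way that get_variant() and get_trait_asso() do.

```r
# GCST000392 positions in the unrestricted get_asso() output (rows 8 to 19)
unrestricted <- c(19208675, 19219115, 19224274, 19225799, 19230637, 19233770,
                  19247863, 19250926, 19274602, 19277637, 19280236, 19281592)
# Positions in the study-restricted output
restricted <- c(19208675, 19219115, 19224274, 19225799, 19230637, 19233770,
                19247863, 19250926, 19274602, 19277637, 19280236, 19281592,
                19297355)

# Positions present only in the study-restricted result
setdiff(restricted, unrestricted)
## [1] 19297355
```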

To get associations for a given trait, use get_trait_asso(). Unlike get_asso(), it can’t be restricted to a chromosome region.

result <- get_trait_asso("EFO_0001360", p_upper=1e-100, size=1000)
nrow(result)
## [1] 71
result[1:5, c("chromosome", "base_pair_location", "p_value", "study_accession", "trait")]
##   chromosome base_pair_location   p_value study_accession       trait
## 0         14           63485893  0.00e+00      GCST006801 EFO_0001360
## 1          8          141795579  0.00e+00      GCST006801 EFO_0001360
## 2         10          112988738 4.38e-109      GCST006867 EFO_0001360
## 3         10          112988858 4.96e-127      GCST006867 EFO_0001360
## 4         10          112989975 2.98e-221      GCST006867 EFO_0001360
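Because get_trait_asso() can't be limited to a region, one option is to filter the returned data frame in R. Some p-values also come back as exactly 0, and -log10(0) is Inf. The sketch below replaces them with the smallest positive normalised double, .Machine$double.xmin, before transforming. This is one common convention, not something GWASapi does. The get_variant() output earlier includes a p-value of 2.225074e-308, which matches .Machine$double.xmin to the printed precision, so some values may already have been capped this way upstream. The rows below are copied from the output above.

```r
# Rows copied from the get_trait_asso() output above
res <- data.frame(
  chromosome = c(14, 8, 10, 10, 10),
  base_pair_location = c(63485893, 141795579, 112988738, 112988858, 112989975),
  p_value = c(0, 0, 4.38e-109, 4.96e-127, 2.98e-221),
  study_accession = c("GCST006801", "GCST006801", rep("GCST006867", 3)),
  stringsAsFactors = FALSE
)

# -log10(0) is Inf, so floor p at the smallest normalised double first
res$neg_log10_p <- -log10(pmax(res$p_value, .Machine$double.xmin))

# Local region filter: chromosome 10, from 112.9 to 113.0 Mbp
in_region <- res$chromosome == 10 &
  res$base_pair_location >= 112900000 &
  res$base_pair_location <= 113000000
region <- res[in_region, ]
region
```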