The GWASapi package provides access to the NHGRI-EBI catalog of GWAS summary statistics. For details on the API, see its documentation, as well as Pjotr Prins’s documentation on GitHub.
You can install GWASapi from GitHub.
You first need to install the devtools package.
Then use devtools::install_github() to install GWASapi.
Load the package with library().
The purpose of the GWASapi package is to provide access to summary statistics for human GWAS. First, you can get lists of studies and traits that are available.
To get lists of studies, use list_studies(). The default is to return just 20 studies. You can control that limit with the argument size. You can also use start to step through the full set.
## [1] "GCST000028" "GCST000392" "GCST000568" "GCST000569" "GCST000571"
To retrieve all studies, set a higher limit with the size argument.
## [1] 382
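The size and start arguments described above can also be combined to walk through the full set one page at a time, rather than guessing a single large limit. The following is a minimal sketch, not part of GWASapi: fetch_all() is a hypothetical helper, and it assumes that start is a zero-based offset into the full list and that a page shorter than size marks the end. A made-up fake_pager() stands in for list_studies() so that the example runs without network access.

```r
# Hypothetical helper (not part of GWASapi): collect every item from a
# paged listing function that accepts `size` and `start` arguments.
# Assumption: `start` is a zero-based offset into the full list.
fetch_all <- function(fetch_page, page_size = 20, max_pages = 1000) {
    out <- character(0)
    for (page in seq_len(max_pages)) {
        chunk <- fetch_page(size = page_size, start = length(out))
        out <- c(out, chunk)
        # a short (or empty) page means we have reached the end
        if (length(chunk) < page_size) break
    }
    out
}

# Stand-in for list_studies(): serves 45 made-up accession numbers in pages
fake_ids <- sprintf("GCST%06d", 1:45)
fake_pager <- function(size, start) {
    idx <- seq_along(fake_ids)
    fake_ids[idx > start & idx <= start + size]
}

all_fake <- fetch_all(fake_pager, page_size = 20)
length(all_fake)
```

With the package loaded, the same helper could presumably be called as fetch_all(list_studies, page_size = 100), provided the assumptions about start hold for the live API.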
To get a list of traits, use list_traits(). Again, the default is to return just 20 values. To get all traits, use the size argument.
## [1] 382
The traits are returned as identifiers like EFO_0001360. To get a description of a trait, you can use the ontology lookup service, for example https://www.ebi.ac.uk/efo/EFO_0001360
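Note that the traits returned are not all distinct. In the tabulation below, the top row is the number of times an identifier appears and the bottom row is the number of identifiers that appear that many times.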
##
## 1 2 3 4 5 6 8 9 10
## 177 30 18 5 4 4 1 1 1
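A tabulation of this shape can be produced by applying table() twice: once to count how often each identifier occurs, and again to count how many identifiers share each frequency. The sketch below uses a made-up vector of identifiers rather than the output of list_traits(), so the values are purely illustrative.

```r
# Made-up trait identifiers (illustrative only, not real EFO terms)
traits <- c("EFO_1", "EFO_2", "EFO_2", "EFO_3", "EFO_3", "EFO_3")

# How many times each identifier appears
counts <- table(traits)

# How many identifiers appear once, twice, three times, ...
counts_of_counts <- table(counts)
counts_of_counts

# Number of distinct identifiers
n_distinct <- length(unique(traits))
n_distinct
```

Read the same way, the counts shown above add up to 177 + 30 + 18 + 5 + 4 + 4 + 1 + 1 + 1 = 241 distinct identifiers among the 382 values returned.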
Chromosomes are stored as integers 1-24.
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
To get associations for a specific variant by its rs-number, use get_variant(). If you know the chromosome it is on, you’ll get faster results by providing the chromosome. And again, the default is to return just 20 values, so use the size and start arguments if you want a comprehensive list.
## p_value study_accession trait
## 0 0.02980604 GCST000392 EFO_0001359
## 1 0.39340000 GCST000568 EFO_0004465
## 2 0.39340000 GCST000568 EFO_0004465
## 3 0.60860000 GCST000569 EFO_0004307
## 4 0.29170000 GCST000571 EFO_0004466
Use the arguments p_lower and p_upper to focus on associations with p-value in a specified range. For example, to get all of the associations with p-value < 10^-10, you would do:
result <- get_variant("rs2228603", 19, p_upper=1e-10)
result[,c("p_value", "study_accession", "trait")]
## p_value study_accession trait
## 0 1.739e-57 GCST002216 EFO_0004530
## 1 1.739e-57 GCST002216 EFO_0004530
## 2 1.049e-62 GCST002221 EFO_0004574
## 3 1.049e-62 GCST002221 EFO_0004574
## 4 4.433e-44 GCST002222 EFO_0004611
## 5 4.433e-44 GCST002222 EFO_0004611
To get associations for a specific region, use get_asso(). For example, to get the region from 19.2 Mbp to 19.3 Mbp on chromosome 19:
result <- get_asso(chr=19, bp_lower=19200000, bp_upper=19300000)
result[,c("chromosome", "base_pair_location", "p_value", "study_accession", "trait")]
## chromosome base_pair_location p_value study_accession trait
## 0 19 19208675 0.34851112 GCST000392 EFO_0001359
## 1 19 19219115 0.02980604 GCST000392 EFO_0001359
## 2 19 19224274 0.72647403 GCST000392 EFO_0001359
## 3 19 19225799 0.09019226 GCST000392 EFO_0001359
## 4 19 19230637 0.59469796 GCST000392 EFO_0001359
## 5 19 19233770 0.93955453 GCST000392 EFO_0001359
## 6 19 19247863 0.76167551 GCST000392 EFO_0001359
## 7 19 19250926 0.24164792 GCST000392 EFO_0001359
## 8 19 19274602 0.71375546 GCST000392 EFO_0001359
## 9 19 19277637 0.43687139 GCST000392 EFO_0001359
## 10 19 19280236 0.76699603 GCST000392 EFO_0001359
## 11 19 19281592 0.70582591 GCST000392 EFO_0001359
## 12 19 19297355 0.78331140 GCST000392 EFO_0001359
## 13 19 19207349 0.83550000 GCST000568 EFO_0004465
## 14 19 19207349 0.83550000 GCST000568 EFO_0004465
## 15 19 19208675 0.91250000 GCST000568 EFO_0004465
## 16 19 19208675 0.91250000 GCST000568 EFO_0004465
## 17 19 19213634 0.73850000 GCST000568 EFO_0004465
## 18 19 19213634 0.73850000 GCST000568 EFO_0004465
## 19 19 19219115 0.39340000 GCST000568 EFO_0004465
You can restrict those results to a particular study.
result <- get_asso(chr=19, bp_lower=19200000, bp_upper=19300000, study="GCST000392")
result[,c("chromosome", "base_pair_location", "p_value", "study_accession", "trait")]
## chromosome base_pair_location p_value study_accession trait
## 0 19 19208675 0.34851112 GCST000392 EFO_0001359
## 1 19 19219115 0.02980604 GCST000392 EFO_0001359
## 2 19 19224274 0.72647403 GCST000392 EFO_0001359
## 3 19 19225799 0.09019226 GCST000392 EFO_0001359
## 4 19 19230637 0.59469796 GCST000392 EFO_0001359
## 5 19 19233770 0.93955453 GCST000392 EFO_0001359
## 6 19 19247863 0.76167551 GCST000392 EFO_0001359
## 7 19 19250926 0.24164792 GCST000392 EFO_0001359
## 8 19 19274602 0.71375546 GCST000392 EFO_0001359
## 9 19 19277637 0.43687139 GCST000392 EFO_0001359
## 10 19 19280236 0.76699603 GCST000392 EFO_0001359
## 11 19 19281592 0.70582591 GCST000392 EFO_0001359
## 12 19 19297355 0.78331140 GCST000392 EFO_0001359
To get associations for a given trait, use get_trait_asso(). You can’t restrict this to a given chromosome region.
## [1] 106
## chromosome base_pair_location p_value study_accession trait
## 0 10 112988738 4.38e-109 GCST006867 EFO_0001360
## 1 10 112988858 4.96e-127 GCST006867 EFO_0001360
## 2 10 112989975 2.98e-221 GCST006867 EFO_0001360
## 3 10 112990477 7.82e-202 GCST006867 EFO_0001360
## 4 10 112990621 2.17e-202 GCST006867 EFO_0001360