GWAS Central: a comprehensive resource for the comparison and interrogation of genome-wide association studies (2024)

The genome-wide association study (GWAS) era has improved our understanding of disease aetiology by identifying genetic variants associated with complex human traits and disease phenotypes. However, to fully evaluate the data emerging from these studies, researchers need convenient ways to access and visualize the totality of investigations so far completed, while not compromising any individual’s privacy or informed consent. To this end, we herein describe GWAS Central, a comprehensive genetic association database, designed to enable multiple study integration via graphical displays and extensive textual content.

Other GWAS repositories, such as dbGaP1 (http://www.ncbi.nlm.nih.gov/gap/) and EGA (http://www.ebi.ac.uk/ega/), act as archival systems that provide controlled access to individual-level GWAS data and open access to some categories of summary-level data. This approach is warranted, because it is possible to identify the participation of a research subject from the full range of summary-level data.2 Smaller amounts of summary-level GWAS data are available from resources such as the NHGRI GWAS Catalog3 and the Open Access Database of Genome-wide Association Results (OADGAR),4 whose content is restricted to marker signals that pass predefined P-value thresholds. The semi-arbitrary imposition of such cut-offs is unfortunate, because it prevents the direct comparison across the totality of signals (within and between related studies) that is needed to identify consistently positive markers.
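To make the cost of thresholding concrete, the toy Python sketch below compares two hypothetical studies. The marker names and P-values are invented for illustration, and `THRESHOLD` is an assumed example value rather than the cut-off used by any particular catalogue. A threshold-filtered view retains only one marker, whereas the full summary data reveal a second marker that is nominally associated in both studies. A real cross-study comparison would use formal meta-analysis rather than this simple rule.

```python
from typing import Dict, List, Set

# Hypothetical per-study summary statistics: marker -> P-value (invented numbers).
study_a: Dict[str, float] = {"marker_A": 3e-9, "marker_B": 2e-4, "marker_C": 0.61}
study_b: Dict[str, float] = {"marker_A": 1e-8, "marker_B": 5e-4, "marker_C": 0.44}

THRESHOLD = 1e-5  # assumed example inclusion cut-off for a filtered catalogue


def above_cutoff(study: Dict[str, float], threshold: float) -> Set[str]:
    """Markers that a threshold-filtered catalogue would retain."""
    return {m for m, p in study.items() if p < threshold}


def consistently_nominal(studies: List[Dict[str, float]], alpha: float = 0.05) -> Set[str]:
    """Markers with P < alpha in every study; this needs the full summary data."""
    shared = set.intersection(*(set(s) for s in studies))
    return {m for m in shared if all(s[m] < alpha for s in studies)}


print(sorted(above_cutoff(study_a, THRESHOLD)))          # ['marker_A']
print(sorted(consistently_nominal([study_a, study_b])))  # ['marker_A', 'marker_B']
```

`marker_B` never reaches the catalogue cut-off in either study, so it is invisible in a filtered view even though it points the same way in both.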

Convenient, dedicated resources that provide unfettered access to all GWAS summary-level data are therefore needed, powered by user-friendly tools for instant interrogation and unified visualization of the data. In particular, such displays need to incorporate information about the tested markers, such as chromosomal location, alleles and the 5′ and 3′ flanking sequences of SNPs. Ideally, the investigated phenotypes should be represented by standardized terminologies, so that meaningful cross-study searches can be conducted.
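One way to hold the per-marker information listed above is a simple immutable record. The sketch below is a minimal illustration with hypothetical field names; it is not GWAS Central's actual schema, and the coordinate convention (1-based, on an unspecified reference assembly) is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MarkerRecord:
    """Hypothetical record for a tested SNP; field names are illustrative only."""
    rs_id: str                # dbSNP identifier
    chromosome: str           # e.g. "1".."22", "X", "Y", "MT"
    position: int             # assumed 1-based coordinate on some reference assembly
    alleles: Tuple[str, ...]  # e.g. ("A", "G")
    flank_5prime: str         # sequence immediately 5′ of the variant
    flank_3prime: str         # sequence immediately 3′ of the variant

    def context(self, allele: str) -> str:
        """Return the flanking sequence with the chosen allele bracketed in the middle."""
        if allele not in self.alleles:
            raise ValueError(f"{allele!r} is not an allele of {self.rs_id}")
        return f"{self.flank_5prime}[{allele}]{self.flank_3prime}"
```

Making the record frozen means a curated marker cannot be altered accidentally after validation, and it can be used as a dictionary key or set member.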

On the basis of the above considerations, we designed and created the GWAS Central resource (http://www.gwascentral.org). Here we describe the ways in which this database enables experimental biologists to explore and compare data in the GWAS domain, from either a genotype or phenotype starting point.

Implementation

GWAS Central collates association data and study metadata from many disparate sources whose data are available in different formats and at differing levels of detail. These diverse GWAS data are integrated in a flexible and coherent data model that was described previously for an earlier incarnation of the database, named HGVbaseG2P.5 GWAS Central builds upon core genomic variation visualization and comparison concepts from HGVbaseG2P to provide new features, such as downloadable detailed data reports, semantically standardized phenotype ontology searching, optimized data visualizations, private upload and comparison of user data, and tools for remote data interrogation. The various resources collated by GWAS Central include data sets from other sites, such as the NHGRI GWAS Catalog and OADGAR, and complete association data sets from the 10 trait-based investigations of the 1958 Birth Cohort.6 In addition, a substantial amount of data has been obtained by directly requesting data from researchers and consortia, and from numerous unsolicited submissions by researchers who wish their newly published data to be included in GWAS Central. All data submitters are fully acknowledged, with the contributing resources and the original authors of each study cited on the website.

The gathered and submitted data are extensively curated to maximize quality and completeness. This includes checking that all genetic markers have valid dbSNP rs numbers, assessing whether their alleles and strand representation are correct, eliminating duplicate markers, combining multiple data sets for discrete studies and populating extensive metadata. In addition, we manually evaluate each study for its range of phenotype content and apply appropriately chosen ontology terms to ensure that the phenotype descriptions are standardized across all studies. In this task, we identify for each phenotype the equivalent or most appropriate term from the National Library of Medicine’s MeSH controlled vocabulary. MeSH is used because biologists are familiar with it through PubMed/MEDLINE indexing, and because it provides good generalized descriptions of phenotypes. The Human Phenotype Ontology (HPO) is also used to annotate phenotypes in cases where HPO offers a more specific description.7
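Some of the marker checks listed above can be sketched in a few lines of Python. The helpers below are hypothetical and are not GWAS Central's curation code: `is_valid_rs` checks only the syntax of an rs identifier (a real pipeline would also confirm that the ID exists in dbSNP), `harmonise_alleles` tries the reported alleles as given and then on the opposite strand, and `deduplicate` keeps the first row seen for each rs number. Strand-ambiguous A/T and C/G SNPs cannot be resolved by complementing alone, so this sketch simply accepts them as reported.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

RS_PATTERN = re.compile(r"^rs[1-9][0-9]*$")  # assumed syntactic rule only
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def is_valid_rs(marker_id: str) -> bool:
    """Syntactic check; does not verify that the ID exists in dbSNP."""
    return bool(RS_PATTERN.match(marker_id))


def harmonise_alleles(reported: Iterable[str],
                      reference: Iterable[str]) -> Optional[Tuple[str, ...]]:
    """Return reported alleles on the reference strand, or None if they cannot match."""
    rep = tuple(a.upper() for a in reported)
    ref = frozenset(a.upper() for a in reference)
    if frozenset(rep) == ref:
        return rep  # already on the reference strand (or strand-ambiguous)
    flipped = tuple(a.translate(COMPLEMENT) for a in rep)
    if frozenset(flipped) == ref:
        return flipped  # reported on the opposite strand
    return None  # allele mismatch: flag for manual curation


def deduplicate(rows: Iterable[Dict]) -> List[Dict]:
    """Keep the first row for each rs number, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row["rs_id"] not in seen:
            seen.add(row["rs_id"])
            unique.append(row)
    return unique
```

Returning `None` rather than raising lets a batch job collect all problematic markers for manual review in one pass.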

To allow flexible access and data discovery, GWAS Central supports three types of query: genotype-, phenotype- and keyword-orientated. Genotype searches can be based on HGNC gene symbols, genomic region coordinates or dbSNP rs numbers. Phenotype searches are linked to MeSH and HPO annotations, as well as to the original free-text descriptions used in publications. Keyword searches interrogate text contained in study titles and abstracts, PubMed IDs and author names.
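A single search box that accepts all three query types needs some way to route free text to the right kind of search. The heuristic router below is a hypothetical sketch, not GWAS Central's implementation. The regular expressions for rs numbers, `chr:start-end` regions and HGNC-like symbols are simplifying assumptions, and `PHENOTYPE_TERMS` is a toy stand-in for a real MeSH/HPO lookup.

```python
import re
from typing import Tuple

# Simplified, assumed patterns; real identifier rules are more involved.
RS_RE = re.compile(r"^rs\d+$", re.IGNORECASE)
REGION_RE = re.compile(r"^(?:chr)?(\d{1,2}|X|Y|MT?):(\d+)-(\d+)$", re.IGNORECASE)
GENE_RE = re.compile(r"^[A-Z][A-Z0-9-]{1,14}$")  # rough HGNC-like shape, case-sensitive

# Hypothetical toy vocabulary standing in for MeSH/HPO term matching.
PHENOTYPE_TERMS = {"asthma", "crohn disease", "type 2 diabetes"}


def classify_query(text: str) -> Tuple[str, str]:
    """Return (query type, sub-type) for a raw search string."""
    q = text.strip()
    if RS_RE.match(q):
        return ("genotype", "rs_number")
    region = REGION_RE.match(q)
    if region and int(region.group(2)) <= int(region.group(3)):
        return ("genotype", "region")
    # Check phenotype terms before gene symbols so "ASTHMA" is not taken as a gene.
    if q.lower() in PHENOTYPE_TERMS:
        return ("phenotype", "ontology_term")
    if GENE_RE.match(q):
        return ("genotype", "gene_symbol")
    # Titles, abstracts, PubMed IDs and author names go to keyword search.
    return ("keyword", "free_text")
```

Anything that matches none of the patterns, such as an author name or a numeric PubMed ID, falls through to a keyword search.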
