By M. Elizabeth Ross, MD, PhD
Nathan Cummings Professor and Head, Laboratory of Neurogenetics and Development; Director, Center
for Neurogenetics; Chair, Neuroscience Graduate Program; Weill Cornell Medical College
SYNOPSIS: The combination of genome-wide association studies with the analysis of messenger ribonucleic acid and unique proteins in the brain, cerebrospinal fluid, and plasma can shed new light on our understanding of the genetic risks for the development of various neurological diseases.
SOURCE: Yang C, Farias FHG, Ibanez L, et al. Genomic atlas of the proteome from brain, CSF and plasma prioritizes proteins implicated in neurological disorders. Nat Neurosci 2021; Jul 8. doi: 10.1038/s41593-021-00886-6. [Online ahead of print].
A number of genome-wide association studies (GWAS) have used large population cohorts to identify genome regions (loci) associated with complex traits in common diseases, including diabetes, cardiovascular disease, Alzheimer’s disease, and other neurodegenerative disorders. Combining GWAS results with ribonucleic acid (RNA) sequence data, typically from blood mononuclear cells, has made it possible to map expression quantitative trait loci, or eQTLs, which help identify the genetic variants that drive phenotypic manifestations of disease. The report by Yang and colleagues at Washington University takes the next critical step by combining protein expression with genetic loci to identify protein QTLs, or pQTLs, with the aim of finding biologically meaningful associations, new biomarkers, and promising drug targets for treating neurological disease.
Because deoxyribonucleic acid (DNA) sequence alterations may change protein levels without affecting levels of messenger RNA, pQTLs have the potential to reveal important disease associations that would not be detected using eQTLs. This report is of special interest because it examines protein levels not only in plasma but also in parietal brain tissue and cerebrospinal fluid (CSF).
The study drew on a cohort of 1,537 participants of European ancestry, who provided 971 CSF samples (249 from patients with Alzheimer’s disease, 717 from cognitively normal participants); 636 plasma samples (230 from patients with Alzheimer’s disease, 401 from cognitively normal participants); and 458 parietal brain samples (297 patients, 27 normal controls, and 134 with unknown status [e.g., frontotemporal dementia and other neurological disorders]). Although other studies have examined the proteome using mass spectrometry, this project captured proteins using an aptamer-based platform consisting of modified, fluorescently tagged, single-strand DNA molecules that individually bind to specific proteins. Relative protein concentrations were measured by fluorescence intensity, which allowed for high-throughput evaluation of 1,305 proteins.
The differential expression of proteins between patients and controls was compared with genome sequence data, specifically single nucleotide polymorphisms (SNPs). Each SNP was defined as in “cis” (within 1 Mb upstream or downstream of the gene encoding the differentially expressed protein) or in “trans” (more than 2 Mb away from the gene encoding that protein).
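The cis/trans definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the `Locus` type and the `classify_pqtl` function are hypothetical names, the gene is reduced to a single reference coordinate (an assumption), and the two thresholds are taken as stated in this summary, which leaves SNPs between 1 Mb and 2 Mb from the gene unclassified.

```python
from dataclasses import dataclass

# Thresholds as stated in this summary (not verified against the paper).
CIS_WINDOW_BP = 1_000_000   # "within 1 Mb upstream or downstream"
TRANS_MIN_BP = 2_000_000    # "more than 2 Mb away"


@dataclass(frozen=True)
class Locus:
    """Hypothetical container: a chromosome name and a base-pair position."""
    chrom: str
    pos: int


def classify_pqtl(snp: Locus, gene: Locus) -> str:
    """Label a SNP as 'cis', 'trans', or 'unclassified' relative to a gene.

    Assumption: the gene is represented by one reference coordinate.
    A SNP on a different chromosome is treated as trans.
    """
    if snp.chrom != gene.chrom:
        return "trans"
    distance = abs(snp.pos - gene.pos)
    if distance <= CIS_WINDOW_BP:
        return "cis"
    if distance > TRANS_MIN_BP:
        return "trans"
    # Between 1 Mb and 2 Mb: the summary's definitions do not cover this case.
    return "unclassified"


# Made-up coordinates, for illustration only.
example_gene = Locus("chr19", 51_000_000)
example_labels = [
    classify_pqtl(Locus("chr19", 51_400_000), example_gene),  # same chromosome, 0.4 Mb away
    classify_pqtl(Locus("chr19", 55_000_000), example_gene),  # same chromosome, 4 Mb away
    classify_pqtl(Locus("chr2", 51_000_000), example_gene),   # different chromosome
]
```

Treating a different-chromosome SNP as trans matches the observation, reported in this summary, that most trans pQTLs lie on a different chromosome than the gene encoding the protein.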
Investigators found 274 significant pQTLs in CSF (223 of them novel), 127 in plasma (17 novel), and 32 in brain samples (27 of them novel). Several take-home points emerged. The majority of pQTLs (76% to 94%) were in cis, with SNPs close to the gene encoding a differentially expressed protein. This proximity likely accounts for the observation that 42% to 53% of cis-pQTLs related to a protein-coding SNP, compared with only 2% to 5% of RNA-based eQTLs. Of the pQTLs in trans, more than 90% were on a different chromosome than the gene encoding the protein in question, suggesting an indirect, downstream effect of that genetic locus on the target protein.
When compared with other types of QTLs, 48% of brain pQTLs and 76.6% of CSF pQTLs in this study showed no overlap with QTLs for RNA expression, RNA splicing, DNA methylation, or histone acetylation. The authors interpreted this as indicating that protein-level data may account for some of the missing heritability of neurological disease, especially when combined with other molecular traits. They also used the genetic variants underlying these pQTLs to test statistically which proteins are associated with disease risk (an approach called Mendelian randomization, or MR).
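To make the MR idea concrete, the sketch below computes a Wald ratio, a common single-variant MR estimator that divides a variant's effect on disease risk by its effect on the protein level. This summary does not state which estimator the authors used, so the choice is an assumption made for illustration; `wald_ratio` is a hypothetical helper, the standard error uses a first-order approximation that ignores uncertainty in the pQTL effect, and the input values are invented.

```python
def wald_ratio(beta_pqtl: float, beta_gwas: float, se_gwas: float) -> tuple[float, float]:
    """Wald ratio estimate of a protein's effect on disease risk.

    beta_pqtl: effect of the variant on the protein level (exposure).
    beta_gwas: effect of the same variant on disease risk (outcome).
    se_gwas:   standard error of beta_gwas.

    Returns (estimate, standard_error). The standard error is a first-order
    approximation that treats beta_pqtl as known without error.
    """
    if beta_pqtl == 0:
        raise ValueError("The variant must have a nonzero effect on the protein.")
    estimate = beta_gwas / beta_pqtl
    standard_error = abs(se_gwas / beta_pqtl)
    return estimate, standard_error


# Invented summary statistics, for illustration only.
mr_estimate, mr_se = wald_ratio(beta_pqtl=0.5, beta_gwas=0.1, se_gwas=0.02)
z_score = mr_estimate / mr_se  # ratio of estimate to its standard error
```

A large absolute z-score would suggest that genetically predicted protein levels track disease risk, subject to the usual MR assumptions (for example, that the variant affects disease only through the protein).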
For Alzheimer’s disease, the MR analysis associated three proteins in CSF, 13 in plasma, and seven in brain samples with disease risk. For example, variants in CD33, a microglia-specific gene, emerged as a signal for Alzheimer’s disease risk. A clinical trial of an anti-CD33 antibody as a therapeutic for Alzheimer’s disease currently is underway.
COMMENTARY
This report is significant because it makes large-scale proteomic information available in multi-tissue datasets, which may be applied to pQTL analyses using existing GWAS data for these and other neurological disorders. The report constitutes a unique resource for expanding MR analyses of complex traits contributing to neurological disorders. Through integration with other QTL and omic data analyses, this approach has the potential to fill a substantial gap in understanding how genetic variation, in combination with environmental influences, contributes to brain disease.