Characterizing the genetic basis of transcriptome diversity through
RNA-sequencing of 922 individuals
Study
Understanding the consequences of regulatory variation in the human
genome remains a major challenge, with important implications for
understanding gene regulation and interpreting the many disease-risk
variants that fall outside of protein-coding regions. Here, we provide a
direct window into the regulatory consequences of genetic variation by
sequencing RNA from 922 genotyped individuals. We present a
comprehensive description of the distribution of regulatory variation –
by the specific expression phenotypes altered, the properties of
affected genes, and the genomic characteristics of regulatory variants.
We detect variants influencing expression of over ten thousand genes,
and through the enhanced resolution offered by RNA-sequencing, for the
first time we identify thousands of variants associated with specific
phenotypes including splicing and allelic expression. Evaluating the
effects of both long-range intra-chromosomal and trans
(cross-chromosomal) regulation, we observe modularity in the regulatory
network, with three-dimensional chromosomal configuration playing a
particular role in regulatory modules within each chromosome. We also
observe a significant depletion of regulatory variants affecting central
and critical genes, along with a trend of reduced effect sizes as
variant frequency increases, providing evidence that purifying selection
and buffering have limited the deleterious impact of regulatory
variation on the cell. Further, generalizing beyond observed variants,
we have analyzed the genomic properties of variants associated with
expression and splicing, and developed a Bayesian model to predict
regulatory consequences of genetic variants, applicable to the
interpretation of individual genomes and disease studies. Together,
these results represent a critical step toward characterizing the
complete landscape of human regulatory variation.
Code
A MATLAB implementation of LRVM is available for download: lrvm.zip
Data
Genotype, raw RNA-seq, quantified expression, and covariate data are
available by application through the NIMH Center for Collaborative
Genomic Studies on Mental Disorders. Instructions for requesting access
to data can be found at:
https://www.nimhgenetics.org/access_data_biomaterial.php.
Inquiries should reference the “Depression Genes and Networks study
(D. Levinson, PI)”.
For convenience, significant QTLs (FDR 0.05) are
available for download here.
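To illustrate what selecting QTLs at an FDR of 0.05 can look like in
practice, the sketch below applies the Benjamini-Hochberg step-up
procedure to a list of association p-values. This is an assumption made
for illustration only: this page does not state which FDR procedure the
study used, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Mark which tests pass the Benjamini-Hochberg threshold.

    Illustrative sketch only; not necessarily the procedure used in
    the study. Returns a list of booleans aligned with p_values.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Sort test indices by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every test ranked at or below the cutoff is called significant.
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


if __name__ == "__main__":
    # Hypothetical p-values for ten variant-gene association tests.
    example = [0.001, 0.008, 0.039, 0.041, 0.042,
               0.060, 0.074, 0.205, 0.212, 0.216]
    for p, keep in zip(example, benjamini_hochberg(example)):
        print(f"p={p:.3f} significant={keep}")
```

Because the procedure is step-up, a test can be called significant even
when its own p-value exceeds its rank-specific threshold, provided a
larger p-value further down the sorted list meets its threshold.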
Related publications
Alexis Battle, Sara Mostafavi, Xiaowei Zhu, James B. Potash,
Myrna M. Weissman, Courtney McCormick, Christian D. Haudenschild,
Kenneth B. Beckman, Jianxin Shi, Rui Mei, Alexander E. Urban,
Stephen B. Montgomery, Douglas F. Levinson, Daphne Koller.
Characterizing the genetic basis of transcriptome diversity through
RNA-sequencing of 922 individuals. Genome Research, 2013.
(accepted preprint)
Contact information
Questions can be addressed to:
ajbattle@cs.stanford.edu